NixOS Reproducible Builds: minimal ISO successfully independently rebuilt (discourse.nixos.org)

548 points by CathalMullan 2 years ago | 173 comments

Rebuilding the minimal ISO from source is an impressive milestone on the journey to a system that builds from source reproducibly. Guix recently reached an orthogonal but equally impressive milestone on the same journey[0]: bootstrapping a full compiler toolchain from a single reproducible 357-byte binary, without any other binary compiler blobs. These two features may one day soon be combined to reproducibly build a full distribution from source.

[0] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

TacticalCoder 2 years ago | |

That is amazing and it is great to see there are people out there fighting the good fight (while others ask: "but where's the benefit!? if there's a backdoor, everybody is still going to get the backdoor!").

> it gives us a reliable way to verify the binaries we ship are faithful to their sources

That's the thing many don't understand: it's not about proving that the result is 100% trustable. It's about proving it's 100% faithful to the source. Which means that should monkey business be detected (like a sneaky backdoor), it can be recreated deterministically 100% of the time.

In other words for the bad guys: nowhere to run, nowhere to hide.

TheDong 2 years ago | | |

To me, the largest benefit isn't even related to "bad guys", but rather in being able to understand and debug issues.

Reproducibility makes bugs more shallow. If Hydra builds an ISO that is bit-for-bit identical to what you build locally, a developer can make a change to the ISO inputs, test it, and know that the testing will also apply to the final CI-built one.

If a user reports a bug in the ISO, and you want to test locally whether a change fixes it, you can start from the exact source-code commit the ISO was built from, make some minimal changes, and debug from there, all without worrying that you're accidentally introducing unintended differences.

It minimizes "but it works on my machine" type issues.
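
As a rough illustration of the "bit-for-bit identical" check described above, here is a minimal sketch that compares two build artifacts by SHA-256 digest. The function names and the idea of passing two ISO paths are assumptions made for this example; this is not how Hydra or the NixOS tooling performs the comparison.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(local_artifact: Path, ci_artifact: Path) -> bool:
    """True when both artifacts have the same digest, i.e. identical bytes."""
    return sha256_of(local_artifact) == sha256_of(ci_artifact)
```

If the digests match, anything you test against the local artifact also holds for the CI-built one; if they differ, some input or build step was not pinned down.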

tomcam 2 years ago | | |

Super clarifying, thank you.

tracnar 2 years ago | |

It's not yet as far as the Guix stage0, but there was an interesting talk about bootstrapping nix from TinyCC at NixCon: https://media.ccc.de/v/nixcon-2023-34402-bootstrapping-nix-a...

Thrir94994i 2 years ago | |

357 bytes for a bootstrap compiler binary is VERY impressive!

msm_ 2 years ago | | |

If I remember correctly, this tiny binary is used to (reproducibly) bootstrap the next binary, which bootstraps the next binary, until eventually GCC can be compiled (and compile other software).

rssoconnor 2 years ago | | |

To be fair, it is 357 bytes ... plus a POSIX operating system.

Still, that POSIX operating system bit is also being worked on.

15155 2 years ago | |

At 357 bytes, do you need a reproducible binary at all?

I'd think one could hand-document all 357 bytes of machine code and have them be intelligible.

jowea 2 years ago | | |

This[0] is basically the hand-documentation of those bytes then. Handwritten ELF header and assembly code.

[0] https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX...

ahoka 2 years ago | | |

That’s just the first stage. Simple enough to be audited manually.

forkerenok 2 years ago | | |

Or tattooed on oneself! Or etched on a dog tag!

tomcam 2 years ago | |

> bootstrapping a full compiler toolchain from a single reproducible 357 byte binary without any other binary compiler blobs.

wtf that is mind-boggling. Thanks for the link.

hardwaresofton 2 years ago | |

Classic HN, the top comment in a Nix post is about Guix.

Nix has more packages and advocacy (even if the vast majority of people exposed to Nix/Guix will never actually use it), but Guix is a lot more interesting to me with the expressive power of Scheme on offer.

That said, there are some sharp edges[0] that seem a bit harder to figure out (is this just as inscrutable/difficult as Nix?).

Does anyone have some good links with people hacking/working with guix? Maybe some blogs to follow?

I care more about the server use-case and I'm a bit worried about the choice of shepherd over something more widely used like systemd and some of the other libre choices which make Guix what it is. Guix is fine doing what it does, but it seems rather hard to run a Guix-managed system with just a few non-libre parts, which is a non-starter.

Also, as mentioned elsewhere in this thread, the lack-of-package-signing-releases is kind of a problem for me. Being source and binary compatible is awesome, but I just don't have time to follow the source of every single dependency... At some point you have to trust people (and honestly organizations) -- the infrastructure for that is signatures.

Would love to hear of people using Guix in a production environment, but until then it seems like stuff like Fedora CoreOS/Flatcar Linux + going for reproducible containers & binaries is what makes the most sense for the average devops practitioner.

CoreOS/Flatcar are already "cutting edge" as far as I'm concerned for most ops teams (and arguably YAGNI), and it seems like Nix/Guix are just even farther afield.

[EDIT] Never mind, Guix has a fix for the signature problem, called authorizations![1]

[0]: https://unix.stackexchange.com/questions/698811/in-guix-how-...

[1]: https://guix.gnu.org/manual/devel/en/html_node/Specifying-Ch...

rekado 2 years ago | | |

> Does anyone have some good links with people hacking/working with guix?

We've just had a conference about Guix in HPC: https://youtu.be/dT5S72x18R8

This is a recording of a stream for the second day with talks about large scale deployments of Guix System in HPC.

dataflow 2 years ago | |

How long does a fully bootstrapped build take?

mbakke 2 years ago | | |

It obviously depends on the hardware, but IIRC for me maybe 3-4 hours building from the 357 byte seed to the latest GCC.

The early binaries are not very optimized :-)

pharmakom 2 years ago | | |

With caching, just the time to download the artefact.

ahmedfromtunis 2 years ago |

Stupid question as I never worked on something like this before: why isn't reproducibility the default behavior?

I mean if 2 copies of a piece of software were compiled from the same source, what stops them from being identical each and every time?

I know there are so many moving parts, but I still can't understand how discrepancies can manifest themselves.
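
One small, concrete way a discrepancy can creep in is metadata that a tool records about the build itself. The sketch below uses Python's `gzip` module, whose header stores a modification time; the payload and the two timestamps are made up for the illustration, standing in for "the same source built at two different times".

```python
import gzip

payload = b"identical source, identical compiler flags\n"

# Two "builds" run at different times: gzip records the time in its header.
build_first = gzip.compress(payload, mtime=1_700_000_000)
build_second = gzip.compress(payload, mtime=1_700_086_400)

# The content is the same once decompressed, yet the artifacts differ.
assert gzip.decompress(build_first) == gzip.decompress(build_second)
assert build_first != build_second

# Pinning the timestamp removes this particular source of variation.
pinned_a = gzip.compress(payload, mtime=0)
pinned_b = gzip.compress(payload, mtime=0)
assert pinned_a == pinned_b
```

Timestamps are only one such source; file ordering, build paths, and parallelism can leak into outputs in similar ways.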

mihalycsaba 2 years ago |

Sorry for being dense, but I thought one of the main reasons for NixOS's existence is reproducibility. I thought they had these kinds of things solved already.

I have only ~2 hours of experience with NixOS. I wanted to try Hyprland and thought it would be easier on NixOS, since Hyprland needs a bit of setup and it might be easier to reuse someone else's config there than on some other distro. Finding a config was hard too: I found about 3 in some random GitHub gists (I thought there would be more), and none of them worked. At that point I gave up.

Reventlov 2 years ago |

For those wondering: it should be remembered that the reproducibility of Nix / NixOS / Nixpkgs is only a reproducibility of the sources: if the sources change, one is warned. It is not a reproducibility of the binaries, which can change on each build. Binary reproducibility of Nix / NixOS / Nixpkgs is indeed not really tested, at least not systematically.

Guix, Archlinux, Debian do the binary reproducibility better than Nix / NixOS / Nixpkgs.

Sources:

- https://r13y.com/ ( Nix* )

- https://tests.reproducible-builds.org/debian/reproducible.ht... ( Debian )

- https://tests.reproducible-builds.org/archlinux/archlinux.ht... ( Archlinux )

- https://data.guix.gnu.org/repository/1/branch/master/latest-... (Guix, might be a bit slow to load, here is some cached copy https://archive.is/lTuPk )

somat 2 years ago |

I find it funny (ironic) that the OpenBSD project is trying hard to go the other way: every single install has unique and randomized address offsets.

I understand that these two goals, reproducible builds and unique installs, are orthogonal to each other and that both can be had at the same time, but the duality of the situation still makes me laugh.

oever 2 years ago | |

If the address offsets can be randomized with a provided seed, then demonstrating reproducibility is still possible.

Alternatively, randomizing the offsets when starting the program is another way to keep reproducibility and even increase security; the offsets would change at every run.
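
The seeded variant can be sketched as follows. This is a toy model only: `link_order` and the object-file names are hypothetical, and it does not reflect how OpenBSD actually relinks anything. It just shows that a shuffle driven by a provided seed is repeatable.

```python
import random


def link_order(objects: list[str], seed: int) -> list[str]:
    """Shuffle object files with a private RNG so the result depends only on the seed."""
    rng = random.Random(seed)
    ordered = sorted(objects)  # normalize the input order before shuffling
    rng.shuffle(ordered)
    return ordered


objects = ["exec.o", "fork.o", "malloc.o", "open.o", "read.o"]

# Anyone who knows the seed can reproduce the same "random" layout.
assert link_order(objects, seed=42) == link_order(list(reversed(objects)), seed=42)
```

Publishing the seed alongside the artifact would let a third party re-derive the layout and still compare outputs byte for byte.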

WhyNotHugo 2 years ago | |

OpenBSD does randomised linking at boot time. Packages themselves can still be reproducible. All the randomisation is done locally after the packages are downloaded and their checksums validated.

lrvick 2 years ago |

Now if only they would have maintainers sign packages like almost every other Linux distribution has done since the 90s, so we have any idea if the code everyone is building is the same code submitted and reviewed by known individuals.

Until signing is standardized, it is hard to imagine using Nix in any production use case that protects anything of value.

mbakke 2 years ago |

Very impressive milestone, congrats to those who made this possible!

> [...] actually rebuilding the ISO still introduced differences. This was due to some remaining problems in the hydra cache and the way the ISO was created.

Can anyone shed some light on the fix for "how the ISO was created"? I attempted making a reproducible ISO a while back but could not make the file system create extents in a deterministic fashion.

raboof 2 years ago | |

For NixOS, it's in the 'how did we reproduce' section of the article: the last step of that process produces the iso in the ./result/iso directory.

It sounds like what you're looking for is the commands that that build invoked, but I'm not sure what step you're looking for. For example, the xorriso invocations are at https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/make-...

mgaunard 2 years ago |

Don't you have to fake the system time to do this? The time often ends up inside the binaries one way or another.

mbakke 2 years ago | |

Indeed, timestamps are probably the most common source of nondeterminism. So common that a de facto standard variable to fake a timestamp has been implemented in many compilers:

https://reproducible-builds.org/docs/source-date-epoch/
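
For a build script, honoring that variable can look roughly like this. `SOURCE_DATE_EPOCH` itself is the variable described at the link above; the helper names `build_timestamp` and `version_banner` and the banner format are invented for this sketch.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Use SOURCE_DATE_EPOCH when it is set, otherwise fall back to the current time."""
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def version_banner(version: str) -> str:
    """Render a version string that embeds the (possibly pinned) build date."""
    stamp = build_timestamp().strftime("%Y-%m-%d %H:%M:%S")
    return f"{version} (built {stamp} UTC)"
```

With `SOURCE_DATE_EPOCH=1` exported, every rebuild reports 1970-01-01 00:00:01 UTC instead of the wall-clock time of the build machine.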

amelius 2 years ago | |

Could you name an example of how (and for what reason) this might happen?

mbakke 2 years ago | | |

Typically part of a "version string":

    $ python3
    Python 3.10.7 (main, Jan  1 1970, 00:00:01) [GCC 11.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

Perhaps a relic from when software had to be manually updated?

mgaunard 2 years ago | | |

GCC embeds timestamps in .o/.gcno/.gcda files to check they match.

It's mostly annoying as gcov will actively prevent you from using gcda files from a different but equivalent binary than what generated the gcno.

preisschild 2 years ago | |

You would either leave the timestamp out of all builds entirely, or set it to 0, so the build date is 1970 everywhere.
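
A minimal sketch of the "set it to 0" approach, applied to a tar archive built with Python's `tarfile` module. The function name `deterministic_tar` is made up for this example, and zeroing ownership and sorting members are extra normalizations assumed here beyond the timestamp.

```python
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into a tar archive whose bytes depend only on names and contents."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):           # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                   # the build date is 1970 everywhere
            info.uid = info.gid = 0          # no builder-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o644
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


first = deterministic_tar({"a.txt": b"alpha", "b.txt": b"beta"})
second = deterministic_tar({"b.txt": b"beta", "a.txt": b"alpha"})
assert first == second  # same inputs, same bytes, regardless of insertion order
```

The same idea applies to other container formats: anything the packer would otherwise take from the build machine gets replaced by a fixed value.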

Uptrenda 2 years ago |

Wouldn't this help solve the problem Ken Thompson wrote about in 'Reflections on Trusting Trust'? If you can fully bootstrap a system from source code then it's harder to have things like back-doored compilers.

raboof 2 years ago | |

It indeed helps, but it is not a full 'solution': you could still in theory envision elaborate backdoors in the 'environment' in which the ISO is built. If you really want to 'solve' the problem described there, you could look into Diverse Double Compiling (https://dwheeler.com/trusting-trust/) or bootstrapping the entire environment (https://bootstrappable.org/) - see also the 'Aren’t there bootstrap problems with the above approach?' section of the post.

Reproducing the build already goes a long way in making such attacks increasingly unlikely, though.

KennyFromIT 2 years ago |

I've lived in the Red Hat ecosystem for work recently. How does this compare to something like... Fedora Silverblue? Ansible? Fedora Silverblue + Ansible?

TheDong 2 years ago | |

The closest equivalent in the Fedora ecosystem to the NixOS ISO builder and the reproducibility work around it is osbuild / imagebuilder - https://www.osbuild.org/guides/introduction.html

Imagebuilder claims reproducibility, but as far as I know it mostly installs RPM packages as binaries, not from source, so it's not really proper reproducibility unless all the input packages are also reproducible.

If the descriptions of building packages from source, building distro images, and reproducibility in the linked thread didn't make sense to you, you're probably not really the target audience anyway.

Crontab 2 years ago |

I love that there are people out there who care about things like this.