Debian must ship reproducible packages (lists.debian.org)
Given how many quick-and-dirty sed patches or exec commands I've seen in the few Nix packages/modules I've read, I would not exactly bet my life on it being completely idempotent and reproducible.
It's not reproducible bit for bit, since it fetches the current version of everything, but it's still easy enough to reproduce, stable enough, and complete enough. Classic distros, by contrast, need a fresh install every major release, or else you face issues and keep a system in an unknown state for a long time until it explodes.
They're still a pragmatic choice for many use cases.
It took a while, though, until this was understood. In 2007, when I pointed out on debian-devel that this is needed, I was still told what a huge waste of time this would be. And indeed it took a huge amount of work by many people to get there, but it is well worth it.
"Well worth it" is not correct. And it just ups the contribution barrier to Debian. I have already heard a lot of people complaining that contributing to Debian is hard, and while in the past I defended it with "they need all the checks and bounds to make sure packages play with each other nicely", this is just a step that makes it hard for no reason and little benefit.
https://reproducible-builds.org/
Could you perhaps respond to the argumentation here?
Anyone having to maintain a code base or a distributed fleet of devices will gain from this decision, immensely, as their operational periods come and go.
Reproducible builds are about longevity as much as they are about security.
Please don’t make bold claims about ‘no reason and little benefit’ while demonstrating ignorance of this hard fact: reproducible builds should have been the norm, in computing, from the get-go.
Have many organizations produce the binaries independently and post the artifacts.
Once n of m parties agree on the artifact hash, take that as the trusted build.
If every party reaches a different hash then we cannot build consensus.
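To make the n-of-m idea concrete, here is a minimal sketch in Python. The function name `trusted_hash`, the builder names, and the short hash strings are all made up for illustration; a real scheme would also need signed reports and a threshold n larger than m/2 so that two different hashes can never both qualify.

```python
from collections import Counter
from typing import Optional


def trusted_hash(reported: dict[str, str], n: int) -> Optional[str]:
    """Return the artifact hash that at least n builders agree on, else None.

    `reported` maps a (hypothetical) builder name to the hash it published.
    """
    if not reported:
        return None
    # Pick the most frequently reported hash and count its supporters.
    digest, votes = Counter(reported.values()).most_common(1)[0]
    return digest if votes >= n else None


# Illustrative reports from three independent builders (m = 3).
reports = {
    "builder-a": "ab12",
    "builder-b": "ab12",
    "builder-c": "ff00",
}
print(trusted_hash(reports, n=2))  # two builders agree, so a hash is returned
print(trusted_hash(reports, n=3))  # no unanimous hash, so the result is None
```

If every builder reports a different hash, the top count is 1 and any threshold of 2 or more yields `None`, which is the "cannot build consensus" case.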
(It was caught before being promoted into a stable Debian release, yes, but this sort of relied on a happy accident, too close for comfort)
Those people do not care about quality in open source at all. For long-lived software this is very important.
Of course, all those JavaScript and Kubernetes packages, which will be irrelevant again in a few years, might complain, but let them complain.
I'm reading this as a suggestion that the reproducible builds effort was an ineffective deterrent.
However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.
> And it just ups the contribution barrier to Debian
Until yesterday, the package just got flagged in the tracker, and you could either ignore it, or fix it yourself, or the kind people behind the reproducible builds effort supplied a patch themselves.
Now, you can no longer ignore it. But fixes are often trivial. Use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of eg: time), etc. These are best practices anyway.
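As a rough illustration of those two fixes, here is a small Python sketch. `SOURCE_DATE_EPOCH` is the environment variable standardized by the reproducible-builds.org project for a build-provided timestamp; the helper names `build_timestamp` and `build_rng` and the default seed are assumptions made up for this example.

```python
import os
import random
import time
from typing import Mapping, Optional


def build_timestamp(env: Optional[Mapping[str, str]] = None) -> int:
    """Prefer the timestamp provided by the build over the wall clock."""
    env = os.environ if env is None else env
    value = env.get("SOURCE_DATE_EPOCH")
    # Falling back to the current time keeps ad-hoc local builds working,
    # at the cost of those builds not being reproducible.
    return int(value) if value is not None else int(time.time())


def build_rng(seed: int = 0) -> random.Random:
    """Return a generator seeded with a constant instead of the time."""
    return random.Random(seed)


# With the variable set, the embedded date is the same on every rebuild.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1700000000"})
print(time.strftime("%Y-%m-%d", time.gmtime(stamp)))

# Two generators with the same seed produce the same sequence.
print(build_rng(42).random() == build_rng(42).random())
```

Anything the build embeds (a "built on" string, a generated identifier, a shuffled order) then depends only on the inputs, not on when or where the build ran.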
(Orange = FTBR = "failed to build reproducibly")
I'm not good at reading numbers from charts, but I'd guess it's a few percent (4-5ish?).
> Forbidden
> <p>You are not allowed to access this!</p>
(yes, with HTML tags on display) :)
EDIT: I also found an "I Challenge Thee" page in history. Did I just get blocked by antibot measures? Why???
They're a guarantee that if there's a backdoor, it's reproducible 100% of the time.
This is a godsend for white hats fighting the good fight.
And, as a side note, it's strongarming vs the bad guys: "Would be too bad if we could reproduce your shiny exploit 100% of the time wouldn't it!?".
Note that we should go further (but it's a bit orthogonal to reproducible builds): builds of the final binary/package should happen by first entirely discarding all files not necessary for the final build (like all test cases and all test assets). The build should literally happen in an environment that gets rid of those (after, of course, having tested in another environment that all test cases succeed). If I'm not mistaken, getting rid of test assets would have stopped Jia Tan's XZ backdoor attempt dead in its tracks, for example, because IIRC binary data that was part of the backdoor was hidden in some asset only used by test cases.
P.S: as a bonus they also make it possible to detect bit-flips (I'm not saying there aren't other ways to detect bit-flips: what I'm saying is that if you have deterministic builds anyway and something doesn't reproduce correctly due to a flipped bit, it's going to be noticed).
I think Magnus Ihse Bursie said it best while working on reproducible builds of OpenJDK: "If you were to ask me, the fact that compilers and build tools ever started to produce non-deterministic output has been a bug from day one." [2]
[1] https://www.linux.com/news/preventing-supply-chain-attacks-l...
[2] https://github.com/openjdk/jdk/pull/9152#issue-1270543997
Reproducible builds are an essential method in industrial computing. Debian isn’t at the forefront of this; it is merely adopting industry-wide techniques also applied to other operating systems in use in long-term and safety-related applications.
Certainly, a lot of the hard work of the Yocto and Debian developers is already in your hands.
What is interesting is that this is now being applied in a more forward-focused policy by the Debian developers, that it will now be the norm rather than an option…
reproduced: 97.02% good: 17586 bad: 511 fail: 30 unknown: 0
This, statistics for other architectures, and the reasons for unreproducibility can be found at https://reproduce.debian.net.
Most of those that failed to reproduce differ in NT_GNU_BUILD_ID. The others differ in some other bits, mostly timestamps or hashes, I assume.
It feels like AI and traditional software are converging in complexity.
You don't have permission to access this resource. Apache Server at lists.debian.org Port 443
:/
It does work with my privacy/scraping setup (residential proxy, spoofed fingerprints, Qubes and so on), great job Debian.
The build timestamps in the PE header and export table are a problem as well.
giant leap for mankind.
What is a win is that two independent parties can run the same build, and get the same binaries.
This is important because it removes trust from builders: anyone can verify their output.
It just so happens that unimportant things like build versions impede that.
BTW, most Debian packages have reproducible builds. Those that do not (I'd say 5%) are shown in orange in the graph there: https://wiki.debian.org/ReproducibleBuilds
It's not like the Linux world where you have distinct projects like the kernel, GNU, OpenSSL, and then it's the distribution's job to assemble everything.
In the BSD projects, the scope is developing and distributing an entire base system, i.e., the kernel but also the libc, the shell/all POSIX utilities, and a few third parties like OpenSSH (which are usually "softforked").
It's quite visible in the sources, it's a lot more than just a kernel: https://github.com/NetBSD/src
Additional packages you could get from pkgin/pkgsrc (NetBSD), pkg-ng/ports (FreeBSD) or pkg_add (OpenBSD) are clearly distinct from the base system, installed in a dedicated subtree (/usr/pkg in NetBSD, /usr/local/ in OpenBSD/FreeBSD), and provided on a best-effort basis.
The reproducible build target was almost certainly only for the base system, which is a few percent of what Debian tries to achieve, and over which NetBSD has tighter control (developer + distributor instead of downstream assembler + distributor).
A reproducible base system is useful, but given how quickly you typically need to install packages from pkgsrc, it's not quite enough.
Maybe that's trying harder on design rather than trying to remedy the consequences later.
Debian has come a long way, but when Debian says reproducible they mean they grab third-party binaries to build theirs. When we say reproducible we mean 100% bootstrapped from source code all the way through the entire software supply chain.
We think that distinction matters.
Also, stagex and others probably profited QUITE A LOT from the Debian efforts, because they started going upstream and talking to developers.
Just as Arch Linux profited from Debian maintainers a decade before that, and from Debian people asking upstream to improve...
Unfortunately, the term “reproducible” can be interpreted in many ways because there is no strict and complete definition. People and projects bend it to their liking.
Your approach is correct.
Maybe not by itself, but it does allow for the ecosystem to be audited, in a way that ultimately benefits the end-user. It really is an important part of a healthy supply chain.
This is a nice pat-yourself-on-the-back achievement for people who prefer security theatre and checking boxes to doing something actually useful, and they wasted thousands of man-hours of the poor victims who had to implement it.
The thing reproducible builds aim to prevent is Debian, or individual developers and system administrators with access rights to binary uploads and signing keys, being forced by attackers to sign and upload binary packages, be these attackers governments (with or without court orders) or criminal organizations.
As of now, say if I were an administrator of Debian's CI infrastructure, technically there would be nothing preventing me from running an "extra" job on the CI infrastructure building a package for openssh with a knock-knock backdoor, properly signing it and uploading it to the repository. For someone to spot the attack and differentiate it, they'd have to notice that there is a package in the repository that has no corresponding build logs or has issues otherwise.
But with reproducible builds, anyone can set up infrastructure to rebuild Debian packages from source automatically and if there is a mismatch with what is on Debian's repository, raise alarm bells.
Indeed, this could mitigate an attacker replacing the binary with something that's not produced from the code, but it does not mitigate the tool chain or code itself containing the exploit, creating a malicious binary.
Curious, what distros were affected by npm supply chain attacks?
Not being able to see if the source code shipped is the same as was used for creating the binary is scary.
Reproducible builds are not solving all issues, as you rightly observed, but they can be a stepping stone (or even a precondition) for further measures.
The end-user experience is that now you can host your Debian binaries in caches and CDNs without worrying about supply chain hackers.
You can verify that file hashes match the ones on Debian's website and sleep much better at night.
If you don't trust Debian's website then you can rebuild yourself and check if Debian has been compromised.
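The hash check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `sha256_of` and `matches_published` are invented here, and it assumes you have already obtained the published SHA-256 value over a channel you trust.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare a downloaded (or locally rebuilt) file against a published hash."""
    return sha256_of(path) == published_hex.strip().lower()
```

The same comparison serves both cases in the comment: checking a copy fetched from a cache or CDN against the published hash, and checking your own rebuild against the binary the distribution ships. The second case only works if the build is bit-for-bit reproducible.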
Defense in depth obviously is a good thing
Guix is not fully bootstrapped or reproducible.
To your point though, the incomplete efforts of many other distros absolutely accelerated us.
Yay for free software and open source! We all benefit! :)
Reproducible builds protect against all attacks.
Generic statements like this are always false. As a simple rebuttal, reproducible builds do not protect against source-level attacks such as intentional backdoors or disabled/obfuscated access checks. In fact, I'd say that reproducible builds protect against one class of attacks only: compromise of the build infrastructure.
Here is jujutsu’s list of dependencies[0] and their hashes[1]. As an aside, that’s why I don’t like those package managers. Something like Python’s numpy or libcurl gets sliced into atomic portions.
[0]: https://github.com/openbsd/ports/blob/master/devel/jujutsu/c...
[1]: https://github.com/openbsd/ports/blob/master/devel/jujutsu/d...
Obviously, it would be a ton of work to make such a system resistant to gaming by malicious actors (see GNU Guix for useful efforts), but it would provide valuable diversity in architecture and (political or other) control.
It would be even cooler if we could have independent projects that could run on various distros and OS, and build packages for any of them. Having packages for bsd verified on linux and vice-versa with statistical logging (this code has been verified x times on y OSes) would be reassuring.
This has been the status quo in Debian for a while now. You can build, and use diffoscope to audit the differences.
It's a stronger security property to have bit-for-bit reproducibility, and it looks like Debian are ready to commit to it.
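For a sense of what "bit-for-bit" means in practice, here is a toy Python check that reports where two builds first diverge. It is not how diffoscope works (diffoscope unpacks archives and file formats and shows readable differences); the function name `first_difference` is made up for this sketch.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One build is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Two hypothetical build outputs that differ only in an embedded date.
build_one = b"app v1.0 built 2024-01-01"
build_two = b"app v1.0 built 2024-01-02"
print(first_difference(build_one, build_two))
```

A bit-for-bit reproducible package is one where this kind of comparison finds nothing at all, so a simple hash comparison is enough and no auditing of differences is needed.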
I had figured the cost would decrease in time as deterministic builds became the norm (i.e. build tools stop including build timestamps).
I agree that it might not have positive ROI. Bit tricky for me to judge.
Still, lots of good non-security benefits to reproducible builds too.
The backdoor relied first on a difference between building a package in a packaging environment versus building the package on your own. And also, it relied on the very common practice of checking in unreviewable artifacts into the source tree (e.g., the configure script, malicious binary test artifacts).
Reproducible builds guarantee that two people can follow the same instructions and get the same, bit-identical outcome. They do nothing to guarantee that those instructions have not been compromised, and all of the great packaging security failures of my lifetime that I can think of have relied on those instructions being compromised (e.g., xz utils, Debian OpenSSL keygen issues).
At the time of xz attack, the package was already reproducible.
Similarly for package managers, the biggest issues are typo-squatting or maintainer credentials compromise. And in neither case does the attacker have any incentive to take advantage of it in a way that breaks reproducibility--they can be completely honest about what they're doing. Now even if I were an attacker who had compromised a maintainer's machine, I'd still probably go for compromising the source rather than compromising the final artifact-generation process... simply because compromising the source makes the exploit live longer.
As xz shows, once you have a compromised maintainer, there's basically nothing you can do to fix it except by having someone else discover the compromise and locking out the maintainer.
You can do content addressing without reproducible builds. You just have a canonical version, which typically is built by the application developer.
I also asked Gemini whether we express ourselves that way in my mother tongue (Mandarin), and yes, we do, but it came off as too formal a way of speaking. We don't normally use it (I'm not from China/Taiwan, though).
Why not? Debian has a fantastic track record of providing old versions, for instance here's the build tools from Debian 2.0 from 1998:
https://mirrors.accretive-networks.net/debian-archive/debian...
Just baking in the build ID and commit is enough. What do you think reproducible builds add?
> Please don’t make bold claims about ‘no reason and little benefit’ while demonstrating ignorance of this hard fact: reproducible builds should have been the norm, in computing, from the get-go.
So far not a single person in the thread has given me a concrete example (as in an existing project, an existing problem, no other solution can solve it). They just claim it's better based on their feelings. Come on, be the first one.
Think industrial applications, such as rail and heavy industry transportation. We use reproducible builds here as part of a wider safety-critical protocol which guarantees that what we are running is what we expect to run - nothing more, nothing less.
Reproducible builds are certifiable. They can be relied on in environments where certification costs millions and takes years.
Think outside your consumer box for a minute.
Like what exactly?
If anything, it will make the attacker's job easier, as the Ubuntu package will have the same files structured exactly the same way as the Debian one.
As opposed to what? If Ubuntu uses the same source, of course they get the same binaries. And if Ubuntu applies patches, they'll get something different. And that's still true.
There was no attack that reproducible builds would help protect from before 2007 either.
> Until yesterday, the package just got flagged in the tracker, and you could either ignore it, or fix it yourself, or the kind people behind the reproducible builds effort supplied a patch themselves.
> Now, you can no longer ignore it. But fixes are often trivial. Use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of eg: time), etc.
That's the entirety of the problem. App developers don't want to be package experts or build experts.
> These are best practices anyway.
They are not. They are best practices if you want reproducible builds. They are an entirely useless waste of time if you don't care.
App developers and Debian package maintainers are already separate groups.
Or if you're writing a test suite, and you want failing test results to be actionable.
Or you have any other type of behavior that you'd like to reproduce somehow.
One of the first things app developers ask for in bug/issue templates are the steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.
The class of bugs would be extremely small, as the things that make a build hard to reproduce are, 99% of the time, irrelevant to runtime: a build time embedded in the binary, some file metadata with a different timestamp, or maybe the linker putting things in a slightly different order.
> One of the first things app developers ask for in bug/issue templates are the steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.
I think you will find that the number of people who had problems reproducing a bug because of a not-100%-exact build is vanishingly small, possibly zero.
And that is because if you get a package version and want to reproduce a bug, you get the package, install it and try to reproduce it. The package WILL be 100% the same as the one in the bug report, because you both downloaded the same artifact from the same mirror network. You don't need reproducibility to get the same binary to reproduce a bug.
There are plenty of places in industrial computing where reproducible builds have prevented subterfuge within the organizations themselves. Injecting binaries to do inf-/exfiltration is a long-standing industrial espionage activity which is of immense value to all users of the operating system - not just the consumer users.
There's a certain irony in pushing for verifiable builds with completely unverifiable claims.
I've worked at several of the biggest targets for espionage, industrial or otherwise, and to the best of my knowledge, the only thing that's ever been discovered by their reproducible build efforts has been failing hardware on build reproducers
Less diligent projects had it but there are easier ways to fix it
The one single fail point they prevent is infected build hosts.
That might be a reasonable benefit for a company that is building on public infrastructure, but for projects like Debian that insist build hosts are basically offline (package in, package out, with no internet access during the build process) it is a very fringe benefit.
Another thing to consider is that Debian has quite a few derivatives who may also rebuild packages from source, so you have a multiplier there.
Forcing devs to use hardware keys to sign commits/CI requests would be an actual security improvement, thwarting many supply chain attacks that only worked because the attacker got hold of developer credentials. Hardware keys at least have the option to make some operations require physically pressing the key, so there is a chance the developer will notice.
But thanks to reproducible builds, at least someone can... validate that the binary code of a vulnerable package can be reproduced. Very fucking useful.
I am not saying it is useless. I am saying it is one of the highest-hanging fruits on the security tree.
With reproducible builds, you do not need to trust that the system that built the binary was not compromised, because this would be detected immediately.
Source compromises are still an issue, but there is a much bigger chance that they are detected. Also if there is a compromise, reproducible builds allow you to later track it to the source. For an infected binary it is much more difficult to understand how it got there and what else might be compromised.
> Also if there is a compromise, reproducible builds allow you to later track it to the source.
They do not. Git log and build logs allow for that.
Reproducible builds only have value after the source. They protect against build servers being compromised (and then only if some other uncompromised environment is also running verification passes). If the bug is in the source, reproducible builds are exactly as valuable as recording the commit that was used for the build in the app's code/package metadata.
Not being reproducible is a bug.
There is no reason for a build to not be reproducible, but somehow we let the built binaries become infested with timestamps, login names and file system paths. We recently moved to reproducible builds at work, and discovered that our login names and local home directory paths were being shipped in every release. No one was very happy about leaking PII like that.
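A crude way to look for that kind of leak is to search the built artifact for strings you know should not be there. The Python sketch below is only an illustration: the function name `find_leaks`, the user `alice`, and the sample bytes are all invented.

```python
def find_leaks(artifact: bytes, needles: list[str]) -> list[str]:
    """Return the needles that appear verbatim in the artifact bytes."""
    return [
        needle
        for needle in needles
        if needle and needle.encode("utf-8") in artifact
    ]


# Hypothetical build output with a home directory path baked in.
artifact = b"\x7fELF...compiled from /home/alice/src/app/main.c..."
print(find_leaks(artifact, ["alice", "/home/alice", "bob"]))
```

In a real check the needles would come from the build machine itself, for example `getpass.getuser()` and `str(pathlib.Path.home())`, and the artifact bytes would be read from the file being released. Compressed or encoded sections would hide such strings from a plain byte search, so an empty result is not proof of a clean build.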
You may not consider it worth the effort, but you aren't the one putting in the effort so I'm not sure why that matters to you. It is very much worth the effort to those people doing the work. Debian is a do-ocracy and so the people doing the work get to make the decisions.
Say you're making a chat app - you wouldn't incorporate a delete feature, and you might be tempted to use some kind of blockchain to prove all messages were delivered without gaps. But if you ignore archival needs you design something similar to IRC which is much simpler.
Reproducibility adds nothing here
Toolchains absolutely need to be maintained with some degree of longevity.
The whole world doesn’t march to your consumer-user beat. Sometimes it functions at industrial-user tempos, too.
Not that you are wrong, industry keeps thinking they can make themselves immune and so long term reproducibility is useful, but I submit they are wrong.
Literally none of your statements are applicable to that realm, sorry.
Rail operators have long since been operating their air-gapped infrastructure with 99.999% safety results, literally not adhering to any of the policies you claim are endemic to the industry.
git log and build logs do not help you at all if you cannot even determine that the compromised binary has any relation to the build log or the source you may want to look at. This is what reproducible builds give you. You are right that it does not protect against compromised sources.
Here's the quote from the guy that discovered it in the initial public disclosure [1]:
After observing a few odd symptoms around liblzma (part of the xz package) on Debian sid installations over the last weeks (logins with ssh taking a lot of CPU, valgrind errors) I figured out the answer. The upstream xz repository and the xz tarballs have been backdoored. At first I thought this was a compromise of debian's package, but it turns out to be upstream. One portion of the backdoor is *solely in the distributed tarballs* and debian's import of the tarball ... it is also present in the tarballs for 5.6.0 and 5.6.1.
[1]: https://www.openwall.com/lists/oss-security/2024/03/29/4
Good. Then we are on the same page.