We'll see if that elicits any real objections. Hopefully not, and this will be part of 5.18!
Very nicely put, and thank you for putting that together!
This patch goes a long way toward eliminating a long-standing userspace crypto footgun. After several decades of endless user confusion, we will finally be able to say, "use any single one of our random interfaces and you'll be fine. They're all the same. It doesn't matter." And that, I think, is really something. Finally all of those blog posts and disagreeing forums and contradictory articles will all become correct about whatever they happened to recommend, and along with it, a whole class of vulnerabilities will be eliminated.
With very minimal downside, we're finally in a position where we can make this change.
Do I understand correctly that with this patch, the only drawback is for weird architectures that do not have an instruction counter or another instruction to gather random data? And in those cases, the old behavior (“urandom may be insecure shortly after boot”) may even be worse than the new behavior (“may block while waiting for entropy”) ?
Most modern userspaces already use getrandom(flags=0). Nothing changes for them. They already count on the rng being seeded in one way or another.
Rather, this changes /dev/urandom, which previously would give insecure randomness before being seeded. With this change, this doesn't happen any more, because it makes /dev/urandom wait until it has been seeded.
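To make the getrandom(flags=0) side concrete, here is a minimal Python sketch. It assumes Linux and Python 3.6+, where `os.getrandom()` wraps the syscall; the helper names are hypothetical. Calling it with no flags blocks until the RNG is seeded and never again afterwards. Passing `GRND_NONBLOCK` instead makes it fail with EAGAIN (Python's `BlockingIOError`) while the RNG is still unseeded, which gives a non-blocking readiness probe.

```python
import os


def rng_is_seeded() -> bool:
    """Return True if the kernel RNG is initialized (Linux-only probe)."""
    try:
        # GRND_NONBLOCK makes getrandom() fail with EAGAIN instead of
        # blocking while the RNG is still unseeded.
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def secure_bytes(n: int) -> bytes:
    """Like getrandom(buf, n, 0): blocks until seeded, then never again."""
    out = b""
    while len(out) < n:
        # getrandom() may return fewer bytes than requested,
        # so loop until the buffer is full.
        out += os.getrandom(n - len(out))
    return out


if __name__ == "__main__":
    print("seeded:", rng_is_seeded())
    print(secure_bytes(16).hex())
```

With the patch, a plain read from /dev/urandom gets the same "wait until seeded" behavior as `secure_bytes` above.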
In practice, the RNG gets seeded by a large variety of things. As a last-ditch effort, the Linus Jitter Dance will seed it.
Taken together, what all the above amounts to is that the regression potential is limited to systems where: (A) /dev/urandom is still being used, rather than getrandom(flags=0), (B) the boot sequence, due to some bug, hard-depends on unseeded reads from /dev/urandom, (C) no ordinary sources of entropy, such as interrupts and input devices and disk drives, are available, (D) the CPU is so ancient as to be missing a cycle counter, defeating the last ditch Linus Jitter Dance, and (E) a new kernel will be installed on this old system.
I argue that the set of machines where (A), (B), (C), (D), and (E) all hold is minuscule.
If your approach is adopted, people would simply treat /dev/random and /dev/urandom as the same thing (which I gather is your intended goal). That is fine as long as CSPRNGs are relatively easy to make. I hear that this hinges on fancy conjectures like P=BPP being true, but apparently they're not proven yet.
What if... in some parallel universe it turns out that P!=BPP, and the concept of CSPRNGs is fundamentally broken, and somebody discovers a practical method to break whatever PRNG is implemented in a system? In this admittedly unlikely universe, keeping the distinction between /dev/random and /dev/urandom (i.e. the former could block indefinitely, the latter could be insecure) seems to be the safer approach. Of course in this universe, Linux would still have to pull the PRNG from /dev/random and revert back to the old behavior, but at least it's fixable. But if userland drops the distinction between /dev/random and /dev/urandom, then the problem would be fundamentally unfixable until every app reviews and decides which guarantee they want for themselves (and releases patches).
Of course your patch does not really imply the contract between the kernel and userland has changed, which is why I mentioned intention. If it is intended to change the contract, maybe it's better to wait for P=BPP before you do it? :P
It used to be that /dev/random did some accounting to try to estimate how much entropy was in its pool, and if that number declined too low (as reading from /dev/random was figured to decrease the entropy), it would simply refuse to run its CSPRNG to produce any more output until it got fed some more entropy. This accounting was heavily criticized as magical thinking unsupported by any actual research, and revisions to the randomness engine in Linux over the past decade have eventually eliminated it in favor of just tracking how much entropy has ever been added; if there's not enough, then it blocks until there is.
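The difference between the two policies fits in a few lines. This is a toy Python model, not kernel code: the class names are hypothetical and the 256-bit readiness threshold is an assumption chosen for illustration. The only thing modeled is when a read would block.

```python
class OldAccountingPool:
    """Toy model of the old /dev/random entropy estimate (NOT kernel code)."""

    def __init__(self) -> None:
        self.estimate_bits = 0

    def credit(self, bits: int) -> None:
        self.estimate_bits += bits

    def can_read(self, nbytes: int) -> bool:
        # Reads were assumed to "use up" entropy, so reads blocked
        # whenever the running estimate was too low.
        return self.estimate_bits >= nbytes * 8

    def read(self, nbytes: int) -> None:
        if not self.can_read(nbytes):
            raise BlockingIOError("would block: entropy estimate too low")
        # Debit the estimate for every byte handed out.
        self.estimate_bits -= nbytes * 8


class InitOnlyPool:
    """Toy model of the newer policy: block only until seeded once."""

    READY_BITS = 256  # assumed threshold, for illustration only

    def __init__(self) -> None:
        self.total_credited = 0

    def credit(self, bits: int) -> None:
        self.total_credited += bits

    def can_read(self, nbytes: int) -> bool:
        # Reading never decreases anything; only initialization matters.
        return self.total_credited >= self.READY_BITS

    def read(self, nbytes: int) -> None:
        if not self.can_read(nbytes):
            raise BlockingIOError("would block: not yet seeded")
```

In the first model a single 32-byte read drains a 256-bit estimate and the next read blocks; in the second, once 256 bits have been credited, reads of any size never block again.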
Systems where this would make urandom block for an objectionably long time (because CPU execution time jitter is unavailable or is believed to have low entropy) are largely hypothetical.
I think you can still have specific reservations about CPU execution time jitter, though my experience and reading makes me believe this is probably a pretty good source of entropy; personally, I feel the ball is firmly in the court of jitter skeptics to show why the entropy measures from actual running systems are wildly high estimates. [https://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html#toc-A...]
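For intuition about what "CPU execution time jitter" means, here is a toy Python collector: time a small fixed workload many times and fold the raw timings into a hash. This is an illustration only, with a hypothetical function name; it is not a port of the kernel's cycle-counter-based routine and is not a vetted entropy source. Note that the hash condenses whatever unpredictability the timings contain; it does not add any.

```python
import hashlib
import time


def jitter_sample(rounds: int = 256):
    """Toy CPU-jitter collector. Illustration only, NOT a vetted entropy source.

    Returns (32-byte digest, list of raw timing deltas in nanoseconds).
    """
    deltas = []
    h = hashlib.sha256()
    for _ in range(rounds):
        start = time.perf_counter_ns()
        # A small, fixed amount of work; its duration still varies with
        # caches, branch prediction, interrupts, frequency scaling, etc.
        acc = 0
        for i in range(64):
            acc ^= (acc << 1) ^ i
        delta = time.perf_counter_ns() - start
        deltas.append(delta)
        # Fold every raw timing into the hash.
        h.update(delta.to_bytes(8, "little"))
    return h.digest(), deltas
```

Whether those deltas carry as much entropy as estimated is exactly the question jitter skeptics would need to answer with measurements.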
I also think you can still have specific reservations about how the kernel 'shepherds' its pool of random bits. I honestly am out of touch with the latest algorithms, both in research and in the Linux kernel. It would seem best if the kernel used a cryptographic algorithm where inferring the hidden random pool's state from outputs implies a useful attack on the cryptographic algorithm itself (i.e., has a proof of security). I don't think that Linux does this at the moment, based on recent discussion at https://lwn.net/ml/linux-kernel/20220201161342.154666-1-Jaso...
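As a rough picture of what a cryptographic "shepherd" could look like, here is a toy hash-based pool in Python. BLAKE2s (via `hashlib.blake2s`) is an assumption here, the class and labels are hypothetical, and the real kernel construction differs in its details. The point is that inputs are absorbed by a hash rather than a linear mixing function, outputs are derived under a key, and the state is replaced after every extraction.

```python
import hashlib


class HashPool:
    """Toy hash-based entropy pool (illustration, not the kernel's design)."""

    def __init__(self) -> None:
        self._state = hashlib.blake2s()

    def mix(self, data: bytes) -> None:
        # All inputs are absorbed by the hash function.
        self._state.update(data)

    def extract(self) -> bytes:
        key = self._state.digest()  # 32-byte summary of everything mixed so far
        # Output and next state are derived under different labels, so
        # recovering the pool from outputs means attacking keyed BLAKE2s.
        out = hashlib.blake2s(b"output", key=key).digest()
        next_seed = hashlib.blake2s(b"next-state", key=key).digest()
        # Replace the state so a later compromise cannot recompute
        # earlier outputs (a simple form of forward secrecy).
        self._state = hashlib.blake2s(next_seed)
        return out
```

Two pools fed identical inputs produce identical outputs, which is why everything hinges on the unpredictability of what gets mixed in.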
But let's take a moment to happily reflect: for applications running well after system boot-up has completed, it is now soooo much easier to get your fill of cryptographic-quality random numbers than it was in the bad old days.
- For virtualized systems there could be a hypervisor call to request an initial random seed.
- Run of the mill desktops/servers could use a reserved region (partition or boot record field) for preserving RNG state.
- On embedded devices, a boot loader like U-Boot could load state from some piece of NVRAM/NAND/…
There are hardly any platforms out there that are stateless in the literal sense. All of the approaches above would allow Linux to have a properly seeded RNG from the very first instruction that runs. No need to rely fully on dubious things like timing execution/interrupts.
But of course snapshots, cloning, etc. can foil that badly, causing the same seed to be used multiple times. And on initial install you're not going to have any of that (but initial install is also when you may need to generate long-lived random numbers like ssh host keys).
On embedded devices it can be a real challenge. You must not reuse the seed data, so you effectively have to erase it from NVRAM/flash before use. But then if you lose power before you can generate a new one, you won't have one on the next boot. And you're adding flash writes, which decreases longevity and increases the chance of a power failure in the middle of a write.
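On a device with a real filesystem, one way to get "erase before use" without the "no seed after power loss" failure is to replace the seed atomically before using the old one. A minimal Python sketch, assuming POSIX semantics where rename is atomic; `rotate_seed` and the `mix` callback are hypothetical names. `os.urandom` is used for the fresh half only for simplicity; a real early-boot tool would choose its source with care.

```python
import hashlib
import os
import tempfile


def rotate_seed(path: str, mix) -> bool:
    """Consume a stored seed exactly once and replace it atomically.

    `mix` is a hypothetical caller-supplied callback that feeds the old
    seed into the RNG (uncredited). Returns False if no seed existed.
    """
    try:
        with open(path, "rb") as f:
            old = f.read()
    except FileNotFoundError:
        old = b""
    # New seed = hash(old seed || fresh bytes), so a cloned or
    # snapshotted seed file still diverges on the next boot.
    new = hashlib.sha256(old + os.urandom(32)).digest()
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new)
            f.flush()
            os.fsync(f.fileno())
        # Atomic rename: after a power loss we have either the old
        # (never used) seed or the new one, never a half-written file.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    # Make the rename itself durable.
    dir_fd = os.open(d, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    # Only now is it safe to use the old seed: it can't be read again.
    if old:
        mix(old)
    return bool(old)
```

This still costs one flash write per boot, so it addresses the reuse and power-loss problems but not the wear concern.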
Qemu/KVM has a virtual RNG so you can feed host randomness into the guests if you want. So there are hypervisor calls available.
You could have a one-time written random seed though, which would be enough when combined with an RTC, and would probably still be enough when combined with the almost universally present HWRNGs built into everything now.
RTCs themselves nearly always seem to have about 56 bytes of memory, but I'd hate to see half of that used just for RTC, I'd rather it be exposed to applications via the FS somehow.
But, it could be enough for a 64 bit counter, which would be combined with a one time written secret.
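The secret-plus-counter idea can be expressed as a keyed derivation. A minimal Python sketch, assuming HMAC-SHA256 as the PRF; `boot_seed` is a hypothetical name, and where the secret and the 64-bit counter are stored (e.g. write-once storage and battery-backed RTC RAM) is left to the platform.

```python
import hashlib
import hmac


def boot_seed(secret: bytes, boot_counter: int) -> bytes:
    """Derive a per-boot seed from a write-once secret and a 64-bit counter."""
    if not 0 <= boot_counter < 2**64:
        raise ValueError("counter must fit in 64 bits")
    msg = boot_counter.to_bytes(8, "big")
    # HMAC-SHA256 acts as a PRF: distinct counters give unrelated-looking
    # seeds, and a seed does not reveal the secret.
    return hmac.new(secret, msg, hashlib.sha256).digest()
```

The counter has to be incremented and persisted before the derived seed is used; if it ever rolls back (say, the RTC battery dies), seeds repeat, which is the same reuse problem as with a seed file.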
Addendum: the seed file is unique per installation, and is also updated continuously by the system.
> In October 2016, with the release of Linux kernel version 4.8, the kernel's /dev/urandom was switched over to a ChaCha20-based cryptographic pseudorandom number generator (CPRNG) implementation [16] by Theodore Ts'o, based on Bernstein's well-regarded stream cipher ChaCha20.
> In 2020, the Linux kernel version 5.6 /dev/random only blocks when the CPRNG hasn't initialized. Once initialized, /dev/random and /dev/urandom behave the same. [17]
So yes, Linux can use hardware RNGs. Your second question is probably better stated as whether those can generate random bits at a sufficient rate. I would expect hardware RNGs to be capable of that for typical use cases.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...
Not sure that's used for /dev/random, but I thought it was. The question is when, because early in the boot process that driver may not be loaded yet. There was also a general distrust of vendor-specific hardware RNGs in the past, IIRC.
dd if=/dev/random iflag=fullblock of=file.bin bs=1024 count=1024 status=progress
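The `iflag=fullblock` flag matters because a single read() from a character device may legally return fewer bytes than requested. A rough Python analogue of the dd command, as a sketch; `copy_random` is a hypothetical name and the defaults mirror the command's bs and count.

```python
import os


def copy_random(dst: str, total: int = 1024 * 1024, bs: int = 1024,
                src: str = "/dev/random") -> int:
    """Copy `total` bytes from `src` to `dst` in full blocks; return bytes written."""
    written = 0
    fd = os.open(src, os.O_RDONLY)
    try:
        with open(dst, "wb") as out:
            while written < total:
                want = min(bs, total - written)
                block = b""
                # Like iflag=fullblock: keep reading until the block is full.
                while len(block) < want:
                    chunk = os.read(fd, want - len(block))
                    if not chunk:
                        raise EOFError(f"{src} ended early")
                    block += chunk
                out.write(block)
                written += len(block)
    finally:
        os.close(fd)
    return written
```

For example, `copy_random("file.bin")` writes 1 MiB, matching the dd invocation.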
It's easy to say why not: backwards compatibility and compatibility on systems without a good entropy source.
Backwards compatibility because now any application that depended on non-blocking randomness in early boot is SOL, and just because it's hard to find an example doesn't mean no system will be affected by this.
Systems without entropy, because you have no guarantee from the hardware that (1) this jitter technique will work or (2) it will be secure. Putting (2) aside as "theoretical concerns", does this mean that if I run Linux on a fully deterministic emulator, the ext4 bug will lock up my system? That seems bad; why does my OS need to have entropy in the first place?
Because both these files come with preconceived notions from various stages in the life of Unix regarding what guarantees they provide, and "best effort" works for neither of those.
If this change had come in, say, 2005, they maybe could have gotten away with /dev/random = blocked until initialized and /dev/urandom = best effort, since that was the common wisdom at the time. But for the last 10 years, more and more people have switched to using /dev/urandom for everything since it is actually good enough for everything once initialized (and most application devs only care about the "once initialized" phase since they don't work on early-boot stuff). Switching /dev/urandom to GRND_INSECURE now would therefore be a potentially bad idea.
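For contrast, the "best effort" semantics discussed here are what the GRND_INSECURE flag (Linux 5.6+) provides: never block, even if the output may be predictable before initialization. A minimal Python sketch, assuming Linux; Python's os module may not export this flag, so its value is copied by hand from `<linux/random.h>`, and `best_effort_bytes` is a hypothetical name.

```python
import errno
import os

# Value from <linux/random.h>; defined here because Python's os module
# may not export it.
GRND_INSECURE = 0x0004


def best_effort_bytes(n: int) -> bytes:
    """Never block, even before the RNG is seeded (output may be weak then)."""
    out = b""
    while len(out) < n:
        try:
            out += os.getrandom(n - len(out), GRND_INSECURE)
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
            # Kernels older than 5.6 reject the flag with EINVAL; on those
            # kernels /dev/urandom never blocks either.
            with open("/dev/urandom", "rb") as f:
                out += f.read(n - len(out))
    return out
```

This is the contract /dev/urandom used to have; the patch moves /dev/urandom to the blocking contract of getrandom(flags=0) instead.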
It just feels like the argument for this change is, this is irrelevant for 99.99% of applications, so who cares? The 0.01% care!
EDIT:
> Switching /dev/urandom to GRND_INSECURE now would therefore be a potentially bad idea
And, again, maybe I'm misunderstanding. The Jason Donenfeld email seems to say this is effectively the behavior we have, i.e., no guarantees of "initialization" or "sufficient entropy" on the urandom device.
Just want to point out that the Linus Jitter Dance is already in use today. It's been there for three years. I had nothing to do with that change. The change that I'm now proposing, which this article is about, changes nothing about the Linus Jitter Dance. Whether you like it or not, it's being used already, and has been for three years now, affecting all interfaces to the rng.
I only mention it in my patch for the sole purpose of indicating that blocking in /dev/urandom has been unproblematic for three years now, because it will unblock a second later. That's the only reason at all why I mention the Linus Jitter Dance.
The only purpose of the patch is to make /dev/urandom block.
> I also think you can still have specific reservations about how the kernel 'shepherds' its pool of random bits. [...] It would seem best if the kernel used a cryptographic algorithm
Actually, it will do this in 5.18, via a change authored a few weeks ago: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random....
That is decidedly not true. /dev/urandom is not guaranteed to be secure upon boot before enough entropy is gathered by the system, but it is guaranteed to not block indefinitely. The patch changes this contract by making /dev/urandom guaranteed secure and maybe block indefinitely if some unlikely edge case is encountered.
1. the CSPRNG in Linux is secure, and
2. CSPRNGs in general exist?
Fixing #1 simply requires changing to another algorithm.
Fixing #2 requires a secure RNG to block for entropy, and if the distinction between /dev/random and /dev/urandom goes away, then this scenario will cause problems _if_ it happens. I said it's very unlikely, but I don't think I deserve this uncharitable response for pointing out the issue.
And here is discussion of the concept: https://news.ycombinator.com/item?id=9512718
I don't think trying to spin this as "OpenBSD solved this years ago" is especially helpful. OpenBSD has made a different set of design tradeoffs than the Linux authors, and both are arguably reasonable designs.
Anyone can write to /dev/random; this mixes data into the entropy pool, but it won't be "credited" as securely increasing /proc/sys/kernel/random/entropy_avail. https://www.whonix.org/wiki/Dev/Entropy
Similarly, systemd-boot can seed from disk but will not "credit" entropy. https://systemd.io/RANDOM_SEEDS/
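The mixing-versus-crediting distinction is visible from userspace. A minimal Python sketch, assuming Linux; the function names are hypothetical. A plain write stirs bytes into the pool without crediting them; crediting would require the RNDADDENTROPY ioctl with CAP_SYS_ADMIN, which this sketch deliberately does not attempt.

```python
import os


def mix_uncredited(data: bytes, dev: str = "/dev/random") -> None:
    """Mix bytes into the kernel pool WITHOUT crediting any entropy.

    Any user may write to /dev/random (or /dev/urandom); the bytes are
    stirred in, but the kernel does not count them toward initialization.
    """
    with open(dev, "wb") as f:
        f.write(data)


def entropy_avail(path: str = "/proc/sys/kernel/random/entropy_avail") -> int:
    """Read the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read().strip())
```

Reading `entropy_avail()` before and after `mix_uncredited(...)` should show no credited increase from the write itself, which is the behavior the Whonix and systemd pages describe.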
If the point of /dev/random is to provide cryptographically secure random numbers, then some level of paranoia is needed for determining which sources are "credited" for initializing the pool. https://lwn.net/Articles/760121/
In some security contexts this could be a significant concern.
But the read-only filesystem issue is something that could happen by accident rather than malicious alteration - for instance some filesystem errors may result in it being mounted RO for safety until the corruption is addressed.
There have been too many examples of seed files being reused; it's time to recognize that requiring a unique seed file is not a good property for an RNG to have.
The first file, /etc/random.seed is 512 bytes and is available very early as it's on the root filesystem. This file is re-written by rc(8) at every boot, halt, shutdown, and reboot.
Second, /var/db/host.random is 65536 bytes. It is also re-written by rc at every boot, halt, shutdown and reboot.
In addition to all that, rc includes:
# If bootblocks failed to give us random, try to cause some churn
(dmesg; sysctl hw.{uuid,serialno,sensors} ) >/dev/random 2>&1
I just checked my VMs and they all print unique values for dmesg, hw.uuid and hw.serialno. I can guess but I don't know how hw.uuid and hw.serialno are set.