Uniting the Linux random-number devices (lwn.net)

125 points by h1x 4 years ago | 61 comments

zx2c4 4 years ago |

I just sent a v1 of this patch: https://lore.kernel.org/lkml/20220217162848.303601-1-Jason@z...

We'll see if that elicits any real objections. Hopefully not, and this will be part of 5.18!

an_d_rew 4 years ago | |

Jason, you buried the lede! :-)

Very nicely put, and thank you for putting that together!

This patch goes a long way toward eliminating a long overdue userspace crypto footgun. After several decades of endless user confusion, we will finally be able to say, "use any single one of our random interfaces and you'll be fine. They're all the same. It doesn't matter." And that, I think, is really something. Finally all of those blog posts and disagreeing forums and contradictory articles will all become correct about whatever they happened to recommend, and along with it, a whole class of vulnerabilities eliminated.

With very minimal downside, we're finally in a position where we can make this change.

stingraycharles 4 years ago | |

Seems like a no-brainer to me; as I understand, this effectively makes urandom “always secure”, which is a very good thing; makes for an even more convincing argument when some colleague insists on using /dev/random “because it’s more secure”.

Do I understand correctly that with this patch, the only drawback is for weird architectures that do not have an instruction counter or another instruction to gather random data? And in those cases, the old behavior (“urandom may be insecure shortly after boot”) may even be worse than the new behavior (“may block while waiting for entropy”)?

zx2c4 4 years ago | | |

Right, it unifies /dev/urandom, /dev/random, and getrandom(flags=0) to all do exactly the same thing.

Most modern userspaces already use getrandom(flags=0). Nothing changes for them. They already count on the rng being seeded in one way or another.

Rather, this changes /dev/urandom, which previously would give insecure randomness before being seeded. With this change, this doesn't happen any more, because it makes /dev/urandom wait until it has been seeded.
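From userspace, "all do exactly the same thing" can be sketched roughly as follows. This is a minimal illustration assuming Linux and Python 3.6+; the helper names `from_getrandom` and `from_device` are hypothetical, and the fallback to `os.urandom` on platforms without `os.getrandom` is an assumption added only for portability.

```python
import os


def from_getrandom(n: int) -> bytes:
    """Read n bytes via getrandom(flags=0), which waits until the RNG is seeded."""
    if hasattr(os, "getrandom"):      # only present in Python's os module on Linux
        return os.getrandom(n, 0)
    return os.urandom(n)              # assumption: portable fallback elsewhere


def from_device(path: str, n: int) -> bytes:
    """Read n bytes from a character device such as /dev/urandom or /dev/random."""
    with open(path, "rb", buffering=0) as dev:
        return dev.read(n)
```

With the patch applied, `from_getrandom(16)`, `from_device("/dev/urandom", 16)` and `from_device("/dev/random", 16)` would all be expected to wait for seeding and then return output of the same quality.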

In practice, the RNG gets seeded by a large variety of things. As a last-ditch effort, the Linus Jitter Dance will seed it.

Taken together, what all the above amounts to is that the regression potential is limited to systems where: (A) /dev/urandom is still being used, rather than getrandom(flags=0), (B) the boot sequence, due to some bug, hard-depends on unseeded reads from /dev/urandom, (C) no ordinary sources of entropy, such as interrupts and input devices and disk drives, are available, (D) the CPU is so ancient as to be missing a cycle counter, defeating the last ditch Linus Jitter Dance, and (E) a new kernel will be installed on this old system.

I argue that the set of machines where (A), (B), (C), (D), and (E) all hold is minuscule.
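The conjunction of (A) through (E) can be written as a simple predicate. This is only an illustrative sketch: the `System` fields are hypothetical labels for the five conditions, not anything the kernel exposes.

```python
from dataclasses import dataclass


@dataclass
class System:
    # Hypothetical field names, one per condition in the list above.
    reads_dev_urandom: bool           # (A) still uses /dev/urandom
    boot_needs_unseeded_reads: bool   # (B) boot hard-depends on unseeded reads
    has_ordinary_entropy: bool        # (C) holds when this is False
    has_cycle_counter: bool           # (D) holds when this is False
    runs_new_kernel: bool             # (E) a new kernel gets installed


def may_regress(s: System) -> bool:
    """True only when all five conditions hold at once."""
    return (
        s.reads_dev_urandom
        and s.boot_needs_unseeded_reads
        and not s.has_ordinary_entropy
        and not s.has_cycle_counter
        and s.runs_new_kernel
    )
```

Because the conditions are joined by "and", a machine that fails any single one of them is outside the regression set.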

throway_zwudbo 4 years ago | |

This is probably way outside of my sphere of competence, but....

If your approach is adopted, people would simply treat /dev/random and /dev/urandom as the same thing (which I gather is your intended goal). That is fine as long as CSPRNGs are relatively easy to make. I hear that this hinges on fancy theorems like P=BPP being true, but apparently they're not proven yet.

What if... in some parallel universe it turns out that P!=BPP, and the concept of CSPRNGs is fundamentally broken, and somebody discovers a practical method to break whatever PRNG is implemented in a system? In this admittedly unlikely universe, keeping the distinction between /dev/random and /dev/urandom (i.e. the former could block indefinitely, the latter could be insecure) seems to be the safer approach. Of course in this universe, Linux would still have to pull the PRNG from /dev/random and revert back to the old behavior, but at least it's fixable. But if userland drops the distinction between /dev/random and /dev/urandom, then the problem would be fundamentally unfixable until every app reviews and decides which guarantee they want for themselves (and releases patches).

Of course your patch does not really imply the contract between the kernel and userland has changed, which is why I mentioned intention. If it is intended to change the contract, maybe it's better to wait for P=BPP before you do it? :P

jcranmer 4 years ago | | |

You're already misunderstanding how /dev/random and /dev/urandom work today, or indeed ever worked. Both devices have always read from the output of a CSPRNG.

It used to be that /dev/random did some accounting to try to estimate how much entropy was in its pool, and if that number dropped too low (as reading from /dev/random was figured to decrease the entropy), it would simply refuse to run its CSPRNG to produce any more output until it got fed some more entropy. This accounting was heavily criticized for being magical thinking unsupported by any actual research, and revisions to the randomness engine in Linux over the past decade have eventually eliminated this entropy accounting in favor of just tracking how much has ever been added--if there's not enough, then it blocks until there is.
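The difference between the two accounting schemes can be shown with a toy model. This is not kernel code: the class names, the 256-bit threshold, and the use of `os.urandom` as a stand-in for CSPRNG output are all assumptions made for illustration.

```python
import os


class DebitingPool:
    """Toy model of the old scheme: every read debits the entropy estimate."""

    def __init__(self) -> None:
        self.estimate_bits = 0

    def credit(self, bits: int) -> None:
        self.estimate_bits += bits

    def can_read(self, nbytes: int) -> bool:
        return self.estimate_bits >= nbytes * 8

    def read(self, nbytes: int) -> bytes:
        if not self.can_read(nbytes):
            raise BlockingIOError("estimate too low; a real read would block")
        self.estimate_bits -= nbytes * 8
        return os.urandom(nbytes)          # stand-in for CSPRNG output


class SeedOncePool:
    """Toy model of the newer scheme: only the total ever credited matters."""

    THRESHOLD_BITS = 256                   # illustrative value, an assumption

    def __init__(self) -> None:
        self.total_credited_bits = 0

    def credit(self, bits: int) -> None:
        self.total_credited_bits += bits

    def can_read(self) -> bool:
        return self.total_credited_bits >= self.THRESHOLD_BITS

    def read(self, nbytes: int) -> bytes:
        if not self.can_read():
            raise BlockingIOError("not yet seeded; a real read would block")
        return os.urandom(nbytes)          # reads never reduce the total
```

In this model, a `DebitingPool` credited with 256 bits refuses further reads after handing out 32 bytes, while a `SeedOncePool` with the same credit keeps serving reads indefinitely.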

DenseComet 4 years ago | | |

Applications trust /dev/urandom to be secure. If your scenario ends up being true, then instead of /dev/random acting like /dev/urandom, /dev/urandom should act like /dev/random since it is supposed to be secure, and we're back to having no distinction between the devices.

GoblinSlayer 4 years ago | | |

If CSPRNGs are impossible, then you will have no CSPRNGs, so both /dev/random and /dev/urandom will be insecure and will provide no guarantees, solely due to the impossibility of CSPRNGs.

jepler 4 years ago |

As far as I can tell, there's little to nothing objectionable about this change; it makes urandom behave _more like_ random, by not yielding bytes before the kernel's entropy pool is in a good state (GRND_INSECURE).

Systems where this would make urandom block for an objectionably long time (because CPU execution time jitter is unavailable or is believed to have low entropy) are largely hypothetical.

I think you can still have specific reservations about CPU execution time jitter, though my experience and reading make me believe this is probably a pretty good source of entropy; personally, I feel the ball is firmly in the court of jitter skeptics to show why the entropy measures from actual running systems are wildly high estimates. [https://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html#toc-A...]

I also think you can still have specific reservations about how the kernel 'shepherds' its pool of random bits. I honestly am out of touch with the latest algorithms, both in research and in the Linux kernel. It would seem best if the kernel used a cryptographic algorithm where inferring the hidden random pool's state from outputs implies a useful attack on the cryptographic algorithm itself (i.e., has a proof of security). I don't think that Linux does this at the moment, based on recent discussion at https://lwn.net/ml/linux-kernel/20220201161342.154666-1-Jaso...
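The general idea of tying pool secrecy to the strength of a cryptographic primitive can be sketched with a hash-based toy pool. This is not the kernel's construction and carries no proof; the `HashPool` class, its method names, and the choice of BLAKE2s with personalization strings are assumptions made purely for illustration.

```python
import hashlib


class HashPool:
    """Toy pool: inputs are hashed into the state, outputs are hashes of it."""

    def __init__(self) -> None:
        self._state = bytes(32)

    def mix(self, data: bytes) -> None:
        # New state depends on the old state and the input.
        self._state = hashlib.blake2s(self._state + data).digest()

    def extract(self, n: int = 32) -> bytes:
        if not 0 < n <= 32:
            raise ValueError("n must be between 1 and 32")
        # Output and next state come from differently personalized hashes,
        # so the output is not simply the next state.
        out = hashlib.blake2s(self._state, person=b"output").digest()[:n]
        self._state = hashlib.blake2s(self._state, person=b"ratchet").digest()
        return out
```

Informally, recovering `_state` from an output here would require finding a preimage for the hash, which is the kind of reduction the paragraph above is asking for.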

But let's take a moment to happily reflect: for applications running well after system boot-up has completed, it is now soooo much easier to have your fill of cryptographic-quality random numbers than it was in the bad old days.

kroeckx 4 years ago |

I'm still not really convinced about how much entropy is collected by the jitter entropy technique. I've been looking at https://github.com/smuellerDD/jitterentropy-library previously, which is for instance used by OpenWRT. It's hard to do a proper estimation of the entropy, because depending on how you measure it, you get different results. The library has been changed, but it's probably still overestimating the entropy.
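The point that the estimate depends on how you measure can be illustrated with two naive estimators applied to the same timing samples. This sketch is not how the jitterentropy library works; the function names are hypothetical, and both estimators treat samples as independent, which real timing data is not.

```python
import math
import time
from collections import Counter


def timing_deltas(samples: int = 1000) -> list:
    """Collect differences between consecutive high-resolution clock reads."""
    deltas = []
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        deltas.append(now - prev)
        prev = now
    return deltas


def shannon_entropy(values) -> float:
    """Average-case estimate, in bits per sample."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def min_entropy(values) -> float:
    """Worst-case estimate, in bits per sample, from the most common value."""
    counts = Counter(values)
    return -math.log2(max(counts.values()) / len(values))
```

Running both on `timing_deltas()` typically gives two different numbers, with the min-entropy figure never exceeding the Shannon figure, so the choice of estimator alone changes the claimed entropy.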

EdSchouten 4 years ago |

What I never understood is why the entire discussion revolves around the concept that the kernel gets launched in a vacuum where it is responsible for generating its own entropy. Why can’t the kernel (or the boot loader) restore/reload state from somewhere? I don’t buy the argument that it’s hard to build that.

- For virtualized systems there could be a hypervisor call to request an initial random seed.

- Run of the mill desktops/servers could use a reserved region (partition or boot record field) for preserving RNG state.

- On embedded devices a boot loader like uBoot could load state from some piece of NVRAM/NAND/…

There are hardly any platforms out there that are stateless in the literal sense. All of the approaches above would allow Linux to have a properly seeded RNG from the very first instruction that runs. No need to fully rely on dubious things like timing execution/interrupts.

derobert 4 years ago | |

A lot of these things do exist. Desktop/server Linux systems (used to at least) save some output from the PRNG to disk on shutdown and load it back on boot.

But of course snapshots, cloning, etc. can foil that badly, causing the same seed to be used multiple times. And on initial install you're not going to have any of that (but initial install is also when you may need to generate long-lived random numbers like ssh host keys).

On embedded devices it can be a real challenge. You must not re-use the seed data, so you effectively have to erase it from NVRAM/flash before use. But then if you lose power before you can generate a new one, you won't have one next boot. And you're adding flash writes, which decreases longevity and increases the chance of power failure in the middle of a write.
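The erase-before-use ordering and the window it opens can be sketched like this. It is a simplified illustration, not any distribution's actual seed handling: the function names are hypothetical, `os.unlink` stands in for a real erase of the flash contents, and a production version would also need to sync the containing directory.

```python
import os
from typing import Optional


def consume_seed(path: str) -> Optional[bytes]:
    """Return the stored seed and remove it from disk before it is used."""
    try:
        with open(path, "rb") as f:
            seed = f.read()
    except FileNotFoundError:
        return None          # first boot, or power was lost before a re-save
    os.unlink(path)          # remove first, so a crash cannot lead to reuse
    return seed


def save_seed(path: str, nbytes: int = 32) -> None:
    """Write a fresh seed for the next boot (this is the extra flash write)."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(os.urandom(nbytes))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)    # atomic rename, so a torn write leaves no seed file
```

Between `consume_seed` and the next successful `save_seed` there is no seed on disk, which is exactly the power-loss gap described above.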

Qemu/KVM has a virtual RNG so you can feed host randomness into the guests if you want. So there are hypervisor calls available.

eternityforest 4 years ago | |

Some applications really need to be stateless. What if you don't have wear leveling because you're on some bizarre embedded thing?

You could have a one-time written random seed though, which would be enough when combined with an RTC, and would probably still be enough when combined with the almost universally present HWRNGs built into everything now.

RTCs themselves nearly always seem to have about 56 bytes of memory, but I'd hate to see half of that used just for the RNG; I'd rather it be exposed to applications via the FS somehow.

But, it could be enough for a 64 bit counter, which would be combined with a one time written secret.
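A write-once secret combined with a 64-bit boot counter could be turned into a per-boot seed along these lines. This is a sketch under assumptions: the function name, the use of HMAC-SHA256, and the big-endian counter encoding are illustrative choices, not something any existing bootloader is known to do.

```python
import hashlib
import hmac


def boot_seed(device_secret: bytes, boot_counter: int) -> bytes:
    """Derive a per-boot seed from a write-once secret and a 64-bit counter."""
    counter_bytes = boot_counter.to_bytes(8, "big")   # 8 bytes = 64 bits
    return hmac.new(device_secret, counter_bytes, hashlib.sha256).digest()
```

The counter would have to be incremented and stored before the seed is used, otherwise a power loss could cause the same seed to be derived twice; the scheme is also only as good as the secrecy of `device_secret`.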

daneel_w 4 years ago |

OpenBSD solved this problem long ago. Is the reason behind Linux's hesitancy/opposition to their solution of a technical or philosophical nature?

pdw 4 years ago | |

Linux is used in many more contexts than OpenBSD. Moreover, unlike OpenBSD, Linux explicitly promises not to break userspace on new releases. So yeah, the Linux devs are more hesitant about anything that might be a user-visible behavior change.

daneel_w 4 years ago | | |

OpenBSD never removed /dev/random - though they did remove arandom and prandom. But I'm not sure how your response explains the reasons for why they avoided OpenBSD's solution.

majewsky 4 years ago | |

How did they solve it?

daneel_w 4 years ago | | |

The bootloader seeds the kernel from disk, the kernel continually mixes data into the entropy pool from various sources. arc4random(), which provides for kernel/userspace/devices (including /dev/random which is symlinked to urandom), can never block.

Addendum: the seed file is unique per installation, and is also updated continuously by the system.

benchaney 4 years ago | |

I had heard that the concern was that there are services very early in boot that rely on /dev/urandom never blocking. I’m not sure how true that is, or why it is no longer a concern now.

westurner 4 years ago |

https://en.wikipedia.org/wiki//dev/random#Linux has:

> In October 2016, with the release of Linux kernel version 4.8, the kernel's /dev/urandom was switched over to a ChaCha20-based cryptographic pseudorandom number generator (CPRNG) implementation [16] by Theodore Ts'o, based on Bernstein's well-regarded stream cipher ChaCha20.

> In 2020, the Linux kernel version 5.6 /dev/random only blocks when the CPRNG hasn't initialized. Once initialized, /dev/random and /dev/urandom behave the same. [17]

Dork1234 4 years ago |

Can Linux use Hardware RNG devices? Are these devices able to generate enough bits of randomness to work for boot?

Someone 4 years ago | |

FTA: That entropy comes from sources like interrupt timing for various kinds of devices (e.g. disk, keyboard, network) and hardware RNGs if they are available.

So yes, Linux can use hardware RNGs. Your second question probably is better stated as whether those can generate random bits at a sufficient rate. I would expect hardware RNGs to be capable of that for typical use cases.

zx2c4 4 years ago | |

Yes. This happens via random.c's add_hwgenerator_randomness() hook, which the hwrng framework calls from a kthread.
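Whether a hardware RNG is currently registered with the hwrng framework can be checked from userspace through sysfs. This is a small sketch assuming the standard `/sys/class/misc/hw_random` directory; the function name `current_hwrng` is hypothetical, and it only reports which device is selected, not how much it has contributed.

```python
from pathlib import Path
from typing import Optional

HW_RANDOM_SYSFS = Path("/sys/class/misc/hw_random")


def current_hwrng(sysfs: Path = HW_RANDOM_SYSFS) -> Optional[str]:
    """Return the contents of rng_current, or None if it cannot be read."""
    try:
        name = (sysfs / "rng_current").read_text().strip()
    except OSError:
        return None          # framework not built in, or not running on Linux
    return name or None      # an empty file is treated as "nothing selected"
```

On a machine without the framework this returns `None`; on a KVM guest with a virtual RNG attached it would be expected to return the name of that device.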

leeter 4 years ago | |

Appears so?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

Not sure that's used for /dev/random, though I thought it was. The question is when, because early in the boot process it may not be loaded yet. There was also a general distrust of vendor-specific hardware RNGs in the past, IIRC.

neatze 4 years ago | |

You can run rng-tools to feed randomness from dedicated hardware like OneRNG or TrueRNG, but it is substantially slower than /dev/urandom.

goalieca 4 years ago | |

I do wonder if _fast_ quantum rng sources will become ubiquitous outside mobile applications.

adgjlsfhk1 4 years ago | | |

I doubt it. Even if algorithms like SHA get almost totally broken, you could get away with injecting a tiny number of bits of true randomness (like 1 in 2^20) and the result will be uncrackable.

neatze 4 years ago |

What would the performance be with the proposed changes when running the command below?

dd if=/dev/random iflag=fullblock of=file.bin bs=1024 count=1024 status=progress

zx2c4 4 years ago | |

No changes. Totally unrelated.

neatze 4 years ago | | |

I meant /dev/urandom, sorry.

Do you mean it is the reverse, in the sense of what getrandom() would be using?

tinalumfoil 4 years ago |

The decision doesn't really make sense to me, and maybe I'm misunderstanding. So please correct me if I'm wrong. If the kernel aims to have two ways to get randomness, blocked until initialized (default) and best effort (i.e., GRND_INSECURE), and there are two device files, why not map one to one and one to the other?

It's easy to say why not: backwards compatibility and compatibility on systems without a good entropy source.

Backwards compatibility because now any application that depended on non-blocking randomness in early boot is SOL, and just because it's hard to find an example doesn't mean no system will be affected by this.

Systems without entropy because you have no guarantee from the hardware that (1) this jitter technique will work or (2) it will be secure. Putting (2) aside as "theoretical concerns", does this mean if I run Linux on a fully deterministic emulator the ext4 bug will lock up my system? That seems bad; why does my OS need to have entropy in the first place?

majewsky 4 years ago | |

> If the kernel aims to have 2 ways to get randomness [...] and there's two device files why not map one to one and one to the other?

Because both these files come with preconceived notions from various stages in the life of Unix regarding what guarantees they provide, and "best effort" works for neither of those.

If this change had come in, say, 2005, they maybe could have gotten away with /dev/random = blocked until initialized and /dev/urandom = best effort, since that was the common wisdom at the time. But for the last 10 years, more and more people have switched to using /dev/urandom for everything since it is actually good enough for everything once initialized (and most application devs only care about the "once initialized" phase since they don't work on early-boot stuff). Switching /dev/urandom to GRND_INSECURE now would therefore be a potentially bad idea.
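The two behaviors being discussed are both reachable through getrandom() flags, which can be sketched as follows. This assumes Linux; `GRND_INSECURE` is defined by hand here with the value from Linux's `<linux/random.h>` on the assumption that Python's `os` module may not export it, the function names are hypothetical, and the fallbacks for other platforms or older kernels are additions for portability.

```python
import os

GRND_INSECURE = 0x0004   # from <linux/random.h>; defined by hand (assumption)


def rng_is_seeded() -> bool:
    """Check for initialization without blocking, using GRND_NONBLOCK."""
    if not hasattr(os, "getrandom"):
        return True                      # assumption: nothing to check off Linux
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False                     # pool not initialized yet


def best_effort_bytes(n: int) -> bytes:
    """Return bytes immediately, even if the RNG has not been seeded."""
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(n, GRND_INSECURE)
        except OSError:
            pass                         # e.g. EINVAL on kernels without the flag
    # Fallback only: os.urandom may itself wait for seeding on Linux.
    return os.urandom(n)
```

Early-boot code that truly cannot wait could opt into `best_effort_bytes` explicitly, while everything else keeps the blocking default, so the "best effort" behavior does not need its own device file.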

tinalumfoil 4 years ago | | |

But saying /dev/urandom is best effort doesn't change those expectations; in fact, it keeps them the same. Saying that most apps "don't work on early-boot stuff" and so aren't affected doesn't mean we should risk breaking systems that will inevitably have software running during early boot.

It just feels like the argument for this change is, this is irrelevant for 99.99% of applications, so who cares? The 0.01% care!

EDIT:

> Switching /dev/urandom to GRND_INSECURE now would therefore be a potentially bad idea

And, again, maybe I'm misunderstanding. The Jason Donenfeld email seems to say this is effectively the behavior we have, i.e., no guarantees of "initialization" or "sufficient entropy" on the urandom device.