Linux /dev/urandom and concurrency

Linux /dev/urandom and concurrency (drsnyder.us)

90 points by drsnyder 12 years ago | 78 comments

kijin 12 years ago |

If your program needs to read 4K from /dev/urandom multiple times per second, you're doing it wrong. There is little benefit in reading anything over 32 bytes at a time.

According to the man page for /dev/random and /dev/urandom:

> no cryptographic primitive available today can hope to promise more than 256 bits of security, so if any program reads more than 256 bits (32 bytes) from the kernel random pool per invocation, or per reasonable reseed interval (not less than one minute), that should be taken as a sign that its cryptography is not skillfully implemented.
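For what it's worth, a single 32-byte read looks roughly like the sketch below. The function name `read_seed_from_urandom` is made up for illustration. It uses plain POSIX `open`/`read` rather than a buffered stream, on the assumption that a buffered stream may pull far more than 32 bytes from the kernel behind your back.

```cpp
#include <array>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: read exactly 32 bytes (256 bits) from /dev/urandom.
std::array<std::uint8_t, 32> read_seed_from_urandom() {
    std::array<std::uint8_t, 32> seed{};
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0) throw std::runtime_error("cannot open /dev/urandom");

    std::size_t got = 0;
    while (got < seed.size()) {
        ssize_t n = read(fd, seed.data() + got, seed.size() - got);
        if (n < 0 && errno == EINTR) continue;  // interrupted, try again
        if (n <= 0) {
            close(fd);
            throw std::runtime_error("read from /dev/urandom failed");
        }
        got += static_cast<std::size_t>(n);
    }
    close(fd);
    return seed;
}
```

Call it once at startup and keep the result, rather than calling it per request.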

Glyptodon 12 years ago |

So why is there a lock for reads from urandom? I suppose that without a lock, concurrent reads could all get the same random values?

bodyfour 12 years ago | |

Yeah basically. That could be a disaster for, say, nonce generation.

The solution would be to have multiple independent entropy pools and either bind them to cores(/sets of cores) or pick a non-busy one in a contention case.

acqq 12 years ago | | |

Yes, if there is no urandom generator per core, it would be convenient for some extreme cases to introduce one. The question is whether it's worth the effort and the resulting "bloat" in kernel code and memory usage. Linux runs on some very small devices too, and even there decent user-space programmers can easily do their own per-thread generation in their programs. Normal uses of crypto work like this: you initialize your own crypto once, then produce a lot of data in your own space.

If urandom is really "one for all cores" somebody should be able to demonstrate the speed drop by just writing some bash script? Volunteers?
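Not a bash script, but here is a rough C++ sketch of how one might measure it. Everything in it (`BenchResult`, `read_urandom_concurrently`, the parameter names) is invented for this example, and it makes no claim about what numbers you will see on any particular kernel.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

struct BenchResult {
    std::size_t bytes_read;  // total bytes returned by read() across all threads
    double seconds;          // wall-clock time for the whole run
};

// Hypothetical benchmark: each thread opens /dev/urandom and issues
// reads_per_thread reads of bytes_per_read bytes each.
BenchResult read_urandom_concurrently(int num_threads, int reads_per_thread,
                                      std::size_t bytes_per_read) {
    std::atomic<std::size_t> total{0};
    auto worker = [&] {
        std::vector<char> buf(bytes_per_read);
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd < 0) return;
        std::size_t local = 0;
        for (int i = 0; i < reads_per_thread; ++i) {
            ssize_t n = read(fd, buf.data(), buf.size());
            if (n > 0) local += static_cast<std::size_t>(n);
        }
        close(fd);
        total += local;
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {total.load(), elapsed.count()};
}
```

Compare `seconds` for 1 thread against N threads with the same `reads_per_thread`. If the reads serialize on a shared lock, the N-thread run should take noticeably longer than the 1-thread run instead of roughly the same time.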

drsnyder 12 years ago | |

Good question. The only reference to it that I could find was here http://lkml.iu.edu//hypermail/linux/kernel/0412.1/0181.html but he doesn't explain why it's necessary.

gizmo686 12 years ago | | |

From the mail:

> This patch solves a problem where simultaneous reads to /dev/urandom can cause two processes on different processors to get the same value. We're not using a spinlock around the random generation loop because this will be a huge hit to preempt latency. So instead we just use a mutex around random_read and urandom_read. Yeah, it's not as efficient in the case of contention, if an application is calling /dev/urandom a huge amount, it's there's something really misdesigned with it, and we don't want to optimize for stupid applications.

sebcat 12 years ago |

As a user of libcares (which is awesome for bulk DNS lookups btw) I'll add that I've only ever needed one ares_channel per process. Having one ares_channel for every CURL-handle seems a bit excessive. This is probably the main problem here, not the kernel spinlock.

Edit: Come to think of it, why isn't the CURL-handle reused? Sounds like a new CURL-handle is inited for every request, which I don't recall being necessary.

drsnyder 12 years ago | |

The curl handle should be re-used if possible so that's also part of the problem.

Mister_Snuggles 12 years ago |

A more important question would be "Why does asynchronous DNS resolution require random data in the first place?"

mike-cardwell 12 years ago | |

So you can randomise the ID in the request packet to help protect against cache poisoning. And also so you can apply 0x20 bit (x) encoding to the qname for further protection.
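A minimal sketch of both ideas, assuming the generator is passed in by the caller. The names `random_query_id` and `apply_0x20_encoding` are made up here and are not taken from c-ares or any other resolver. 0x20 encoding just flips the case of each letter in the query name according to a random bit; the server echoes the name back with the same casing, so a spoofed reply has to guess that too.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper: pick a uniformly random 16-bit DNS query ID.
template <class Urbg>
std::uint16_t random_query_id(Urbg& rng) {
    std::uniform_int_distribution<std::uint16_t> id;  // covers 0..65535
    return id(rng);
}

// Hypothetical helper: randomise the case of every ASCII letter in qname,
// consuming one random bit per letter (32 bits are fetched at a time).
template <class Urbg>
std::string apply_0x20_encoding(std::string qname, Urbg& rng) {
    std::uniform_int_distribution<std::uint32_t> word;  // full 32-bit range
    std::uint32_t bits = 0;
    int bits_left = 0;
    for (char& c : qname) {
        const bool is_letter =
            (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        if (!is_letter) continue;  // digits, dots and hyphens are untouched
        if (bits_left == 0) {
            bits = word(rng);
            bits_left = 32;
        }
        if (bits & 1u) c = static_cast<char>(c ^ 0x20);  // toggle the case bit
        bits >>= 1;
        --bits_left;
    }
    return qname;
}
```

Usage would be something like `std::random_device rd; auto name = apply_0x20_encoding("www.example.com", rd);`, which might give `wWw.ExaMPle.cOm`. The resolver then compares the name in the reply byte for byte against what it sent.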

(x) http://courses.isi.jhu.edu/netsec/papers/increased_dns_resis...

bch 12 years ago | | |

Hard to say w/o seeing the data in question, but based on that, perhaps nscd or re-using curl handles could mitigate their frustration w/ runtime.

TazeTSchnitzel 12 years ago | |

IDs of requests?

shachar 12 years ago | |

choosing random UDP source port

aidenn0 12 years ago |

Seed a secure userspace PRNG from urandom, perhaps?

hosay123 12 years ago | |

Adding to aidenn0's comment, if you trust /dev/urandom to produce 4 KB of random data, it follows that you trust it to produce 256 bits.

256 bits (32 bytes) is sufficient to initialize a PRNG into any one of 115792089237316195423570985008687907853269984665640564039457584007913129639936 states (that's 2^256, a 78-digit number). Consequently, hitting the kernel constantly for so much data is utterly inefficient in the first instance, and totally unnecessary in the second.

Blog author could improve his design's efficiency >128x just by seeding a PRNG with a single 32-byte read at the start of the subprocess.

mcpherrinm 12 years ago | | |

Userland PRNGs are one of the easiest ways to introduce security vulnerabilities into your programs. I would recommend being very, VERY careful before trying to do this, like the traditional "Don't roll your own crypto" advice.

tptacek 12 years ago | |

If you care about security, avoid this approach; it creates an additional single point of failure, which historically has also tended to be a very likely point of failure (see: Debian randomness, Android Java SecureRandom, &c).

mike-cardwell 12 years ago |

Interestingly enough, I have actually been working on writing a DNS client library in C++ with Boost ASIO this very afternoon. I was going to get my source of random data using the following C++11 standard library code. I would really appreciate any comments from people here if there is anything wrong with what I'm doing:

  #include <cstdint>
  #include <random>
  std::uniform_int_distribution<uint32_t> dist;

  // Seed a Mersenne twister PRNG with random data:
  std::mt19937 eng;
  std::random_device rd;
  eng.seed(dist(rd));

  // Now to generate random numbers, simply:
  uint32_t random_number = dist(eng);

aidenn0 12 years ago | |

I don't know what DNS uses the randomness for, but if a malicious attacker can gain from guessing the randomness, don't use MT, as the state can be extracted from MT by observing a relatively small number of outputs.
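To make that concrete, here is a sketch of the well-known state-recovery trick against `std::mt19937`: observe 624 consecutive 32-bit outputs, undo the output tempering on each, and you hold the generator's full internal state. The helper names are mine, and the sketch assumes the attacker sees the raw, full-width outputs starting from a freshly seeded generator.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>

// Invert y = x ^ ((x << shift) & mask) by fixed-point iteration.
std::uint32_t undo_left(std::uint32_t y, int shift, std::uint32_t mask) {
    std::uint32_t x = y;
    for (int i = 0; i < 32 / shift + 1; ++i) x = y ^ ((x << shift) & mask);
    return x;
}

// Invert y = x ^ (x >> shift) by fixed-point iteration.
std::uint32_t undo_right(std::uint32_t y, int shift) {
    std::uint32_t x = y;
    for (int i = 0; i < 32 / shift + 1; ++i) x = y ^ (x >> shift);
    return x;
}

// MT19937 output tempering.
std::uint32_t temper(std::uint32_t y) {
    y ^= y >> 11;
    y ^= (y << 7) & 0x9D2C5680u;
    y ^= (y << 15) & 0xEFC60000u;
    y ^= y >> 18;
    return y;
}

// Undo the tempering steps in reverse order.
std::uint32_t untemper(std::uint32_t y) {
    y = undo_right(y, 18);
    y = undo_left(y, 15, 0xEFC60000u);
    y = undo_left(y, 7, 0x9D2C5680u);
    return undo_right(y, 11);
}

// Advance the 624-word state to the next block (the MT19937 "twist").
void twist(std::array<std::uint32_t, 624>& s) {
    for (std::size_t i = 0; i < 624; ++i) {
        std::uint32_t y =
            (s[i] & 0x80000000u) | (s[(i + 1) % 624] & 0x7FFFFFFFu);
        s[i] = s[(i + 397) % 624] ^ (y >> 1) ^ ((y & 1u) ? 0x9908B0DFu : 0u);
    }
}

// Returns true if a clone built from 624 observed outputs correctly
// predicts the victim's next 624 outputs.
bool clone_predicts_mt19937(std::uint32_t seed) {
    std::mt19937 victim(seed);
    std::array<std::uint32_t, 624> state;
    for (auto& word : state)
        word = untemper(static_cast<std::uint32_t>(victim()));
    twist(state);
    for (std::uint32_t word : state)
        if (temper(word) != victim()) return false;
    return true;
}
```

The seed never has to be guessed, so seeding MT from a good source does not help once enough output leaks. Truncating outputs (say, to 16 bits) makes the recovery harder than this sketch but is not something I would rely on.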

mike-cardwell 12 years ago | | |

Ah. You appear to be right. I'm glad I asked now.

[edit] I'm going to skip using the Mersenne twister engine and just use std::random_device for all random data, instead of as a seed. It seems on Linux at least that random_device is basically /dev/urandom. I assume the source will be sane on other OS's too.

akira2501 12 years ago | | |

I believe you would use it to determine a random outgoing port to use to contact the DNS server; this prevents spoofing. However, the port space is only 16 bits, so how you map the outputs of the MT into that space would have the biggest impact -- but you're right that it's probably best to avoid it entirely.

en4bz 12 years ago | |

    std::uniform_int_distribution<uint32_t> dist;
    std::random_device rd;
    std::mt19937 rng(rd()); // Construct with a random seed.
    uint32_t random_number = dist(rng);

Since only the seed value comes from `rd` you should be fine if you suspected the results from the article would affect you. What was most likely happening in the article was constant use of `rd` without a prng.

rcoh 12 years ago |

On many versions of Linux, the stdlib rand() function has a global lock around it. As such, if rand() is called in performance-critical parallel code, performance will tank as each thread attempts to acquire this lock. Even where no lock is taken, you still have a race condition on the state of the random number generator and may produce bad (non-random) randomness.

Use rand_r(unsigned int *state) instead in parallel and concurrent applications.
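A small sketch of what that looks like from C++, assuming a POSIX libc that still ships `rand_r`. The wrapper names `per_thread_rand` and `draw` are invented for the example. Note that `rand_r` has very little state and is only suitable for non-security uses such as simulations or jitter.

```cpp
#include <cstddef>
#include <random>
#include <vector>
#include <stdlib.h>  // rand_r (POSIX)

// Hypothetical wrapper: every thread lazily seeds and then owns its own
// state word, so calls never touch shared state and need no lock.
int per_thread_rand() {
    thread_local unsigned int state = std::random_device{}();
    return rand_r(&state);
}

// Explicit variant: the caller owns the state, so a given seed always
// yields the same sequence regardless of what other threads are doing.
std::vector<int> draw(unsigned int seed, std::size_t count) {
    unsigned int state = seed;
    std::vector<int> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) out.push_back(rand_r(&state));
    return out;
}
```

The `thread_local` version costs one `std::random_device` call per thread, at thread start-up, instead of a lock acquisition on every call.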

Sources: man 3 rand [unix command] http://unixhelp.ed.ac.uk/CGI/man-cgi?rand+3

ekimekim 12 years ago | |

The problem you're describing is similar to, but not the same as, the one in the article. What you describe is part of the libc implementation of rand(3), whereas the article is talking about reads from /dev/urandom, which has a lock inside the kernel code (for the same reasons as libc).

X-Istence 12 years ago |

I love how Theodore Ts'o suggests using a user space PRNG that is seeded from /dev/urandom. OpenBSD are ripping out all of the user space PRNG stuff from OpenSSL in favour of arc4random_buf()...

clarry 12 years ago | |

arc4random_buf() operates in userspace (in this case; it also exists in the kernel). It is seeded from the kernel, using a sysctl.

ape4 12 years ago |

Why does he need so much pseudorandomness? And why use /dev/urandom directly? Maybe using the random library from the programming environment would make more sense.

frankfarmer 12 years ago | |

Simply initializing a curl handle causes the /dev/urandom read -- so a large number of parallel curl requests easily triggers this issue.

ape4 12 years ago | | |

Thanks for the reply.

rafekett 12 years ago |

Overreliance on /dev/urandom in the presence of little entropy is a well-known performance problem on servers. That's why http://en.wikipedia.org/wiki/Hardware_random_number_generato... exist.

claudius 12 years ago | |

If I understand that problem correctly, it has nothing to do with the amount of entropy available but is a simple synchronisation/locking issue. Were reads from, say, /dev/zero ‘protected’ by spinlocks in the same way, the same issue would arise. Conversely, I don’t see how adding a hardware RNG to the system could alleviate the locking issue.

kevingadd 12 years ago | |

A hardware RNG isn't going to do anything to address the scalability problems inherent in having a single shared lock around /dev/urandom.

jerf 12 years ago | |

/dev/urandom is not /dev/random.

bcl 12 years ago |

The code he pointed to is for kernel 2.6.18 which at this point could be considered ancient history. If you look at current master - https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....

it looks like it has been re-factored somewhat, although the lock is still in there.