Linux /dev/urandom and concurrency(drsnyder.us) |
According to the man page for /dev/random and /dev/urandom:
> no cryptographic primitive available today can hope to promise more than 256 bits of security, so if any program reads more than 256 bits (32 bytes) from the kernel random pool per invocation, or per reasonable reseed interval (not less than one minute), that should be taken as a sign that its cryptography is not skillfully implemented.
The solution would be to have multiple independent entropy pools and either bind them to cores(/sets of cores) or pick a non-busy one in a contention case.
If urandom is really "one for all cores" somebody should be able to demonstrate the speed drop by just writing some bash script? Volunteers?
>This patch solves a problem where simultaneous reads to /dev/urandom can cause two processes on different processors to get the same value. We're not using a spinlock around the random generation loop because this will be a huge hit to preempt latency. So instead we just use a mutex around random_read and urandom_read. Yeah, it's not as efficient in the case of contention, if an application is calling /dev/urandom a huge amount, it's there's something really misdesigned with it, and we don't want to optimize for stupid applications.
Edit: Come to think about it, why isn't the CURL-handle reused? Sounds like a new CURL-handle is inited for every request, which I don't recall being necessary.
(x) http://courses.isi.jhu.edu/netsec/papers/increased_dns_resis...
256 bits (32 bytes) is sufficient to initialize a PRNG into any one of 115792089237316195423570985008687907853269984665640564039457584007913129639936 states (that's 1 with 77 digits). Consequently, hitting the kernel constantly for so much data is utterly inefficient in the first instance, and totally unnecessary in the second.
The blog author could improve his design's efficiency >128x just by seeding a PRNG with a single 32-byte read at the start of the subprocess.
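A minimal sketch of that idea, assuming a Linux system where /dev/urandom is readable: one 32-byte read seeds a userspace engine, and every later number is computed without touching the kernel. The helper names (read_seed_words, make_seeded_engine, roll_die) are made up for illustration, and std::mt19937 is used only because it is in the standard library; it is not a cryptographically secure generator, so this suits simulation-style randomness rather than keys or tokens.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <random>
#include <stdexcept>
#include <unistd.h>

// Hypothetical helper: fetch exactly 32 bytes (256 bits) from /dev/urandom
// with a single unbuffered read() call.
std::array<std::uint32_t, 8> read_seed_words() {
    std::array<std::uint32_t, 8> words{};
    const int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        throw std::runtime_error("cannot open /dev/urandom");
    }
    const ssize_t got = read(fd, words.data(), sizeof(words));
    close(fd);
    if (got != static_cast<ssize_t>(sizeof(words))) {
        throw std::runtime_error("short read from /dev/urandom");
    }
    return words;
}

// Hypothetical helper: build an engine from those 256 bits. After this call
// the engine never goes back to the kernel.
std::mt19937 make_seeded_engine() {
    const auto words = read_seed_words();
    std::seed_seq seq(words.begin(), words.end());
    std::mt19937 engine(seq);
    return engine;
}

// Example consumer: all of the work here is plain arithmetic in userspace.
int roll_die(std::mt19937& engine) {
    std::uniform_int_distribution<int> dist(1, 6);
    const int value = dist(engine);
    assert(value >= 1 && value <= 6);  // sanity check on the distribution bounds
    return value;
}
```

Calling make_seeded_engine() once at process start and then passing the engine to roll_die() (or any other distribution) keeps the number of kernel reads at one, no matter how many values are drawn.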
#include <cstdint>
#include <random>
std::uniform_int_distribution<uint32_t> dist;
// Seed a Mersenne twister PRNG with random data:
std::mt19937 eng;
std::random_device rd;
eng.seed(dist(rd));
// Now to generate random numbers, simply:
uint32_t random_number = dist(eng);
[edit] I'm going to skip using the Mersenne twister engine and just use std::random_device for all random data, instead of as a seed. It seems on Linux at least that random_device is basically /dev/urandom. I assume the source will be sane on other OSes too.
std::random_device rd;
std::mt19937 rng(rd()); //Construct with random seed.
uint32_t random_number = dist(rng);
Since only the seed value comes from `rd` you should be fine if you suspected the results from the article would affect you. What was most likely happening in the article was constant use of `rd` without a PRNG.
Use rand_r(unsigned int *state) instead in parallel and concurrent applications.
Sources: man 3 rand [unix command] http://unixhelp.ed.ac.uk/CGI/man-cgi?rand+3
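A rough sketch of what the rand_r suggestion looks like in practice, assuming a POSIX system: each caller (or each thread) owns its state, so nothing is shared and nothing needs a lock. The function names are invented for this example. Note that rand_r is a weak generator and POSIX.1-2008 marks it obsolete, so treat this as an illustration of per-thread state rather than a recommendation for anything security-related.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdlib.h>  // rand_r (POSIX)
#include <vector>

// Hypothetical helper: draw n values using state owned by the caller.
// The same seed always yields the same sequence, and no global state is used.
std::vector<int> draw_with_own_state(unsigned int seed, std::size_t n) {
    unsigned int state = seed;
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int value = rand_r(&state);
        assert(value >= 0 && value <= RAND_MAX);  // range promised by rand_r
        out.push_back(value);
    }
    return out;
}

// Hypothetical helper: one state per thread, seeded once per thread from
// std::random_device. Later calls on that thread are pure computation.
int per_thread_rand() {
    thread_local unsigned int state = std::random_device{}();
    return rand_r(&state);
}
```

Each worker thread can call per_thread_rand() freely; only the first call on a given thread consults the system entropy source.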
it looks like it has been re-factored somewhat, although the lock is still in there.
$ time dd if=/dev/urandom of=/dev/null bs=1 count=10000000
real 0m10.640s
user 0m0.696s
sys 0m9.940s
$ time (for i in $(seq 1 50); do dd if=/dev/urandom of=/dev/null bs=1 count=200000 2>/dev/null & done; wait)
real 0m11.199s
user 0m1.232s
sys 0m42.828s
$ time (for i in $(seq 1 500); do dd if=/dev/urandom of=/dev/null bs=1 count=20000 2>/dev/null & done; wait)
real 0m11.234s
user 0m1.252s
sys 0m42.536s
whereas for /dev/zero:
$ time dd if=/dev/zero of=/dev/null bs=1 count=10000000
real 0m3.268s
user 0m0.660s
sys 0m2.604s
$ time (for i in $(seq 1 50); do dd if=/dev/zero of=/dev/null bs=1 count=200000 2>/dev/null & done; wait)
real 0m2.550s
user 0m1.192s
sys 0m8.760s
$ time (for i in $(seq 1 500); do dd if=/dev/zero of=/dev/null bs=1 count=20000 2>/dev/null & done; wait)
real 0m2.612s
user 0m1.228s
sys 0m8.112s
Of course, the bash for-loop here together with the forking has some considerable overhead, so these values should likely be interpreted carefully (Linux 3.14-rc7, Core i5 520M).
Re: "limit calls to curl init" -- do you mean curl_easy_init()? In that case, reusing handles (e.g. CURL *handle) would mitigate that, no?
Edit: This doesn't make sense. c-ares relationship must be in curl_easy_perform(). Now I'm curious:
1) Am I correct re: c-ares / curl_easy_perform()
2) Can one reuse CURL *handle and not invoke c-ares and /dev/urandom if one reuses the same domain name (but not necessarily the same URL) within a handle?
http://curl.haxx.se/libcurl/c/curl_easy_init.html
"If you did not already call curl_global_init(3), curl_easy_init(3) does it automatically. This may be lethal in multi-threaded cases, since curl_global_init(3) is not thread-safe, and it may result in resource problems because there is no corresponding cleanup."
I can imagine that only curl_global_init reads from urandom? Your application should do curl_global_init only once, then do other fetches each time using just curl_easy_init and cleanup.
Here's the stacktrace for the /dev/urandom read -- it's happening in curl_easy_init.
Catchpoint 1 (call to syscall 'ioctl'), 0x0000003a74ecc4ba in tcgetattr () from /lib64/libc.so.6
(gdb) backtrace
#0 0x0000003a74ecc4ba in tcgetattr () from /lib64/libc.so.6
#1 0x0000003a74ec7a1c in isatty () from /lib64/libc.so.6
#2 0x0000003a74e60d51 in _IO_file_doallocate_internal () from /lib64/libc.so.6
#3 0x0000003a74e6d6dc in _IO_doallocbuf_internal () from /lib64/libc.so.6
#4 0x0000003a74e6ba7c in _IO_file_xsgetn_internal () from /lib64/libc.so.6
#5 0x0000003a74e61dd2 in fread () from /lib64/libc.so.6
#6 0x0000003341606414 in ares_init_options () from /usr/lib64/libcares.so.2
#7 0x0000003d3404f0c9 in ?? () from /usr/lib64/libcurl.so.4
#8 0x0000003d340242a5 in ?? () from /usr/lib64/libcurl.so.4
#9 0x0000003d3402f9a6 in curl_easy_init () from /usr/lib64/libcurl.so.4
#10 0x00002b35e0304fb0 in ?? () from /usr/lib64/php/modules/curl.so
#11 0x0000000000606da9 in ?? ()
#12 0x00000000006456b8 in execute_ex ()
#13 0x00000000005d2bba in zend_execute_scripts ()
#14 0x00000000005769ee in php_execute_script ()
#15 0x000000000067e44d in ?? ()
#16 0x000000000067ede8 in ?? ()
#17 0x0000003a74e1d994 in __libc_start_main () from /lib64/libc.so.6
#18 0x0000000000422b09 in _start ()
(gdb)
if(initialized++)
return CURLE_OK;
Correct?
boost::random_device, on the other hand, has better guarantees: it is only implemented where there's a decent entropy source.
If you need secure random numbers, do what you had above (though in light of other comments, perhaps consider a different algo besides MT).
It's 2014. There are well-funded governments and organized crime attacking our systems. If downstream developers still have to ask the question, "what kind of random numbers does this API provide?", then it's a bug in the platform.
We have the ability to make the /dev/urandom CSPRNG secure enough and fast enough for (almost) any randomness purpose. We need to cut all the rest of this insane crap.
People choose the wrong RNGs and get burned, or won't use the right ones because of speed or imaginary entropy exhaustion issues. This matters.
The problem comes with multiple processes competing for the lock.
One person may have multiple processes reading from /dev/urandom.
And why curl at all? PHP has built in HTTPRequest?
2 - You don't need the crypto qualities of it and you're emptying the entropy pool for nothing
3 - You're doing much more work, especially if you're reading one byte at a time from /dev/urandom (doing a syscall, etc), while rand is just a calculation
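To make point 3 concrete, here is a small sketch, assuming Linux and a readable /dev/urandom, that counts how many read() system calls it takes to fetch the same number of bytes with different request sizes. The function name count_urandom_reads is made up for this example; it measures call counts only, not time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <unistd.h>
#include <vector>

// Hypothetical helper: read total_bytes from /dev/urandom in requests of at
// most chunk_size bytes and return how many read() system calls were made.
std::size_t count_urandom_reads(std::size_t total_bytes, std::size_t chunk_size) {
    assert(chunk_size > 0);
    const int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        throw std::runtime_error("cannot open /dev/urandom");
    }
    std::vector<unsigned char> buf(chunk_size);
    std::size_t done = 0;
    std::size_t calls = 0;
    while (done < total_bytes) {
        const std::size_t want = std::min(chunk_size, total_bytes - done);
        const ssize_t got = read(fd, buf.data(), want);
        ++calls;  // every iteration is one trip into the kernel
        if (got <= 0) {
            close(fd);
            throw std::runtime_error("read from /dev/urandom failed");
        }
        done += static_cast<std::size_t>(got);
    }
    close(fd);
    return calls;
}
```

Fetching 4096 bytes one byte at a time costs 4096 system calls, while 256-byte requests need 16; a userspace generator such as rand needs none after seeding.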
For 2, entropy pool depletion is a fictitious problem if you're worried about security. Some discussion here:
https://news.ycombinator.com/item?id=7361694
If you're worried about blocking apps that use /dev/random, the answer there is to fix them to use /dev/urandom so they don't block.
The guy uses PHP and instead of built-in HTTPRequest he uses curl to make a request to "a bucketed key-value store built on PostgreSQL that speaks HTTP which uses Clojure and the Compojure web framework to provide a REST interface over HTTP." A bit of shooting the flies with cannons on every side?
On the other hand, if it can be shown that urandom has serious problems in reasonable use cases, someone should look into what can be changed and how.
> The guy uses PHP and instead of built-in HTTPRequest he uses curl to make a request
HTTPRequest is not built-in to PHP. It is a PECL extension that is usually installed separately from PHP.
Curl is more built-in to PHP - it's a PHP compile-time flag, and it is distributed with PHP source.
We're not nearly at the theoretical limit of what /dev/urandom can provide.
In the current security environment, from heartbleed to the NSA, it's becoming clear that security issues need to be systematically dealt with from an industry perspective or people will start to lose faith in secure Internet communication, which would undermine too much of what's valuable about the Internet.
What we need is great APIs/frameworks/design patterns to simplify cryptography so that a newbie ruby on rails programmer CAN create actually secure applications and not even realize that it was complicated in the first place.
In cryptography you make one misstep and the entire chain is broken. It's thus important for things like the linux kernel to provide great implementations so that people don't think twice about using it and never want their own PRNG.
It seems the author admits in the comments: "All the application needs to do is open a socket and generate a GET request." So why complain about the kernel?
If there's a problem with urandom, demonstrate it with a reasonable use case; don't try to impress anybody by showing how many different libraries, modules and programs you combine for one key-value query.
There's no reason for the kernel to provide standard library functions. In fact, I'd argue that syscalls should be reserved for only actions that cannot be done wholly in userspace (futex is a good example of this). The current model of "hardware randomness to seed a PRNG" makes sense. It is up to the userspace libraries to provide good implementations.
Similarly there's a need for great "APIs/frameworks/design patterns" for what the kernel doesn't provide. I predict over the next 5 years this will become a far bigger priority in how people develop software and thus use libraries.