Making Python Less Random

Making Python Less Random (healeycodes.com)

87 points by healeycodes 1 year ago | 43 comments

Another way to do this that covers more sources of non-determinism would be to run your python code under Meta’s Hermit: https://developers.facebook.com/blog/post/2022/11/22/hermit-...

amnqrt 1 year ago | |

Well, I never had issues finding threading bugs on normal Linux. The flakiness in Meta's software tests, at least in the published open source code, comes from code bases that are a mess and get rewritten every two weeks, because apparently LOC is the measure of success.

nbadg 1 year ago |

I'm... confused. Being able to intercept and modify syscalls is a neat trick, but why is it applicable here?

In Python you generally have two kinds of randomness: cryptographically-secure randomness, and pseudorandomness. The general recommendation is: if you need a CSPRNG, use ``os.urandom`` -- or, more recently, the stdlib ``secrets`` module. But if it doesn't need to be cryptographically secure, you should use the stdlib ``random`` module.

The thing is, the ``random`` module gives you the ability to seed and re-seed the underlying PRNG state machine. You can even create your own instances of the PRNG state machine, if you want to isolate yourself from other libraries, and then you can seed or reseed that state machine at will without affecting anything else. So for pseudorandom "randomness", the stdlib already exposes a purpose-built function (``random.seed``) that does exactly what the OP needs. Also, within individual tests, it's perfectly possible to monkeypatch the root PRNG in the ``random`` module with your own temporary copy, modify the seed, etc., so you can even make this work on a per-test basis, using completely bog-standard Python, no special sauce required. Well-written libraries even expose this as a primitive for dependency injection, so that you can have direct control over the PRNG.
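A minimal sketch of the two stdlib techniques described above: a private ``random.Random`` instance, and a temporary per-test swap of the module-level function. The ``roll`` and ``deterministic_rolls`` helpers are hypothetical stand-ins, not anything from the article.

```python
import random
from unittest import mock

# A private generator: seeding it does not touch the module-level PRNG.
rng = random.Random(1234)
first = [rng.randint(1, 6) for _ in range(5)]
rng.seed(1234)  # re-seeding replays the same sequence
second = [rng.randint(1, 6) for _ in range(5)]
assert first == second


def roll():
    """Hypothetical library code that uses the global random module."""
    return random.randint(1, 6)


def deterministic_rolls(seed, n):
    """Per-test isolation: route random.randint to a temporary seeded copy."""
    temp = random.Random(seed)
    with mock.patch.object(random, "randint", temp.randint):
        return [roll() for _ in range(n)]
```

The patch only reaches code that looks up ``random.randint`` through the module at call time, which is the case for ``roll`` here.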

Meanwhile, for applications that require a CSPRNG... you really shouldn't be writing tests that check for a deterministic result. At least in my experience, assuming you aren't testing the implementation of cryptographic primitives, there are always better strategies -- things like round-trip tests, for example.
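As an illustration of the round-trip idea, here is a small sketch that asserts on a property instead of on specific bytes. The ``xor_bytes`` function is a toy invented for this example and is not real cryptography.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy XOR transform (NOT real cryptography); applying it twice is the identity."""
    return bytes(d ^ k for d, k in zip(data, key))


def test_round_trip():
    message = b"attack at dawn"
    key = secrets.token_bytes(len(message))  # fresh, unpredictable key on every run
    ciphertext = xor_bytes(message, key)
    # Assert on the property (decode(encode(m)) == m), not on the random bytes.
    assert xor_bytes(ciphertext, key) == message
    assert len(ciphertext) == len(message)
```

The test passes on every run without the random source ever being pinned.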

So... are the 3rd-party deps just "misbehaving" and calling ``os.urandom`` for no reason? Does the OP not know about ``random.seed``? Does the author want to avoid monkeypatches in tests (which are completely standard practice in Python)? Is there something else going on entirely? Intercepting syscalls to get deterministic randomness in Python really feels like bringing an atom bomb to a game of fingerguns.

MrJohz 1 year ago | |

The article makes it fairly clear that this is mainly a kind of nerd-sniping - there are better solutions for practical purposes, but the author wanted to explore a different approach and learn a bit about syscall interception along the way.

nick238 1 year ago | |

If you're developing a game, there's a fairly big issue in that many things may be requesting values from, and thus incrementing, the PRNG, and many of them could be indirectly controlled by the user (where they are, where they're looking, etc. https://www.youtube.com/watch?v=1hs451PfFzQ is a fun video about reverse-engineering Zelda to predict the randomness in a minigame)

As for the approach, I agree: I don't understand why 'no code changes' is that important, especially in the context of Python, which has a general attitude of consent towards monkeypatching code. Maybe one of the randomness sources was hashing all the source files? :P

kstrauser 1 year ago | | |

Python has perhaps the least tolerant culture toward monkeypatching of languages that are capable of it. Outside a couple well-known common cases (gevent, I think?) it’s widely frowned upon.

beoberha 1 year ago | | |

That video was awesome!

lastrxa 1 year ago | |

To be fair to the OP, the implicitness in Python in general and the random seeding in particular is confusing, especially if 3rd party modules are involved.

In C++, if you use std::mt19937, everything from seeding to the explicit generator is crystal clear while being terse as well.

red_admiral 1 year ago |

Maybe I'm missing something, but if you can set os.urandom to a custom function, why not implement your own stateful PRNG in python and patch urandom to point to that? Then you can, among other things, seed the PRNG yourself in unit tests, all from within python and without touching syscalls.
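A rough sketch of that idea, assuming the code under test calls ``os.urandom`` through the ``os`` module at call time. The names ``seeded_urandom`` and ``sample`` are made up for this example.

```python
import os
import random
from unittest import mock


def seeded_urandom(seed):
    """Return a drop-in replacement for os.urandom backed by a seeded PRNG.

    For tests only: the output is predictable, so it is not secure.
    """
    rng = random.Random(seed)

    def fake_urandom(n):
        return bytes(rng.getrandbits(8) for _ in range(n))

    return fake_urandom


def sample(seed, n=8):
    """Read n bytes from os.urandom while the seeded replacement is installed."""
    with mock.patch.object(os, "urandom", seeded_urandom(seed)):
        return os.urandom(n)
```

This does not reach code that bound the original function before the patch was applied, or C extensions that ask the OS for random bytes directly.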

Neywiny 1 year ago |

Python randomness is something I've fought with for a few years. A while back (it's on my GitHub, I can find it if any replies care) I had an issue with distributed Monte Carlo sims all ending up with the same seed or something. More recently I wanted a large number of random bytes that were identical across multiple programs. Thinking about it now I could have used an LFSR or similar, but I just seeded the random module and it went fine.

Editing to add that another thing that trips me up every few years is that the hash function isn't repeatable between runs. Meaning if you run the program and record a hash of an object, then run it again, they'll be different. This is good for more secure maps and stuff but not good for thinking you can store them to a file and use them later.

kstrauser 1 year ago | |

The hash bit is by design. See https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHA... and http://ocert.org/advisories/ocert-2011-003.html for more.
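For illustration, a small sketch that runs ``hash()`` in fresh interpreters with ``PYTHONHASHSEED`` pinned. The helper name ``hash_in_fresh_interpreter`` is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, hashseed):
    """Run hash(text) in a new Python process with PYTHONHASHSEED set."""
    env = dict(os.environ, PYTHONHASHSEED=str(hashseed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

With a fixed integer seed, two separate processes report the same value for the same string. With ``PYTHONHASHSEED=random`` they usually will not. For values that have to be stored and compared later, a digest from ``hashlib`` is stable across runs regardless of this setting.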

Neywiny 1 year ago | | |

Correct. But it's not often mentioned until you go looking for it.

Joker_vD 1 year ago |

Of course, this doesn't help with someone (e.g. me) who prefers to get their random numbers by reading them from /dev/random:

    $ strace python3 -c 'with open("/dev/random", "rb") as f: print(f.read(8))'
    [snip-snip]
    openat(AT_FDCWD, "/dev/random", O_RDONLY|O_CLOEXEC) = 3
    newfstatat(3, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x8), ...}, AT_EMPTY_PATH) = 0
    ioctl(3, TCGETS, 0x7ffd8198d640)        = -1 EINVAL (Invalid argument)
    lseek(3, 0, SEEK_CUR)                   = 0
    read(3, "\366m@\t5Q9\206\341\316/pXK\266\273~J\27\321:\34\330VL\253L\34\217\264L\373"..., 4096) = 4096
    write(1, "b'\\xf6m@\\t5Q9\\x86'\n", 19b'\xf6m@\t5Q9\x86'
    ) = 19
    close(3)                                = 0

There is also /dev/urandom.

skinner927 1 year ago | |

From kernel 5.18 onwards, /dev/random and /dev/urandom behave almost identically (they differ only before the kernel's RNG has been initialized early in boot).

Joker_vD 1 year ago | | |

They are still two different filenames (and two different inodes), if you want to intercept opening them.

frakt0x90 1 year ago | |

os.urandom can read from /dev/urandom, although on Linux it uses the getrandom() syscall when that is available: https://docs.python.org/3/library/os.html#os.urandom

ijustlovemath 1 year ago |

Cool deep dive into syscalls! We've built a deterministic simulator in Python to test the performance of our medical device under different scenarios, and have handled this problem with a few very simple approaches:

1. Run each simulation in its own process, using e.g. multiprocessing.Pool

2. Processes receive a specification for the simulation as a simple dictionary, one key of which is "seeds"

3. Seed the global RNGs we use (random and np.random) at the start of each simulation

4. For some objects, we seed the state separately from the global seeds, run the random generation, then save the RNG state to restore later so we can have truly independent RNGs

5. Spot check individual simulations by running them twice to ensure they have the same results (1/1000, but this is customizable)

This has worked very well for us so far, and is dead simple.
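The steps above could be sketched roughly as follows. This is not the commenter's code: the spec keys (``"seeds"``, ``"global"``, ``"sensor"``, ``"steps"``) and all function names are assumptions, and numpy is left out to keep the sketch stdlib-only.

```python
import random
from multiprocessing import Pool


def run_simulation(spec):
    """Run one simulation from a plain-dict spec; 'seeds' controls all randomness."""
    seeds = spec["seeds"]
    random.seed(seeds["global"])  # step 3: seed the global RNG first

    # Step 4: an independent stream with its own seed and a saved state.
    sensor_rng = random.Random(seeds["sensor"])
    noise = [sensor_rng.gauss(0.0, 1.0) for _ in range(3)]
    saved_state = sensor_rng.getstate()  # restore later with setstate()

    readings = [random.random() for _ in range(spec["steps"])]
    return {"readings": readings, "noise": noise, "state": saved_state}


def run_all(specs, workers=2):
    """Steps 1-2: one task per spec dict, each executed in a worker process."""
    with Pool(workers) as pool:
        return pool.map(run_simulation, specs)


def spot_check(spec):
    """Step 5: run the same spec twice and compare the results."""
    return run_simulation(spec) == run_simulation(spec)
```

When running this as a script, ``run_all`` should be called under an ``if __name__ == "__main__":`` guard so that worker processes can import the module safely.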

mianos 1 year ago |

This is utterly insane:

   import os
   os.urandom = lambda n: b'\x00' * n
   import random
   random.randint = lambda a, b: a

I love it!

FreakLegion 1 year ago | |

That's monkey patching, and it actually would've worked fine. There isn't enough context in the write-up to say for sure, but presumably he was just doing it too late, after the third-party library was already imported. At that point the third-party library has its own reference to the original function(s), so patching the reference(s) in the source module doesn't do anything. If the source module had been patched first, though, it all would've worked out.
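The import-order effect can be reproduced in a few lines. The two "libraries" below are hypothetical in-memory modules built with ``types.ModuleType``, used only to simulate a third-party package that does ``from os import urandom``.

```python
import os
import types

# Source of a hypothetical third-party module that binds urandom at import time.
LIB_SOURCE = """
from os import urandom

def token():
    return urandom(4)
"""

original = os.urandom

# "Imported" before the patch: keeps its own reference to the real function.
early = types.ModuleType("early_lib")
exec(LIB_SOURCE, early.__dict__)

# Monkeypatch the source module.
os.urandom = lambda n: b"\x00" * n

# "Imported" after the patch: picks up the replacement.
late = types.ModuleType("late_lib")
exec(LIB_SOURCE, late.__dict__)

# Restore the real function so the rest of the program is unaffected.
os.urandom = original
```

Here ``early.urandom`` is still the original function, while ``late.token()`` returns four zero bytes, which matches the explanation that the patch has to be in place before the library is imported.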

mianos 1 year ago | | |

I think he was saying something else was calling it and that was busting other things. Gevent did some crazy antics to get the whole tcp interface patched up. https://www.gevent.org/api/gevent.monkey.html#gevent.monkey....

jnwatson 1 year ago | | |

This assumes there are no calls to random functions from C extensions. Still, I would have started with the above.

xeyownt 1 year ago |

Maybe it doesn't completely fit the author's needs, but an even less intrusive way to control random is to seed it manually.

sltkr 1 year ago |

I know people hate “enterprise”-type software design, but this is a typical case where Dependency Injection would have made the solution trivial without the need for any OS-specific hacks.
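In Python, a dependency-injection version can be as small as an optional parameter. A minimal sketch, where ``pick_winner`` is a hypothetical function and not something from the article:

```python
import random


def pick_winner(entrants, rng=None):
    """Choose one entrant; callers may inject their own generator."""
    if rng is None:
        rng = random.Random()  # default: a fresh generator seeded from the OS
    return rng.choice(entrants)


# In a test, inject a seeded generator to make the outcome repeatable.
names = ["ada", "grace", "linus", "guido"]
assert pick_winner(names, random.Random(3)) == pick_winner(names, random.Random(3))
```

Production callers can omit ``rng`` and keep the normal behavior, while tests pass in a seeded ``random.Random``.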

And while the article serves as a nice introduction to ptrace(), I think as a solution to the posted problem it's strictly more complicated than just replacing the getrandom() implementation with LD_PRELOAD (which the author also mentions as an option). For reference, that can be done as follows:

    % cat getrandom.c 
    
    #include <string.h>
    #include <sys/types.h>
    
    ssize_t getrandom(void *buf, size_t buflen, unsigned int flags) {
      memset(buf, 0, buflen);
      return buflen;
    }
    
    % cc getrandom.c -shared -o getrandom.so
    
    % LD_PRELOAD=./getrandom.so python3 -c 'import os; print(os.urandom(8))'
    b'\x00\x00\x00\x00\x00\x00\x00\x00'

Note that these solutions work slightly differently: ptrace() intercepts the getrandom() syscall, but LD_PRELOAD replaces the getrandom() implementation in libc.so (which normally invokes the getrandom() syscall on Linux).

cozzyd 1 year ago |

Rather than writing a program you can also just use gdb and do it interactively...

k_sze 1 year ago |

Is it just me, or will the solution work on anything that depends on the SYS_getrandom syscall?

amnqrt 1 year ago | |

Yes, the solution has nothing to do with Python.

I'm not sure if the problem had anything to do with Python. The article is a bit silent on the specific issue with randomness. If detouring urandom() fixed it, it was probably the randomized hash tables.

It cannot have been third party modules calling random.seed() since that would not have been fixed by the hack (meant positively).

You can say that randomized hash tables by default are a mistake, same as the crippled arbitrary precision arithmetic.

If you write a web service, just set the proper defaults at the start of your program.

healeycodes 1 year ago | |

Yes, it works on any process that makes a SYS_getrandom call.

    import random

    # Before: move() draws from the global random module.
    class Ghost:
        def __init__(self):
            self.pos = 5

        def move(self):
            match random.randint(0, 1):
                case 0:
                    self.pos -= 1
                case 1:
                    self.pos += 1

    # After: the RNG is injected by the caller.
    class Ghost:
        def __init__(self, rng):
            self.pos = 5
            self.rng = rng

        def move(self):
            match self.rng.randint(0, 1):
                case 0:
                    self.pos -= 1
                case 1:
                    self.pos += 1