I recently found a super simple algorithm that appears to produce a number in the interval [0,N] with a branchless expression with a single multiplication in an extended number size. (Sorry I don't have a reference.)
Say you want to generate a number, G, in interval [0,N] where N<=UInt32Max. The algorithm is:
G = uint32( uint64(N)*uint64(rand.UInt32())>>32 )
It seems like this should select a number in the range with no bias. Is there something I missed?Ps: it looks like your function is exclusive like [0,N) not [0,N] Also your function is described in this blog post https://www.pcg-random.org/posts/bounded-rands.html
Yes. There are many values of N that aren’t divisors of UInt32Max.
As the article says: “However, no algorithm can convert 2⁶³ equally likely values into n equally likely values unless 2⁶³ is a multiple of n: otherwise some outputs will necessarily happen more often than others. (As a simpler example, try converting 4 equally likely values into 3.)”
https://go.dev/play/p/IeJQEAclBCU
Edit: maybe this shows the bias better: https://go.dev/play/p/3eKJibIlF1a
No, but you can convert a RNG that emits 4 equally likely values into an RNG that emits 3 equally likely values. Just - anytime the RNG returns 4, try again.
Here's a fun puzzle / annoying interview question: You have a biased coin. You can flip it as often as you want, but heads and tails are not equally likely. Without figuring out the bias of the coin, how do you produce purely random bits?
My favorite is https://research.swtch.com/qart (see also: https://spinroot.com/pico/pjw.html)
The end of the post they mention that an encoding/json/v2 package is in the works: https://github.com/golang/go/discussions/63397
This is one of the reasons I love working in Go. I feel that the language maintainers understand that people using Go have more important work to do than update their code to whatever the new hotness is this month.
Basically the opposite of this: https://steve-yegge.medium.com/dear-google-cloud-your-deprec...
And even more ideally, as many v1 usages should be automatically fixed as possible by `go fix` or similar tools. Allowing this to all user packages would be a major improvement over the status quo.
We have plans to get there. https://github.com/golang/go/issues/32816
I think a lot of the benefit of putting stuff in the standard library is interoperability. It seems obvious but - having a string and list type in std means you can pass strings and lists between packages. I think a lot of standard stuff that acts as "glue" should be in std. For example, in nodejs the standard library includes HTTP request and response types because of how useful they are in their ecosystem.
Notably, unlike swift, rust doesn't have an "inlined string" type in std. There's a lot of crates that implement small strings, but most interfaces that need to pass an actual string buffer use std::String - and thats way less efficient. (Thankfully, &str is more common at interface boundaries). Rust also doesn't have much support for futures in std - which upsets a lot of people, because tokio ends up being included by a lot of programs.
Anyway, when it comes to crates like rand where interoperability isn't important, I think its fine to keep this stuff out of std. Code can evolve much more easily when it lives as a 3rd party library.
What happened to batteries-included support?
People have no issue using third party libraries in other languages to get more done than the OOTB libraries.
Regardless, with small values of N, the bias is very slight so you would need many many iterations to see the imperfection in a statically significant way.
A quick search didn't reveal any good resources for how to test the quality of a random number generator in a number range. Is what I came up with the best strategy, and you just need to run it for much longer (and compare to a known-good implementation) to see the difference?
Flip twice. If both flips are the same discard the result. Output 0 for TH, 1 for HT.
The followup is this: That approach only uses about 50% of your coin flips. The other 50% are discarded. How would you improve the efficiency?
At a high level I’d probably try and exploit the fact that every bit sequence with a given number of H and T has equal probability. e.g., HHHT HHTH HTHH THHH are equally probable and so can be mapped to four different values. That still only gets me 2 bits (50%) but other combinations (e.g., variations on HHTT) could get me log2(6) bits. I’m guessing with a higher number of flips I could extract (on average) more and more as a proportion. No clue what the asymptote would be.
https://learn.microsoft.com/en-us/dotnet/core/project-sdk/ov...
https://www.nuget.org/profiles/aspnet
Some projects listed were last updated in 2022.
According to Wolfram Alpha this works as N gets larger but it’s not great. For 16 flips you get 9.5 bits of entropy, but hey at least I beat half! 32 flips gets you about 20 bits of entropy. By 64 flips you get 43 bits, and that’s approaching 2/3 efficiency. Maybe not so bad after all!
Going higher is a little tough since I’m on mobile but it starts crawling reaching only 71% efficiency by 1024 flips. I’m curious if it does actually asymptotically reach 100% efficiency (for a fair coin), even if quite slowly.
Edit: Playing more[1] it really seems to approach 72.1. I wonder if I can figure out the asymptote analytically…
[1] https://www.wolframalpha.com/input?i=sum+from+i=1+to+N-1+of+...
Unfortunately, Wolfram Alpha isn't able to determine the limit of this function[2], and neither am I. :)
[1] https://www.wolframalpha.com/input?i2d=true&i=evaluate+Divid...
[2] https://www.wolframalpha.com/input?i2d=true&i=Limit%5BDivide...