How to pick a random number from 1-10

How to pick a random number from 1-10 (torvaney.github.io)

362 points by torvaney 6 years ago | 173 comments

gizmo686 6 years ago |

There is a simpler solution if you do not mind leaving entropy on the table. Break the people up into groups of two without regard for their selection. If two people provide the same number, ignore them. Otherwise, output "0" if the first person's number is smaller than the second's, and output "1" if the first person's number is larger.

From this, you have a sequence of uniformly random bits, from which you can construct a uniformly random number between 1 and 10. A simple way (although, again, likely leaving entropy on the table) is to break these bits into groups of 4, interpret them as 4-bit unsigned integers, and discard any result that is not in the range 1-10.
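A rough Python sketch of the two steps above, in case it helps to see them concretely. The names `pair_bits` and `numbers_from_bits` are made up for illustration, and the sketch assumes the answers arrive as a flat list in the order people were asked.

```python
from typing import Iterable, Iterator


def pair_bits(answers: Iterable[int]) -> Iterator[int]:
    """Compare answers two at a time.

    Ties are ignored; otherwise emit 0 if the first answer is smaller
    and 1 if the first answer is larger. A leftover unpaired answer is dropped.
    """
    it = iter(answers)
    for first, second in zip(it, it):
        if first != second:
            yield 0 if first < second else 1


def numbers_from_bits(bits: Iterable[int]) -> Iterator[int]:
    """Read bits in groups of 4 as unsigned integers and keep only 1..10."""
    it = iter(bits)
    for b3, b2, b1, b0 in zip(it, it, it, it):
        value = b3 * 8 + b2 * 4 + b1 * 2 + b0
        if 1 <= value <= 10:
            yield value
```

Chaining them as `numbers_from_bits(pair_bits(answers))` gives the stream of numbers. Only 10 of the 16 possible 4-bit values are kept, which is part of the entropy being left on the table.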

FabHK 6 years ago | |

That works and is perfectly uniform (if the people are i.i.d.), but requires quizzing around 20, 40, 60 or more people for a number before you can deliver one, while the algorithm described only requires one or two.

EDIT: Not quite that many, more like around 9, 18, 27 or so, see below.

jancsika 6 years ago | | |

I'm not sure I understand.

Humans aren't like weighted dice. For example, it's certainly conceivable that a few pairs of these humans misunderstand the goal such that they have a 0% chance of ever choosing numbers that change the respective ordering. Add to that the higher probability that many of these pairs of humans will just toggle their orders each round.

Edit: clarification-- the humans don't even have to misunderstand the rule, just the upshot of the process.

frankmcsherry 6 years ago | | |

Nonsense. It requires roughly 1+log(10) bits in expectation and has a geometric tail.

rbmktechik 6 years ago | | |

The parent's algorithm always works; the one described has a massive failure mode: people slowly learn that they are likely to say 7 and self-censor by picking another number. Or maybe in a different culture the distribution changes (say China and the numbers 4, 8).

hcs 6 years ago | |

That's pretty good, as long as each person can't hear any of the previous numbers! The article has the same issue, and might even be biased by the second person knowing they're the second person.

leggomylibro 6 years ago | | |

Maybe have them write their numbers down and put them in a hat? Or hey, this is 2019, have them connect to a website or bluetooth beacon and enter a number on their phone?

Ooh, or I bet you could make a 10-digit keypad that wirelessly reports to a nearby Raspberry Pi for $2-5 in parts, sort of like those 'clickers' that universities use to quiz large lecture classes.

It's too bad that most cheap radio modules are so short-range, because it would be interesting to make something like that and nail them to telephone poles with notes asking people to press a button. You'd have to control for people doing things like hammering the same button repeatedly for a laugh, but it would be fun to see what happened.

Would you be allowed to use HAM bands in the US for that sort of thing if you made them like beacons which sent a callsign after the number value and device ID?

karmakaze 6 years ago | | |

Exactly. Not only aren't people random, they're predictable based on history[0]. Each person could only be asked for one number and they shouldn't even wait perceptibly different lengths of time to be asked.

[0] http://people.ischool.berkeley.edu/~nick/aaronson-oracle/ind...

ssalka 6 years ago | |

Works except when both people in every group give the same number

your-nanny 6 years ago | |

does this algorithm have a name?

suf 6 years ago | | |

https://en.wikipedia.org/wiki/Randomness_extractor#Von_Neuma...

justinpombrio 6 years ago |

...or just add up the answers mod 10.

This has the property that if even a single person answers uniformly at random, then the final number you compute will be uniformly random, regardless of how everyone else answers.
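A small exact check of that property in Python, as a sketch rather than a proof. It assumes the answers are independent of each other, treats the answer 10 as residue 0, and uses a made-up `biased` distribution (half the mass on 7) that is not taken from the article's data.

```python
from fractions import Fraction
from typing import List


def add_mod_10(p: List[Fraction], q: List[Fraction]) -> List[Fraction]:
    """Distribution of (X + Y) mod 10 for independent X ~ p and Y ~ q.

    Index k of each list is the probability of residue k.
    """
    out = [Fraction(0)] * 10
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[(i + j) % 10] += p_i * q_j
    return out


uniform = [Fraction(1, 10)] * 10

# Hypothetical heavily biased answerer: 7 half the time, the rest spread evenly.
biased = [Fraction(1, 18)] * 10
biased[7] = Fraction(1, 2)

# Two biased people plus one uniform person.
combined = add_mod_10(add_mod_10(biased, biased), uniform)
```

Here `combined == uniform` holds exactly. The catch is the independence assumption: if the others can hear the uniform person's answer and react to it, the argument no longer applies.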

oarabbus_ 6 years ago |

Am I missing something, or is this a textbook case of overfitting? It's a neat project, but at the bottom he prescribes what to do if the person says 1-3, or if they say 4 or 5, etc. Therefore, it's completely overfit to the Reddit data at the top.

leethargo 6 years ago | |

The model could be extended by adding a penalty term to the objective. I guess the penalty would need to be nonlinear, as the current objective already minimizes the x_ij (for i != j) implicitly? One could also introduce a fixed-cost offset for any nonzero redistribution amount, but that would change the problem type from linear program (LP) to mixed-integer program (MIP).

duchenne 6 years ago | |

Overfitting typically happens when the number of training samples is small compared to the number of parameters of the model. It especially happens when the input space is high-dimensional.

Here the input space has only one dimension. The model has 10 parameters. There are 8500 samples. So, the author safely assumes that no over-fitting occurs.

tfha 6 years ago | | |

Wrong. You are assuming a good selection over the whole search space of humans. If you took HN I bet you would get a different distribution. The Reddit data only applies to Redditors sampled using similar strategies; the generalization is limited by how general the collection strategy was.

oarabbus_ 6 years ago | | |

This doesn't exclude overfitting. You can take samples n much larger than parameters p in, for example, a political or household earnings survey performed in the San Francisco Bay Area, but obviously one cannot safely assume no overfitting occurs in that scenario.

foota 6 years ago | |

I think the situation given is that you know the skewed input distribution.

codetrotter 6 years ago |

> The easy thing to do is to ask someone “Hey, pick a random number from 1 to 10!”. The person replies “7!”.

Seven factorial is way bigger than 10. Just kidding, sorry.

But what I was actually going to say was, when I read the title of the post, before clicking through, I decided to think of a number myself and I chose the number seven. So it was fun to see that same number in the article.

10% chance in theory and in reality it’s about 25% likely for someone to pick the specific number I did. If only the odds in the lottery were this good.

Why, by the way, is it that seven is so popular?

gkfasdfasdf 6 years ago |

There is a simple way to generate random numbers in your head: https://groups.google.com/forum/#!msg/sci.math/6BIYd0cafQo/U...

quietbritishjim 6 years ago | |

For ease of reference, here is the text at that link verbatim:

Choose a 2-digit number, say 23, your "seed".

Form a new 2-digit number: the 10's digit plus 6 times the units digit.

The example sequence is 23 --> 20 --> 02 --> 12 --> 13 --> 19 --> 55 --> 35 --> ...

and its period is the order of the multiplier, 6, in the group of residues relatively prime to the modulus, 10. (59 in this case).

The "random digits" are the units digits of the 2-digit numbers, ie, 3,0,2,2,3,9,5,... the sequence mod 10. The arithmetic is simple enough to carry out in your head.

This is an example of my "multiply-with-carry" random number generator, and it seems to provide quite satisfactory sequences mod 2^32 or 2^64 , particularly well suited to the way that modern CPU's do integer arithmetic.

You may choose various multipliers and moduli for examples of random selection of the types you ask about.

A description of the multiply-with-carry method is in the postscript file mwc1.ps, included in

The Marsaglia Random Number CDROM with

The DIEHARD Battery of Tests of Randomness,

available at

http://stat.fsu.edu/pub/diehard/

George Marsaglia
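For anyone who wants to check the quoted recipe mechanically, here is a minimal Python sketch of it. The function names are mine, not Marsaglia's, and the `cycle_length` helper is only an illustrative add-on for seeing how long a given seed takes to repeat.

```python
from typing import List


def mwc_step(x: int) -> int:
    """New 2-digit number: the tens digit plus 6 times the units digit."""
    tens, units = divmod(x, 10)
    return tens + 6 * units


def mwc_digits(seed: int, count: int) -> List[int]:
    """Units digits of the first `count` numbers, starting with the seed itself."""
    digits = []
    x = seed
    for _ in range(count):
        digits.append(x % 10)
        x = mwc_step(x)
    return digits


def cycle_length(seed: int) -> int:
    """Number of steps until the sequence first returns to the seed.

    Assumes the seed lies on a cycle (true for seeds 1..58 with this multiplier).
    """
    x = mwc_step(seed)
    steps = 1
    while x != seed:
        x = mwc_step(x)
        steps += 1
    return steps
```

`mwc_digits(23, 7)` reproduces the digits 3, 0, 2, 2, 3, 9, 5 from the quoted example. Note the digits are 0-9, so for a number from 1 to 10 you would need to read 0 as 10.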

Tagbert 6 years ago |

I usually just look at my watch and use the last digit of the seconds value. 0=10

reasonably random

lerax 6 years ago | |

Very easy to predict.

Tagbert 6 years ago | | |

how so? if you ask me for a number from 1-10 and I use my watch, how do you have any predictive information? Even if you looked at your watch there is no reason it should have the same seconds value as mine.

Obviously this is not really something you could automate and if you were to use a clock on an ongoing basis you might be able to predict likely numbers, but that was not in the original scenario.

quickthrower2 6 years ago |

If I had to generate a random number I would do this:

As I ask each person I cycle a secret number through 0-9 in my head. (I increment mod 10 after I ask each person)

When I get their answer, I secretly add my secret number modulo 10 (and consider 0 === 10) and record this in secret.

Then record a tally for each number, the one with the most wins.

This assumes the asked people will not know or guess my internal number. So I seed based on the first person's number (OK they know!) but everyone else won't.

Edit: someone came up with a simpler solution: https://news.ycombinator.com/item?id=20315835

clwk 6 years ago |

How about not using 'pick a random number from 1-10' as the source of entropy? For example, have everyone (from a group of some size) select a natural language sentence of 5 words or more, sum the ASCII values of the upcased alphabetical characters of everyone's sentences, and take the last digit (then add one).
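A quick Python sketch of that recipe, assuming "alphabetical characters" means the ASCII letters A-Z after upcasing. The function name `sentence_digit` is just for illustration.

```python
from typing import Iterable


def sentence_digit(sentences: Iterable[str]) -> int:
    """Sum the ASCII codes of the upcased letters in all sentences,
    take the last digit of the sum, and add one (result is 1..10)."""
    total = sum(
        ord(ch)
        for sentence in sentences
        for ch in sentence.upper()
        if "A" <= ch <= "Z"  # spaces, digits and punctuation are skipped
    )
    return total % 10 + 1
```

For example, `sentence_digit(["ab", "C!"])` sums 65 + 66 + 67 = 198 and returns 9. This sketch does not enforce the 5-word minimum from the description.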

jnordwick 6 years ago |

This looks to be the discrete random variable problem with the alias method:

https://en.wikipedia.org/wiki/Alias_method

but they don't seem to reduce each choice (1-10) to just two options.

simonh 6 years ago |

Put everyone in a numbered sequence unknown to them. Add their chosen number to their sequence number, mod 10.

This re-maps everyone’s choices so they actually have no idea what number they are actually choosing and efficiently redistributes the bias in their choices without massively complicated functions. It is also robust to changes in the distribution pattern. However it would only work well if you had at least 10 people and the number of people is divisible by 10.
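A minimal Python sketch of the remapping, assuming sequence numbers start at 1 and that a result of 0 is read as 10. The function name `remap` is hypothetical.

```python
from typing import List


def remap(answers: List[int]) -> List[int]:
    """Shift each answer (1..10) by its owner's sequence number, mod 10.

    The person at position 1 gets +1, position 2 gets +2, and so on;
    a shifted value of 0 is reported as 10.
    """
    remapped = []
    for sequence_number, answer in enumerate(answers, start=1):
        shifted = (answer + sequence_number) % 10
        remapped.append(shifted if shifted != 0 else 10)
    return remapped
```

With ten people who all say 7, `remap([7] * 10)` gives `[8, 9, 10, 1, 2, 3, 4, 5, 6, 7]`, so each value appears exactly once. That flattens the histogram across the group, but it does not by itself pick a single number.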

bifel 6 years ago | |

Wouldn't putting "everyone in a numbered sequence unknown to them" require a random number, which we don't have?

hanoz 6 years ago | | |

> Wouldn't putting "everyone in a numbered sequence unknown to them" require a random number, which we don't have?

Yes. The above answer just obfuscates the issue. Consider ten people overwhelmingly biased towards 7. The suggested approach in all likelihood gets you 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Then what? You're clearly not much closer to getting a single random number without a random way of picking one of them.

hk__2 6 years ago | | |

> Wouldn't putting "everyone in a numbered sequence unknown to them" require a random number, which we don't have?

No, because that ordering can be chosen by someone not in the sequence.

Fronzie 6 years ago | | |

Assuming you would put them in 10 buckets, that order wouldn't have to be random, as long as people have similar biases.

simonh 6 years ago | | |

The question assumes we have the normal faculties of humans. You can choose whatever non random criteria for ordering them you like as long as it doesn’t affect their number choice.

For example height, or age, or distance from the door to the room, or how much I like them, or where in the sequence I arbitrarily decide to put them in. As long as that does not a priori influence their number choice we’re good.

stOneskull 6 years ago |

How about 1 person labels 10 people 1 thru 10 and another person not knowing how they're labeled chooses 1 of those people

edit: the first person labels the 10 in a jumbled up way, then the 10 people not knowing how they are labeled then jumble themselves up

testplzignore 6 years ago | |

This seems like it could work pretty well. It would be interesting to try it out.

I see two possible sources of bias:

1. The way in which a person labels the others won't be random. And multiple people may exhibit the same pattern. For example, let's say that people are biased towards trying to label an "average" person as #1. This might be based on height, weight, attractiveness, etc. Then a given person may have an uneven distribution of labels attached to them. Further compounding this could be that the most "average" person is the one most likely to be chosen by the second person.

2. You would need to come up with a random way to pair up the first and second person. The worst case might be that the same two people are always paired with each other, and each person always chooses the same person in the room. Then you would end up with a series of 10 digits repeating themselves.

stOneskull 6 years ago | | |

i don't know how strict it is with having no props. i think a blindfold could be acceptable because it can be used from the materials of the people in the room but i don't know about actual labels (maybe post-it notes) or a marker pen. so the first person can't see the people and can only like touch their back to put the label on.

edit: but then it gets to where you may as well just put the numbers in a hat :P

philshem 6 years ago |

Top poker players have mental models for generating random numbers at the table, to avoid predictable play. I can’t find any link, but it doesn’t require a room full of people (or a wristwatch).

RandomBacon 6 years ago |

0-9: Look at the second hand on your watch, what is the last digit?

odd/even or heads/tails: is the seconds odd or even?

peteretep 6 years ago | |

I bet that has sample bias as there’s a small amount of choice in when they reply

the_pwner224 6 years ago | | |

Only if they know the time.

For binary selection, keep the watch and tell them to make an indication (say something) in a few seconds. Note down whether the seconds were even or odd, then hand them the clock and swap roles. Keep the watch hidden until the moment you need to inspect it; record the time as soon as you can see the watch. Merge the two notepads to get the final stream of randoms.

For last digit of seconds, wait a few minutes between each query (making sure they can't see the clock during the rest period). "A few minutes" is imprecisely measured by you, who does not have access to a clock. The delay would foil any attempts at them keeping track of the time in their head. Bonus for distracting them to help with that. And instead of asking them for the last digit, have them flip the watch and then record the time that you [both] first saw.

jhanschoo 6 years ago |

The general method illustrated here (rearranging a sufficiently uniform distribution over 100 into a very-close-to-uniform distribution over 10) is a special case of a topic in information theory called randomness extraction:

https://cs.haifa.ac.il/~ronen/online_papers/ICALPinvited.pdf

https://people.seas.harvard.edu/~salil/pseudorandomness/extr...

The problem being solved is trying to obtain a distribution arbitrarily close to uniform from sampling a known random distribution.

alexandercrohde 6 years ago |

Maybe an easier solution would be to append all the answers, take a hash (e.g. sha1), then convert the last 2 digits from hex to decimal, and mod 10.

1 liner.

EDIT: I lied, this doesn't work, it biases towards 1-5. I guess you could convert to decimal, and divide by (FFFFFFFFFFFFF / 10)

LeoPanthera 6 years ago | |

"But, let’s say you have to do this without access to coins, computers, radioactive material, or other such access to traditional (pseudo) random number generators. All you have is a room of people."

Can you do a SHA1 in your head?

guan 6 years ago | | |

Not quite in his head, but Ken Shirriff did SHA-256 with pencil and paper:

http://www.righto.com/2014/09/mining-bitcoin-with-pencil-and...

vzaliva 6 years ago | |

I would call this "PHP hacker solution" :)

alexandercrohde 6 years ago | | |

Yeah, there's a few types of "great solutions" in engineering. There are those that run really fast (big O) and those that can be written really fast.

99% of the time the job calls for engineers who can come up with the second type of solution.

FabHK 6 years ago | |

Would be slightly biased, though (256 is not divisible by 10, you'd have to reject if >=250, say).

EDIT: and don't forget to add 1.
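Putting the hash idea and both corrections together, a rough Python sketch. The fallback of moving to the previous digest byte after a rejection is my own assumption, since the thread only says to reject; `hashed_pick` is a made-up name.

```python
import hashlib
from typing import Iterable


def hashed_pick(answers: Iterable[int]) -> int:
    """Append the answers, SHA-1 the result, and map one digest byte to 1..10."""
    text = "".join(str(a) for a in answers)
    digest = hashlib.sha1(text.encode("ascii")).digest()
    # Start from the last byte (the "last 2 hex digits"). 256 is not divisible
    # by 10, so bytes 250..255 are rejected and the previous byte is tried.
    for byte in reversed(digest):
        if byte < 250:
            return byte % 10 + 1  # the "+ 1" turns 0..9 into 1..10
    raise ValueError("every digest byte was >= 250; collect more answers")
```

The output is only as unpredictable as the answers fed in: the same list of answers always hashes to the same number.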

AlphaWeaver 6 years ago |

A few years ago I was stumped by a similar question about generating uniformly random numbers with less than ideal constraints. The helpful minds at MathOverflow solved it [0] but it wasn't something that you could guarantee would always work. (The question explains in more detail... Due to the minuscule probability of a never ending sequence of less than ideal random integers.)

I'm curious whether this technique could be applied to my original problem!

[0]: https://math.stackexchange.com/q/1273214/196899

yarg 6 years ago |

You could use the biased distribution to build a lookup table for a less biased distribution.

Build a 10^n lookup table with an entry for each of the possible (ordered) n-tuples of values, give each entry a weight of the product of the probabilities of each of the n values making the entry.

Create a set of probabilities for each output and initialise to zero, run through each of the generated tuples in descending order of weight, set the lookup value to the output with the current lowest probability (or the first, or feed in another PRNG - but you have to stop somewhere) and increase the probability according to the weight associated with the tuple.

At the end of this process you'll have a PRNG that's at least as good as the input (and generally better) - although you'll need to query the seed PRNG n times per result. The higher the value of n the better the output (Although the lookup table will become quite large).
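Here is one way the table construction could look in Python. It is a sketch under assumptions: `example_probs` is a made-up skewed distribution (not the article's data), ties between equally light outputs go to the lowest-numbered one, and `build_table` is a hypothetical name.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple


def build_table(probs: List[float], n: int) -> Tuple[Dict[Tuple[int, ...], int], List[float]]:
    """Map every ordered n-tuple of answers to a single output.

    probs[v - 1] is the probability that a person answers v.
    Returns the lookup table and the probability mass each output ends up with.
    """
    k = len(probs)
    tuples = list(product(range(1, k + 1), repeat=n))
    weight = {t: prod(probs[v - 1] for v in t) for t in tuples}
    mass = [0.0] * k  # probability assigned to each output so far
    table: Dict[Tuple[int, ...], int] = {}
    # Heaviest tuples first; each goes to the output that is currently lightest.
    for t in sorted(tuples, key=weight.get, reverse=True):
        lightest = min(range(k), key=mass.__getitem__)
        table[t] = lightest + 1
        mass[lightest] += weight[t]
    return table, mass


# Hypothetical skewed answers for 1..10, with 7 the clear favourite.
example_probs = [0.03, 0.08, 0.12, 0.10, 0.12, 0.08, 0.28, 0.10, 0.06, 0.03]
table, mass = build_table(example_probs, 2)
```

With `n = 2` you ask two people and look up `table[(first, second)]`. The returned `mass` list shows how far each output still is from 0.1, which is a quick way to see whether a larger `n` is worth the bigger table.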

yarg 6 years ago | |

This solution gives no consideration to security only the distribution of outputs.

firebatpi 6 years ago |

There's no need to solve the distribution balancing problem with linear programming. You can just use a greedy algorithm where you repeatedly give probability mass from numbers with more than 10% to numbers with less.
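A sketch of that greedy pass in Python, using exact fractions so the bookkeeping comes out clean. The input percentages in the usage note are invented for illustration, and `greedy_redistribute` is a hypothetical name.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def greedy_redistribute(probs: List[Fraction]) -> Dict[Tuple[int, int], Fraction]:
    """Return moves[(i, j)] = probability mass to shift from answer i to answer j
    (answers numbered from 1) so that every answer ends up with 1/len(probs)."""
    n = len(probs)
    target = Fraction(1, n)
    surplus = [p - target for p in probs]
    donors = [i for i in range(n) if surplus[i] > 0]  # more than their share
    takers = [i for i in range(n) if surplus[i] < 0]  # less than their share
    moves: Dict[Tuple[int, int], Fraction] = {}
    d = t = 0
    while d < len(donors) and t < len(takers):
        i, j = donors[d], takers[t]
        amount = min(surplus[i], -surplus[j])
        moves[(i + 1, j + 1)] = amount
        surplus[i] -= amount
        surplus[j] += amount
        if surplus[i] == 0:
            d += 1  # this donor is now exactly at the target
        if surplus[j] == 0:
            t += 1  # this taker is now exactly at the target
    return moves
```

For example, with `probs = [Fraction(p, 100) for p in (3, 8, 12, 10, 12, 8, 28, 10, 6, 3)]` the result includes a move of 7/100 from answer 7 to answer 10. One way to read a move of m from i to j: when someone answers i, switch to j with probability m / Pr(i), which still needs some source of randomness to carry out.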

ajuc 6 years ago |

In a similar situation (we were playing an RPG while on a trip, and we needed to "throw dice"), we simply had the DM count silently and the player who was "throwing" say "STOP".

It was pretty much uniform.

kstenerud 6 years ago |

Or have them give the last digit of the age of their oldest living family member.

ridaj 6 years ago | |

Due to the distribution of people's age towards the "old" end of the scale, I'd guess that this is more likely to be 0 than 9?

roberto 6 years ago | | |

Yeah, I'm curious to look at data, but I think this will be skewed towards lower digits.

roberto 6 years ago | |

What about Benford's Law?

harryh 6 years ago | | |

Benford's concerns the first digit, not the last digit.

kevmo314 6 years ago |

> Ideally we want to preserve as much of the initial distribution (i.e. do as little chopping and changing) as possible.

I was thinking through the post that this sounds like a straightforward problem and it is: https://stackoverflow.com/a/5953133/86433

Except the introduction of an "ideally" optimization condition turns it from a straightforward transform into something requiring a linear programming solver. I wonder if the resultant algorithm is actually any simpler though...

jeromebaek 6 years ago |

The interesting part is the recursive application of the algorithm. I'm wondering why it isn't applied fully recursively but only up to one step. Surely, if the algorithm can generate a uniform probability distribution, it can also generate an arbitrary probability distribution, so why not use that arbitrary probability distribution in order to generate the uniform probability distribution?

testplzignore 6 years ago |

If you have a pair of scissors or are willing to ask people to take a brief bit of pain...

Hair.

Cut (or pull) off a chunk of hair from everyone in the room. Put in a big pile. You'll have hundreds of thousands of hairs. It will be a glorious mess :)

Now ask people in the room to grab a handful of the hairs and count them. Take the last digit of each count - there's your random number. You could maybe get multiple digits from a single handful - just be aware of Benford's law.

benj111 6 years ago |

So this relies on that being a somewhat normal distribution (7 being the most common number).

What if the people were primed by the first person saying 7? What if the first person had said 3?

So really you'd need to get your 'random' numbers, check the distribution, then redistribute your 'random' numbers. Which doesn't seem random or 'random' to me.

lurquer 6 years ago |

The 8500 students are numbered off (assigned id's) from 1 to 10. Then you wait. Your first random number is the ID of the first person who dies. Second random is the ID of second person to die. Etc.

Fairly slow algorithm. And only good for 8500 random numbers. But I think it would give a good distribution.

rland 6 years ago |

I actually laughed out loud when I saw the image of the human RNG distributions. 7 is a random number. Who knew?

jldugger 6 years ago | |

I feel like you could probably 'fix' it by the following protocol:

1. Ask your participant to write down a random number.

2. After they've written that number down, inform them you will guess at their number, and give them a dollar if you guess wrong.

3. Ask if they'd like to change their random number.

4. Allow them to write down their new number.

The before/after would probably change the distribution, and pretty much demonstrate that people can be more random when motivated.

gmac 6 years ago | | |

I'm sure the distribution would change, but not sure it would be closer to uniform. It's now a completely different problem: choose a number most resistant to guessing. For example, if the participant hypothesises that your guess will be a "random" number chosen by you, she should always pick 10.

seba_dos1 6 years ago | |

7 was what came into my mind right after reading the title. Seeing it later in the text felt like a magic trick.

foota 6 years ago |

If you can widen the answers the people can give you could first determine the distribution of a larger width (i.e., pick a number from 1 to 100) then map those to the 1-10 distribution based on their frequency.

This is effectively the same as the person proposing sampling tuples.

yyhhsj0521 6 years ago |

This can be generalized to having any distribution, simulating a die:

http://www.keithschwarz.com/darts-dice-coins/

sokoloff 6 years ago |

Survey for analysis on this thread:

https://www.surveymonkey.com/r/XWML9JM

sokoloff 6 years ago | |

Data (including a batch I paid for on Mechanical Turk):

https://docs.google.com/spreadsheets/d/1Dh0wiTCRkBhckWGXtjZg...

(Amusingly, "69" is an over-represented response...)

ngoel36 6 years ago |

This assumes independent events. If someone picks 7 and you ask them for another random number, it is highly unlikely (certainly not 28.1%) that they will pick 7 again.

harryh 6 years ago | |

"Ask another person for a random number"

solotronics 6 years ago |

So you need access to a uniform RNG to implement this as it takes the human bias and multiplies by a factor and the RNG... so why not just use the uniform RNG?

shhsshs 6 years ago | |

You do not need access to a uniform RNG for this - he uses more humans to get the redistribution chance.

wbirthy 6 years ago |

You could use the month of birth as a random source; if it is bigger than 10, ignore it.

disconnected 6 years ago |

Use the middle square method:

https://en.wikipedia.org/wiki/Middle-square_method

It can be trivially calculated with pencil and a paper and will be random enough. You just need to pick a seed and off you go.

If it was good enough for Von Neumann it is good enough for you.
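For reference, a minimal Python sketch of the 4-digit version: square the number, pad the square to 8 digits, and keep the middle 4. The generator name `middle_square` and the default of 4 digits are my choices for illustration.

```python
from typing import Iterator


def middle_square(seed: int, digits: int = 4) -> Iterator[int]:
    """Yield successive middle-square values (`digits` should be even)."""
    x = seed
    while True:
        squared = str(x * x).zfill(2 * digits)  # pad the square to 2*digits
        start = digits // 2
        x = int(squared[start:start + digits])  # keep the middle `digits` digits
        yield x
```

Starting from 1234 the first values are 5227, 3215 and 3362. Be aware that the method can collapse: a state of 0 stays at 0 forever, and some seeds fall into short cycles, so "random enough" depends a lot on the seed and on how many numbers you need.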

jjgomo33 6 years ago |

Didn't the world already agree that the new pseudocode language was Python?

bubblewrap 6 years ago |

What number are people most likely to pick the second time if you ask them twice?

pishpash 6 years ago | |

Maybe the most likely number of the ones they haven't picked? So 5 if you picked 7 and 7 if you picked anything else. This should be tested.

rajacombinator 6 years ago |

Or just carry a coin-like flippable object. (Cell phone?)

testplzignore 6 years ago | |

And if no objects are allowed in the room at all, then pick the lightest person and flip them :) Though flipping a person multiple times is difficult, so it could be biased towards 1 flip.

umvi 6 years ago |

Or just use digits of pi; most people have at least 10 digits memorized, some have 20+ memorized. There's also a formula you can memorize for calculating the nth digit of pi, so you have infinite random numbers without memorizing an esoteric algorithm with a bazillion corner cases.

That said, I still thought this was an interesting article and I loved the animated bar graphs.

andrewbarba 6 years ago | |

Most people have at least 10 digits memorized? That is one bold claim

umvi 6 years ago | | |

I meant here on Hacker News. I would guess most of us here played with our TI-83s enough to know the first 10

celticninja 6 years ago | |

I'm not sure that most people have memorized the first 10 digits of Pi. I know 3 for certain, any more than that and I can look them up.

stOneskull 6 years ago | | |

I think most would know the first 5 because it's pretty accurate making 3.14159 into 3.1416 and you forget what's after the 9

spookthesunset 6 years ago | |

That is pretty arrogant to think that just because somebody posts here they have 10 digits of pi memorized. This is a public forum. Anybody can post. You and everybody else posting here are in all likelihood just average intelligence.

There is nothing exclusive about using this forum. You aren’t the first or the last random person to exist here.

Don’t let your own supposition of your intelligence get to your head. There is even a theory for that.... Dunning Kruger.

umvi 6 years ago | | |

Memorizing digits of pi is easier than memorizing the algorithm in the article. 3.14159265, come on, stop making such a big deal of this.

arkadiyt 6 years ago | |

It's still an unsolved problem whether or not the digits of pi are uniformly distributed.

umvi 6 years ago | | |

Just try it out for yourself. Analyze the distribution of the first billion digits and I think you'll find they will produce a more uniform distribution than this algorithm in the article given 1 billion human responses...

billysielu 6 years ago |

It's not random if you're forcing it to be uniform. May as well skip the math and do this instead:

Divide all the people into groups of 10. Have the members of a group play each other at whatever, e.g. arm wrestling, to produce a rank from best to worst. Their rank becomes their answer.

Now everyone will reply with a number as close to uniform as the total number of people is divisible by the number of choices.

Pr((X + S) mod 10 = i)

= \sum_j Pr((X + j) mod 10 = i | S = j) Pr(S = j)

= \sum_j Pr(X = i | S = j) Pr(S = j)

= \sum_j Pr(X = i) Pr(S = j)

= Pr(X = i) \sum_j Pr(S = j)

= Pr(X = i)