How to generate uniformly random points on n-spheres and in n-balls(extremelearning.com.au) |
How to generate uniformly random points on n-spheres and in n-balls(extremelearning.com.au) |
1. Truncate your lat longs to some arbitrary decimal place (this is very very stupid, you end up with grid lines [1])
2. The above method ^^ but everyone tries basically doing like random angle + random length along angle, which doesn't generate uniform results in a circle as the article mentions[2]. So then you try generating points in a box that encloses your circle, and rejecting anything outside the circle, but that smells bad. So you do some googling and find method 6 listed in the article (Good! and Fast!)
3. Realize that fuzzing points is stupid, what you really want is to view points in aggregate anyways, so you try heat maps
[1]: Ok you always end up with grid lines, but truncating to like 1-6 decimal places produces very obvious grid lines to the human eye
[2]: Try this in pyplot! You'll see right away with ~100 points its not uniform in the way you expect
https://extremelearning.com.au/how-to-evenly-distribute-poin...
It might fail for higher dimensions, but lots of programs only run on a 3D sphere of a planet, haha!
I don't actually know of any useful algorithms with worse asymptomatic behaviour.
Fortunately, I never used it for anything, because I made the classic naive mistake of simply choosing a random theta in 0,2pi and phi in -pi,pi, which ends up with points biased towards the poles.
Somehow *12 years later* my subconscious flagged it up and I woke up in the middle of the night realizing the issue. Even though I'd never revisited it since then!
https://github.com/dmd/thesis/commit/bff319690188a62a79821aa...
I don't think it's a good idea to introduce over 20 different methods before talking about the correct one that works for any number of dimensions, say n, and the reason behind its correctness is very obvious:
* Generate a random vector by sampling n standard normal distributions: `vector = np.random.randn(n)`
* Key step: show that the vector has a uniform direction.
The proof is as follows: you look at the probability density function of a normal distribution, which is `p(x)=1/sqrt(2pi)*exp(-x^2/2)`, and the probability density function of the vector is the product of all these densities of its individual dimensions. Now the product `p(x1)p(x2)...p(xn)=1/sqrt(2pi)^n*exp(-(x1^2+x2^2+...+xn^2)/2)` since `exp(a)exp(b)=exp(a+b)` due to power functions.
Now it's easy to see that probability density is invariant to vector length, which means the vector has uniform probability for any specific value of x1^2+x2^2+...+xn^2. Whatever rotation you apply after sampling this vector, since rotation preserves x1^2+x2^2+...+xn^2 by definition, you get the exact same probability density function and therefore the same distribution of vectors.
* Now that the direction of the vector is uniformly sampled, decide the radius separately: for n-sphere the radius is just 1, and for n-ball the volume of the radius is proportional to the nth power of the radius, so you sample a uniform number from [0,1] as the volume and take the nth root as the radius: `radius = np.random.uniform(0, 1)*(1/n)`
* Normalize the vector to the radius: `vector = vector / np.sqrt((vector*2).sum()) * radius`
I think Section 2 of this book provides a much better perspective on the problem of generating uniform random points, since it also provides intuition behind the geometry of high dimensions and properties of the unit ball, etc.
Tao, T., Vu, V., and Krishnapur, M. (2010) Random matrices: universality of ESDs and the circular law. The Annals of Probability. 38(5) 2023-2065.
If the circle represents the area in which a simulated projectile will hit, you probably don't want a truly random distribution of points but instead have a bias towards the middle of the circle. A real gun shot multiple times at a fixed target will probably (assuming perfect aim but some variation on every shot) have more shots hit the middle of the pattern than the edges.
Some early Valve shot code actually had a purely random distribution, but at some point an alternate version got written and shared at https://developer.valvesoftware.com/wiki/CShotManipulator
Ironically the biased version is based on a pretty simple method that in fact people sometimes get wrong when they want a truly random point distribution in a circle. Just doing a random radius and theta will lead to a biased distribution. Wolfram Mathworld has a good writeup on it at https://mathworld.wolfram.com/DiskPointPicking.html
> purely random distribution
Nitpick: you don’t mean random; you mean uniform.
I think the best illustration of "reasonable choices of what random values should be used leading to biased results" is Bertrand's paradox which I was introduced to via numberphile/3blue1brown: https://www.youtube.com/watch?v=mZBwsm6B280 and am just glad that nothing I have ever needed random sampling for has ever been important :D
[1] please don't use this blindly, I'm really just going off very old recollection, if you need it google random sphere sampling :D
Currently, I am porting my codebase to Rust and did that part over the weekend, so if anyone is interested in this exact implementation, I'd willing to share it (as a crate if necessary).
If I roll a 12-sided die, that's random. If I roll two 6-sided dice and add the result, that's also random, but it has a different distribution of values.
The two dice verison, there's one way to get a result of 2, but _several_ ways to get 7. You'll get 7 way more often than you'll get 2.
The one-die version each outcome is equally likely. You're exactly as likely to get 2 as you are to get 7 or any other value in the range of possibilities.
The one-die version is a uniform distribution. The two-dice version is not uniform.
https://extremelearning.com.au/how-to-evenly-distribute-poin...
Am I right in thinking that it’s because circles of constant theta on the sphere contain the same ‘expected’ number of points (since phi is uniformly distributed), and these circles get smaller (and in the limit, shrink to a point) as one travels towards the poles — hence, higher density of points…?
…makes me realise that spherical polar coordinates, as natural as they seem, aren’t in some way homogeneous in the sense that whilst the mapping (theta, phi) —> (x, y, z) is continuous, it isn’t uniformly so… or something like that. I don’t know enough (possibly differential) geometry to articulate it properly, but it feels like the problem of getting genuinely uniform sampling from a sphere (or indeed any manifold) is somewhat equivalent to having a ‘nice’ coordinate system (one that respects the symmetry of the sphere). Basically, polar coordinates are prejudiced and treat points differently depending on how close to the poles they are.
By definition, our sampling in the domain space (two intervals) is uniform; the problem comes when we project through a coordinate system that doesn’t respect this.
Which better solution did you use? I’m having trouble reading your code on my current device.
The term is "measure preserving". The mapping is continuous but it doesn't preserve the length of intervals projected through it.
I didn't, since I haven't touched or needed this code since 2003. I just put a warning saying don't use it.
https://numpy.org/doc/stable/reference/random/generated/nump...
Or in god's own words (TAOCP section 3.4):
Applications of random numbers often call for other kinds of distributions, however; for example, if we want to make a random choice from among k alternatives, we want a random integer between 1 and k. If some simulation process calls for a random waiting time between occurrences of independent events, a random number with the exponential distribution is desired. Sometimes we don't even want random numbers — we want a random permutation (a random arrangement of n objects) or a random combination (a random choice of k objects from a collection of n).
In principle, any of these other random quantities can be obtained from the uniform deviates U0, U1, U2, ...; people have devised a number of important "random tricks" for the efficient transformation of uniform deviates. A study of these techniques also gives us insight into the proper use of random numbers in any Monte Carlo application.
The easiest example most people are aware of in practice is throwing two 6 sided dice vs one 12 sided dice. The outcome of both is random, but people seem to know fairly intuitively that the two dice case produces is more likely to produce middle values than extreme ones, while the single 12 sided dice doesn't have that behavior.
My reasoning is, if you are counting cards in a game, you are giving yourself an advantage. But what really happens is that from your point of view cards will be drawn less and less uniformly randomly. Or put in another way, if you know the distribution is normal, you can bet on the the result being near the center and come out on top, but if it’s uniformly random distribution all hopes are out.
But what if the domain isn't finite? e.g. the entire real line? Then we can't define a uniform distribution, because the total probability can never sum to one. We can still define a maximum entropy distribution, but only by making some more assumptions. The normal distribution is the maximum entropy distribution on the real line, given a particular mean and variance.
Other examples: on the positive real line, the exponential distribution is maximum-entropy for a given mean. Poisson distribution is the equivalent on the non-negative integers.
The problem is if you are doing something that requires uniform random values, and your random source is not uniform, then things will go wrong.
Rolling two dice gives you a number which is random in [2, 12], but not uniform -- 7 is far more likely than 2 or 12.
Finite support somehow seems less random because you can still say something interesting about its mean and bounds in the same way you can say something about the mean and variance of a normal distribution.
That’s just because the payout distribution and real distribution are different, and there’s a place the payout is at a loss.
If you bet on a uniform distribution of 2 through 12 that payed out according to a normal distribution, you’d make a killing betting on 2 or 12 (the “outliers”). If you roll 2d6 (normalish) and get paid according to a uniform distribution, you’ll make a killing betting on 7.
(N denotes the number of points you need to generate)
Even more so if you’ve gone 50 picks like that, and are now down to 49 red balls and one blue ball.
That is what card counting helps you with - knowing the odds based on your current state, as compared to the initial state.
And the last card was still random, btw. But can be guessed now with perfect precision due to the process of elimination.
Random at shuffle doesn’t mean unguessable or unpredictable as the game goes on.
Most maps are 2-dimensional and this is fine.
The method you propose is not used in high dimensions in practice as it would involve evaluating order n^2 trigonometric functions and is also far harder to implement than the methods discussed in the article.
Actually, it is an overstatement I think, or something orthogonal; the algorithm is still correct just wildly inefficient. Fail should be reserved for algorithms that produce the wrong result, not ones that produce the right result very slowly, however slowly.
Note: "asymptomatic" does not mean the same as "asymptotic"
I disagree the last card is "still" random, at least in the sense of statistics. It "was" random up until the measure of uncertainty of the next event reaches 0 (i.e. entropy reaches 0). At that point it's no longer a "guess", there is no uncertainty and the remaining pattern is always 100% predictable in that regardless which proceeding events occured to get there it can always be known what the next value is without uncertainty. Since there ceases to be any uncertainty in what the remaining pattern will be there ceases to be randomness in the next value generated. That the card's value was not known at n=0 does not affect whether the n+1 card still is/isn't random when n=51. In another form, that you didn't previously know the value of card n=52 with past information holds no influence whether the value is random or not with new information. Statistical randomness is all about what you know of the future predictability, not about how something came about.
This is also true of events which fall into predictable patterns at any point along the path. E.g. if I had a (relatively useless) hardware random number generator that generated random numbers 0-127 once per second until it generated a 0 at n=17, at which point it ceased being able to pull randomness from it's dead circuits and always produced 0 afterward, the first 17 values were all statistically random at the time of their draw but n=[18,inf) are all now predictable and no longer random from that point on.
Knowing the outcome of a die roll after the roll doesn’t make the roll of the die non-random.
Any more than the die coming up 6 a bunch of times in a row does. Though maybe it would be a good idea to check the weight/balance of it, hah.
Knowing what the last card has to be after all the other cards have been played doesn’t make the process which made that card the last card any less random either.
Picking cards off a deck isn’t a randomization/non-randomization event, shuffling the deck is what does that.
Probability however, which directly impacts gameplay in some kinds of games, is directly impacted by knowledge exposed during gameplay.
So for instance, the second to last card (and also the last card) can be known with about 50/50 odds (probability wise) if someone is good at counting, which is damn good! Way better than 1/2704 odds of guessing a sequence of two cards at the start of the deck.
In the example of the broken hardware random number generator, knowledge of the defect can be used to attack a system if it assumes a continuous probability distribution out it’s outputs, if the attacker knows this.
The same as an attacker at a casino could probably manipulate a game to make money if they knew something was broken in the deck shuffling and the same two cards were always at some position in a ‘new’ deck.
Exactly, same page then. The card not being random at the end of game frame of reference places no limitation that it was random from the beginning of the game frame of reference. In the end of game reference the last card is never random, it's only ever the remaining card. In the beginning game reference which card will end up being the last card is still random at that point despite it ceasing to be from the later frame.