How to generate uniformly random points on n-spheres and in n-balls (extremelearning.com.au)

168 points by egorpv 2 years ago | 81 comments

I actually needed this at work once! We needed to fuzz people's addresses in a map view for analytics without revealing PII. It never shipped, but we did need to fuzz geographic data, and the thinking went like this:

1. Truncate your lat/longs to some arbitrary decimal place (this is very, very stupid: you end up with grid lines [1])

2. The above method, but with random jitter: everyone basically tries a random angle plus a random length along that angle, which, as the article mentions, doesn't generate uniform results in a circle [2]. So then you try generating points in a box that encloses your circle and rejecting anything outside the circle, but that smells bad. So you do some googling and find method 6 listed in the article (good and fast!)

3. Realize that fuzzing points is stupid; what you really want is to view points in aggregate anyway, so you try heat maps

[1]: OK, you always end up with grid lines, but truncating to something like 1-6 decimal places produces grid lines that are very obvious to the human eye

[2]: Try this in pyplot! With ~100 points you'll see right away that it's not uniform in the way you expect
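To see the difference from step 2 and footnote [2] without plotting, here is a minimal Python sketch (stdlib only; the function names are hypothetical). It compares a uniform angle plus a uniform length against the usual inverse-CDF fix, which takes the square root of a uniform number as the radius because the area inside radius r grows as r^2. I'm not claiming this is the article's method 6.

```python
import math
import random

def naive_polar(rng):
    # Uniform angle AND uniform radius: over-samples the centre, because a
    # thin ring at radius r has area proportional to r.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.uniform(0.0, 1.0)
    return r * math.cos(theta), r * math.sin(theta)

def uniform_polar(rng):
    # Inverse-CDF fix: for a uniform unit disk P(R <= r) = r^2, so R = sqrt(U).
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(rng.random())
    return r * math.cos(theta), r * math.sin(theta)

def inner_fraction(sampler, n=100_000, seed=0):
    # Fraction of samples within radius 0.5; a uniform disk gives 0.25.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = sampler(rng)
        if x * x + y * y <= 0.25:
            hits += 1
    return hits / n

print(inner_fraction(naive_polar))    # close to 0.5
print(inner_fraction(uniform_polar))  # close to 0.25
```

With a uniform disk, a quarter of the points should land within half the radius; the naive version puts about half of them there.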

bagels 2 years ago | |

I needed something related, n roughly evenly distributed points on the surface of a sphere. Ended up using a Fibonacci spiral.

https://extremelearning.com.au/how-to-evenly-distribute-poin...
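For the deterministic "evenly spread" problem (as opposed to random sampling), a common form of the Fibonacci spiral is sketched below. It's an assumption that this matches the linked article's exact variant; the article discusses tweaks to the offsets. Evenly spaced heights give equal-area bands (Archimedes' hat-box theorem), and the golden-angle step spreads the points around each band.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (deterministic, not random)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    points = []
    for i in range(n):
        # Heights evenly spaced in (-1, 1); equal height bands have equal area.
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)  # radius of the horizontal circle at height z
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

print(fibonacci_sphere(5))
```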

milleramp 2 years ago | | |

Hey, me too; very interesting problem. I used this method as well. https://www.sciencedirect.com/science/article/abs/pii/S00104...

alanbernstein 2 years ago | | |

Force-directed layout is my favorite for this.
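A minimal sketch of the force-directed idea, assuming a Coulomb-style 1/d potential (the Thomson problem setup) and plain gradient steps, with each point pushed back onto the sphere after every step. The step size, step cap, and iteration count are arbitrary choices, not tuned values, and the function names are hypothetical.

```python
import numpy as np

def repel_on_sphere(points, steps=500, step_size=0.01, max_step=0.05):
    """Spread points on the unit sphere by Coulomb-style (1/d) repulsion.

    points: (n, 3) array of unit vectors. Returns a new (n, 3) array.
    """
    p = np.array(points, dtype=float)
    for _ in range(steps):
        diff = p[:, None, :] - p[None, :, :]        # (n, n, 3): p_i - p_j
        dist = np.linalg.norm(diff, axis=-1)        # (n, n) pairwise distances
        np.fill_diagonal(dist, np.inf)              # no self-repulsion
        force = (diff / dist[..., None] ** 3).sum(axis=1)  # sum of (p_i - p_j) / d^3
        # Keep only the component tangent to the sphere at each point.
        force -= np.sum(force * p, axis=1, keepdims=True) * p
        step = step_size * force
        # Cap each point's move so very close pairs don't fly apart wildly.
        length = np.linalg.norm(step, axis=1, keepdims=True)
        step *= np.minimum(1.0, max_step / np.maximum(length, 1e-12))
        p += step
        p /= np.linalg.norm(p, axis=1, keepdims=True)  # project back onto the sphere
    return p

def min_pairwise_distance(p):
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min()

rng = np.random.default_rng(0)
start = rng.standard_normal((30, 3))
start /= np.linalg.norm(start, axis=1, keepdims=True)  # random points on the sphere
spread = repel_on_sphere(start)
print(min_pairwise_distance(start), min_pairwise_distance(spread))
```

Starting from random points, the closest pair should end up much farther apart than it started.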

bee_rider 2 years ago | |

The rejection method smells pretty good to me, in the sense that it should be pretty obvious to anybody with, like, middle school level math I think (right?).

It might fail for higher dimensions, but lots of programs only run on a 3D sphere of a planet, haha!
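The rejection method from the parent comment, as a minimal Python sketch that also counts tries, so you can watch the acceptance rate fall as the dimension grows (the names are hypothetical):

```python
import random

def sample_ball_rejection(n, rng):
    """Uniform point in the unit n-ball by rejection from the cube [-1, 1]^n.

    Returns the accepted point and the number of tries it took.
    """
    tries = 0
    while True:
        tries += 1
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if sum(c * c for c in x) <= 1.0:  # keep only points inside the ball
            return x, tries

def acceptance_rate(n, samples=2000, seed=0):
    # Accepted points divided by total tries.
    rng = random.Random(seed)
    total = sum(sample_ball_rejection(n, rng)[1] for _ in range(samples))
    return samples / total

for n in (2, 3, 6):
    print(n, acceptance_rate(n))
```

In 2D the acceptance rate should come out near pi/4 (about 0.785), in 3D near pi/6 (about 0.524), and by 6D it is already below 0.1.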

contravariant 2 years ago | | |

"Fail" is an understatement: the ratio of the cube's volume to the ball's volume is (n/2)!(4/pi)^(n/2) (reading (n/2)! as Gamma(n/2+1) when n is odd), which is also the expected number of tries you'll need. The time doesn't merely grow exponentially; it grows faster than exponentially.

I don't actually know of any useful algorithms with worse asymptotic behaviour.
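The ratio above is easy to check numerically. This sketch uses `math.gamma` for (n/2)!, which also covers odd n; for large n you'd want `math.lgamma` to avoid overflow.

```python
import math

def expected_tries(n):
    # Cube [-1, 1]^n volume 2^n divided by unit n-ball volume pi^(n/2) / Gamma(n/2 + 1),
    # which simplifies to Gamma(n/2 + 1) * (4/pi)^(n/2).
    return math.gamma(n / 2 + 1) * (4 / math.pi) ** (n / 2)

for n in (2, 3, 5, 10, 20):
    print(n, expected_tries(n))
```

For n = 10 this gives roughly 400 expected tries per accepted point, and for n = 20 roughly 40 million.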

esafak 2 years ago | |

I'd just have used Uber's hexagonal tiling library. https://www.uber.com/blog/h3/

frankfrank13 2 years ago | | |

This is what I landed on! It actually works very very well for aggregate data.

longhaul 2 years ago | |

A long time ago I had the same problem while ray tracing with Monte Carlo techniques. Switching to the Mersenne Twister fixed the clustering and the grid-like randomization. https://en.m.wikipedia.org/wiki/Mersenne_Twister

chpatrick 2 years ago | |

What if someone lives in a really remote location so they're the only ones in the heat map cell?

nighthawk454 2 years ago | | |

I should think that after defining a uniform distribution of points, you could cluster them as needed to form larger, lower-resolution chunks according to population size, binning everyone in a region to, say, its most central point. That could then adapt as populations change.

frankfrank13 2 years ago | | |

Another user here mentioned Uber's H3, which is actually what I used. You end up being able to anonymize over arbitrarily large geographic areas using something like a tile rather than a point.

nick7376182 2 years ago | | |

Assign everyone a random offset that doesn't change, large enough to obscure the address within some reasonable radius but small enough not to drastically skew the heat map at lower zoom levels.
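A minimal sketch of the fixed-offset idea, under some assumptions of mine: offsets are planar (dx, dy) in whatever unit `radius` uses (converting metres to lat/long is left out), and the per-person seed comes from an HMAC of the person's ID with a secret key, so the offset is stable across queries but can't be recomputed from the ID alone. `stable_offset` and its parameters are hypothetical.

```python
import hashlib
import hmac
import math
import random

def stable_offset(person_id, radius, secret_key):
    """Return a fixed (dx, dy) offset, uniform over a disk of the given radius.

    The same person_id and secret_key always give the same offset, so
    repeated queries can't be averaged back to the true location.
    """
    digest = hmac.new(secret_key, person_id.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))  # deterministic per person
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = radius * math.sqrt(rng.random())  # sqrt keeps the disk uniform
    return r * math.cos(theta), r * math.sin(theta)

print(stable_offset("user-123", 500.0, b"hypothetical-secret"))
```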

dmd 2 years ago |

As an RA, before starting grad school, I hacked together some code to choose a random point on a sphere, for some psychophysics experiment I was doing.

Fortunately, I never used it for anything, because I made the classic naive mistake of simply choosing a random theta in [0, 2pi] and phi in [-pi, pi], which ends up with points biased towards the poles.

Somehow *12 years later* my subconscious flagged it up and I woke up in the middle of the night realizing the issue. Even though I'd never revisited it since then!

https://github.com/dmd/thesis/commit/bff319690188a62a79821aa...
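A sketch of the mistake described above next to one standard fix; this is an illustration, not the code from the linked commit. On the unit sphere the height z is uniformly distributed on [-1, 1] (Archimedes' hat-box theorem), so you pick z uniformly instead of the polar angle. The function names are hypothetical.

```python
import math
import random

def naive_sphere(rng):
    # The mistake described above: both angles drawn uniformly.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    phi = rng.uniform(-math.pi, math.pi)
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def uniform_sphere(rng):
    # Fix: z uniform on [-1, 1], azimuth uniform on [0, 2pi).
    theta = rng.uniform(0.0, 2.0 * math.pi)
    z = rng.uniform(-1.0, 1.0)
    s = math.sqrt(1.0 - z * z)
    return s * math.cos(theta), s * math.sin(theta), z

def polar_cap_fraction(sampler, n=100_000, seed=1):
    # Fraction of points with |z| > 0.9; by cap area it should be 0.1.
    rng = random.Random(seed)
    return sum(abs(sampler(rng)[2]) > 0.9 for _ in range(n)) / n

print(polar_cap_fraction(naive_sphere), polar_cap_fraction(uniform_sphere))
```

The naive sampler should put roughly 29% of points in the polar caps |z| > 0.9, versus about 10% for the uniform one.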

renonce 2 years ago |

I learned the easiest way to do d-dimension sampling in Foundations of Data Science: see https://news.ycombinator.com/item?id=34575637 or https://www.cs.cornell.edu/jeh/book.pdf?file=book.pdf

I don't think it's a good idea to introduce over 20 different methods before talking about the correct one that works for any number of dimensions, say n, and the reason behind its correctness is very obvious:

* Generate a random vector by sampling n standard normal distributions: `vector = np.random.randn(n)`

* Key step: show that the vector has a uniform direction.

The proof is as follows: look at the probability density function of a normal distribution, which is `p(x)=1/sqrt(2pi)*exp(-x^2/2)`. The probability density function of the vector is the product of the densities of its individual coordinates, and `p(x1)p(x2)...p(xn)=1/sqrt(2pi)^n*exp(-(x1^2+x2^2+...+xn^2)/2)` because `exp(a)exp(b)=exp(a+b)`.

Now it's easy to see that the probability density depends only on the vector's length: it is constant on every sphere x1^2+x2^2+...+xn^2 = const. Any rotation preserves x1^2+x2^2+...+xn^2, so a rotated sample has exactly the same density function and therefore the same distribution; no direction is preferred.

* Now that the direction of the vector is uniformly sampled, decide the radius separately. For the n-sphere the radius is just 1. For the n-ball, the volume of a ball of radius r is proportional to r^n, so you sample a uniform number from [0,1] as the volume fraction and take its nth root as the radius: `radius = np.random.uniform(0, 1)**(1/n)`

* Scale the vector to that radius: `vector = vector / np.sqrt((vector**2).sum()) * radius`
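The recipe above as a self-contained sketch, using NumPy's Generator API (`np.random.default_rng`) rather than the legacy `np.random.randn`; the function names are hypothetical. Note the exponent `** (1.0 / n)` for the radius: a uniform volume fraction u corresponds to radius u^(1/n).

```python
import numpy as np

def random_direction(n, rng):
    """Uniform point on the unit sphere in R^n (normalized Gaussian vector)."""
    v = rng.standard_normal(n)  # density depends only on |v|, so direction is uniform
    return v / np.linalg.norm(v)

def random_in_ball(n, rng):
    """Uniform point in the unit n-ball."""
    u = rng.uniform(0.0, 1.0)   # uniform fraction of the ball's volume
    radius = u ** (1.0 / n)     # volume of a radius-r ball is proportional to r^n
    return random_direction(n, rng) * radius

rng = np.random.default_rng(42)
print(random_direction(3, rng))
print(random_in_ball(3, rng))
```

In 3D, about 1/8 of the ball samples should have norm at most 0.5.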

I think Section 2 of this book provides a much better perspective on the problem of generating uniform random points, since it also provides intuition behind the geometry of high dimensions and properties of the unit ball, etc.

0-_-0 2 years ago | |

I can't believe I didn't already know this

FooBarBizBazz 2 years ago | |

This is the way.

porphyra 2 years ago |

Another way to generate uniformly random points on a 2D disk that the author forgot to mention: let A be an n x n complex matrix whose elements are iid copies of a fixed random variable with mean zero and unit variance. Let lambda_i be its ith eigenvalue, and let x_i = real(lambda_i)/sqrt(n) and y_i = imag(lambda_i)/sqrt(n). As n approaches infinity, the empirical distribution of (x, y) converges almost surely to the uniform distribution over the unit disk.

Tao, T., Vu, V., and Krishnapur, M. (2010) Random matrices: universality of ESDs and the circular law. The Annals of Probability. 38(5) 2023-2065.
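A quick numerical illustration of the circular law, assuming complex Gaussian entries (the Ginibre ensemble), which is one valid choice of mean-zero, unit-variance entries. For finite n the result is only approximately uniform, and a few eigenvalues land slightly outside the unit disk. The function name is hypothetical.

```python
import numpy as np

def circular_law_points(n, rng):
    """Scaled eigenvalues of an n x n complex Gaussian matrix.

    Entries have mean 0 and E|a_ij|^2 = 1; as n grows, the points
    (x_i, y_i) approach the uniform distribution on the unit disk.
    """
    a = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    lam = np.linalg.eigvals(a) / np.sqrt(n)
    return lam.real, lam.imag

x, y = circular_law_points(300, np.random.default_rng(0))
print(np.mean(x**2 + y**2))
```

For a uniform unit disk the mean of x^2 + y^2 is 1/2, which makes a cheap sanity check.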

kurlberg 2 years ago | |

It's very inefficient, both in terms of runtime and in terms of wasted entropy.

porphyra 2 years ago | | |

Indeed, haha.

montefischer 2 years ago |

If you're interested in things like this, you might like to read the paper of Diaconis, Holmes, and Shashahani on sampling from a manifold. https://arxiv.org/abs/1206.6913

pugworthy 2 years ago |

2-ball (disk) distribution is kind of interesting in gaming when simulating gun projectile hit locations.

If the circle represents the area in which a simulated projectile will hit, you probably don't want a truly random distribution of points but instead have a bias towards the middle of the circle. A real gun shot multiple times at a fixed target will probably (assuming perfect aim but some variation on every shot) have more shots hit the middle of the pattern than the edges.

Some early Valve shot code actually had a purely random distribution, but at some point an alternate version got written and shared at https://developer.valvesoftware.com/wiki/CShotManipulator

Ironically the biased version is based on a pretty simple method that in fact people sometimes get wrong when they want a truly random point distribution in a circle. Just doing a random radius and theta will lead to a biased distribution. Wolfram Mathworld has a good writeup on it at https://mathworld.wolfram.com/DiskPointPicking.html

xanderlewis 2 years ago | |

> truly random distribution

> purely random distribution

Nitpick: you don’t mean random; you mean uniform.

pugworthy 2 years ago | | |

Game players live and die (virtually) by the fabled RNG and pray to RNGesus even. So though mathematically uniform is the right word, gamer-wise, random is a pretty good word to use.

olliej 2 years ago |

I was always surprised at how easily you get biased sampling when generating random points, even with good random input. Something I often saw students do was essentially normalize({2*rand() - 1, 2*rand() - 1, 2*rand() - 1}) or variations of that (where rand() is "good", not the literal rand(3)), and there are numerous other ways that are more subtly wrong. IIRC the nominally correct way for a sphere specifically is something like a=rand(), b=rand(), and the random point is something like { cos(a)sin(b), sin(b), cos(a)cos(b) }[1]

I think the best illustration of "reasonable choices of which random values to use leading to biased results" is Bertrand's paradox, which I was introduced to via Numberphile/3Blue1Brown: https://www.youtube.com/watch?v=mZBwsm6B280. I'm just glad that nothing I have ever needed random sampling for has been important :D

[1] please don't use this blindly, I'm really just going off very old recollection, if you need it google random sphere sampling :D
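A sketch of the student mistake above, plus the usual fix of rejecting cube points that fall outside the unit ball before normalizing. The check relies on a fact specific to 3D: on the uniform unit sphere each coordinate is uniform on [-1, 1], so P(|x| > 0.9) should be 0.1. The names are hypothetical, and `random.random()` stands in for a "good" rand().

```python
import math
import random

def normalized_cube(rng):
    # Biased: directions toward the cube's corners are over-represented.
    v = [2.0 * rng.random() - 1.0 for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def normalized_ball(rng):
    # Fix: reject points outside the unit ball (or at the origin) first.
    while True:
        v = [2.0 * rng.random() - 1.0 for _ in range(3)]
        norm2 = sum(c * c for c in v)
        if 1e-12 < norm2 <= 1.0:
            norm = math.sqrt(norm2)
            return [c / norm for c in v]

def axis_fraction(sampler, n=100_000, seed=2):
    # Fraction of directions with |x| > 0.9; 0.1 for a uniform sphere.
    rng = random.Random(seed)
    return sum(abs(sampler(rng)[0]) > 0.9 for _ in range(n)) / n

print(axis_fraction(normalized_cube), axis_fraction(normalized_ball))
```

The normalized-cube version should come out noticeably low (around 0.06), because the corner directions are over-represented at the expense of the axes.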

hinkley 2 years ago | |

I used to think “normal distributions are everywhere”, but the more math and science I watch on YouTube, the more the central limit theorem pops up. It’s the CLT that’s everywhere; it just brings the normal distribution as its +1.

richrichie 2 years ago | | |

Indeed. The fundamental insight behind the CLT, i.e. that the sample average is approximately normally distributed even when the population distribution is not normal, is intuitive, yet the theorem is magical.
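A tiny illustration of the CLT at work, using the classic trick of summing 12 uniforms (an Irwin–Hall variable with mean 6 and variance 1) to approximate a standard normal. It's only an approximation; for real sampling you'd use a proper normal generator such as `random.gauss`. The function name is hypothetical.

```python
import random
import statistics

def clt_normal(rng, k=12):
    """Approximate standard normal from the sum of k uniforms on [0, 1)."""
    s = sum(rng.random() for _ in range(k))
    return (s - k / 2) / (k / 12) ** 0.5  # the sum has mean k/2 and variance k/12

rng = random.Random(3)
zs = [clt_normal(rng) for _ in range(50_000)]
print(statistics.mean(zs), statistics.pstdev(zs))
```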

enthdegree 2 years ago |

I find it hard to believe the dropped coordinates approaches were first noticed in 2010 and proven in 2017.
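For reference, the dropped-coordinates result says that if you take a uniform point on the unit sphere in R^(n+2) and drop any two coordinates, the remaining n coordinates are uniform in the unit n-ball. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def ball_by_dropped_coordinates(n, rng):
    """Uniform point in the unit n-ball via a point on the sphere in R^(n+2)."""
    v = rng.standard_normal(n + 2)
    v /= np.linalg.norm(v)  # uniform on the unit sphere in R^(n+2)
    return v[:n]            # drop the last two coordinates

rng = np.random.default_rng(0)
pts = np.array([ball_by_dropped_coordinates(2, rng) for _ in range(20_000)])
print(np.mean(np.linalg.norm(pts, axis=1) <= 0.5))  # should be near 0.25
```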

atum47 2 years ago |

Some years ago I saw a blue and red image that caught my attention. The red dots seemed to be floating around the blue ones. I later found out that this effect has a name: Chromostereopsis [1]. So I decided to make my own image and needed to distribute some points in a circle; this is how I did it [2].

[1] - https://en.wikipedia.org/wiki/Chromostereopsis

[2] - https://jsfiddle.net/victorqribeiro/vxf2ajzm/48/

stefanka 2 years ago |

I encountered the need for uniform random points on hyperspheres, and found this solution (with Python code) very helpful: https://stackoverflow.com/a/59279721.

Currently, I am porting my codebase to Rust and did that part over the weekend, so if anyone is interested in this exact implementation, I'd be willing to share it (as a crate if necessary).

frumiousirc 2 years ago | |

The linked SO question is not about uniform random points. The poster explicitly excludes answers involving uniform random distribution on a hypersphere.

ykonstant 2 years ago |

I wonder if CS people have studied the method of Lubotzky, Phillips and Sarnak to produce well-distributed points on spheres: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.31603907...

kloch 2 years ago |

I find it amusing that you need a normal distribution of scalars to generate a uniform distribution of vectors on an n-sphere.

ackbar03 2 years ago |

Is this the same as uniform SO(N) sampling? I had to do something like this for a deep learning training method I was working on. I gave up.
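It's a related but different problem: a uniformly (Haar) random rotation in SO(N) is a whole matrix, while a point on the sphere is one vector, although the first column of a Haar-random rotation is uniform on the sphere. One standard way to sample SO(N), described by Mezzadri (2007), is QR of a Gaussian matrix with a sign correction; here is a minimal NumPy sketch (`random_rotation` is a hypothetical name). SciPy also provides `scipy.stats.special_ortho_group` if you'd rather not roll your own.

```python
import numpy as np

def random_rotation(n, rng):
    """Haar-distributed rotation matrix in SO(n) via QR of a Gaussian matrix."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    # Make the factorization unique by forcing diag(r) > 0; without this
    # step q is not Haar-distributed.
    q = q * np.sign(np.diag(r))
    # q is now uniform on O(n); flip one column to land in SO(n) if needed.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
print(random_rotation(3, rng))
```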

xLaszlo 2 years ago |

If you are looking for low-discrepancy (fills the space "smoothly") quasi-random numbers, check out Sobol and Niederreiter sequences.

IshKebab 2 years ago | |

No, don't. Look at sequences based on the plastic number instead. They are much better.
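Assuming "plastic numbers" refers to the R2 sequence from the same site (extremelearning.com.au), which is built from the plastic number g (the real root of x^3 = x + 1), here is a minimal sketch. Point i is ((s + i/g) mod 1, (s + i/g^2) mod 1); the offset s = 0.5 is an assumption, and the function names are hypothetical.

```python
def plastic_number():
    # Real root of x^3 = x + 1 (about 1.3247), by fixed-point iteration.
    x = 1.0
    for _ in range(100):
        x = (1.0 + x) ** (1.0 / 3.0)
    return x

def r2_sequence(count, seed=0.5):
    """First `count` points of the 2-D R2 low-discrepancy sequence in [0, 1)^2."""
    g = plastic_number()
    a1, a2 = 1.0 / g, 1.0 / (g * g)
    return [((seed + a1 * i) % 1.0, (seed + a2 * i) % 1.0)
            for i in range(1, count + 1)]

print(r2_sequence(5))
```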

tiffanyh 2 years ago |

Off topic: isn't "uniformly random" a contradiction in terms?

kadoban 2 years ago | |

No. Random just means you can't predict what value will be chosen, but doesn't tell you how likely different values are.

If I roll a 12-sided die, that's random. If I roll two 6-sided dice and add the result, that's also random, but it has a different distribution of values.

In the two-dice version, there's one way to get a result of 2, but _several_ ways to get 7. You'll get 7 way more often than you'll get 2.

In the one-die version, each outcome is equally likely. You're exactly as likely to get 2 as you are to get 7 or any other value in the range of possibilities.

The one-die version is a uniform distribution. The two-dice version is not uniform.
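The dice comparison above, counted exhaustively with the standard library:

```python
from collections import Counter
from itertools import product

# One 12-sided die: each face 1..12 appears exactly once.
one_d12 = Counter(range(1, 13))
# Two 6-sided dice: how many of the 36 ordered rolls give each total.
two_d6 = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(1, 13):
    print(f"{total:2d}  one d12: {one_d12[total] / 12:.3f}  two d6: {two_d6[total] / 36:.3f}")
```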

esafak 2 years ago | |

"Random" is not precise; it doesn't specify a distribution, though uniform is commonly assumed. A better term is "uniformly distributed".

nighthawk454 2 years ago |

See also this method based on optimizing nearest-neighbor distances after initializing with fibonacci spiral:

https://extremelearning.com.au/how-to-evenly-distribute-poin...

JadoJodo 2 years ago |

I'm definitely not at all qualified to talk about this, but... aren't "uniform" and "random" antonyms?