Gaussian distributions form a monoid(izbicki.me) |
Gaussian distributions form a monoid(izbicki.me) |
Amazing!
Uh, might want to reconsider and check that.
Might you be able to clarify this sentence? I have zero idea what might be meant:
> this is a useful property
What is the antecedent of "this", that is, in this phrases, what does "this" refer to?
> gaussian fits
What is a gaussian fit? I have no idea. I'm comfortable with the Lindeberg-Feller version of the central limit theorem, the weak and strong laws of large numbers, martingale theory, the martingale proof of the strong law of large numbers, the Radon-Nikodym theorem, and the fact that sample mean and variance are sufficient statistics for the Gaussian distribution, but, still, I can't even guess what a gaussian fit is.
> parallelizing
I can guess that what is meant by "parallelizing" is the computer software approach of having one program try to get some work done faster by starts several threads or tasks in one or several processor cores, processors, or computers. Okay. But what is it about "gaussian fits" that might commonly call for "parallelizing"?
All of the useful statistics for a probability distribution can be gleaned from the probability-density function by taking the Fourier transform and then (when it's not too bad) the (complex) natural logarithm -- or, when it's easier, just `f(s) = log(E[exp(s X)])` (having `s X` rather than `i s X` up-top). These are called cumulant-generating functions or CGFs.
If you have a sum of two independent random variables, the CGF of the sum is the sum of their CGFs. This means that the Taylor polynomial of the CGFs have components which simply add together. Take derivatives of the CGF to get the "cumulants", which are pseudo-linear (linear in independent random variables, with some relationship `f(k X) = k^n f(X)`). So these cumulants become really convenient for characterizing the distribution as a whole: in fact if you have all of them you can recover the original distribution.
Now if you describe your distributions in terms of its cumulants, you now have a special monoid:
data Distr n = Distr [n]
instance (Num n) => Monoid (Distr n) where
mempty = Distr (repeat 0)
mappend = zipWith (+)
Gaussian distributions in particular have a form for their CGF which is preserved by this monoid operation, namely: if X ~ Gauss(m, s^2) then ln(E(exp(k X))) = m k + s^2 k^2
(Note that the leading term has to be 0, as if s = 0 we have ln(E(1)) = ln(1) = 0.) Another way to state this is "the Fourier transform of a Gaussian is another Gaussian."So that is super-simple, and you can already see the hints of the central limit theorem emerging: the mean of N identically-distributed variables X_i will have a CGF related to the coefficients c_i of the original distribution by:
ln(E(k * sum_i X_i / N)) = N sum_m c_m k^m / N^m
so the Taylor expansion gets attenuated by successive powers of N: you can approximate the CGF with truncation.So, the set of all distributions form a monoid under convolution if you allow the Dirac delta-function to act as `mempty` -- in a certain representation, this appears as the termwise-summing monoid. Gaussian distributions are a sub-monoid of this larger monoid, and they are not the only one: we could easily go out to 3, 4, or 5 cumulants before setting the rest to 0 and finding a similar sub-family.
I presume that the parallelisation point was with reference to the point made by the article, that the calculation of means and variances can be parallelised, so large datasets can be dealt with efficiently.
Is there something else you are missing?
I'd never do anything like that and would advise others not do also.
Why? Because it is not the least bit clear just what the heck you get.
Next, likely you should not fit at all. Instead, if want to use some distribution with parameters, e.g., Gaussian, uniform, exponential, then just estimate the parameters and not the distribution.
E.g., if you know that the data is independent, identically distributed Gaussian, then take the sample mean and sample variance and let those be the two parameters in the Gaussian distribution. In that case, will know that the expectation and variance of the distribution are the same as in your data, and that's good.
That sample mean and variance are sufficient statistics for the Gaussian is also a biggie.
And look into the situation for the rest of the exponential family.
See also the famous
Paul R. Halmos, "The Theory of Unbiased Estimation", 'Annals of Mathematical Statistics', Volume 17, Number 1, pages 34-43, 1946.
If want to find the variance of a large data set, then how much accuracy do you want? Generally, sample variance from a few hundred numbers will be okay, and then don't need to consider execution on parallel computer hardware.
R. Hamming once wrote, "The purpose of computing is insight, not numbers."
Along that line, finding sample mean and variance of a huge data set promises little or no more "insight" than just sample mean and variance of an appropriately large sample. Of course, we are assuming that the data is independent and identically distributed so that a good sample is easy to find.
Look at the "Techniques", the first three are :
"Parametric methods, by which the parameters of the distribution are calculated from the data series.[2] The parametric methods are: method of moments method of L-moments[3] Maximum likelihood method[4]"
which if you work them out you get exactly what you'd expect.