Probability of finding a sum of random variables
The probability density $p(x)$ of finding the average $x=\frac{1}{n}\sum_i x_i$ of $n$ random variables can be written using the Dirac delta function as:
$$p(x)=\int\delta\Big(x-\frac{1}{n}\sum_i^n x_i\Big)\,p_1(x_1)\cdots p_n(x_n)\,dx_1\cdots dx_n=\int\delta\Big(x-\frac{1}{n}\sum_i^n x_i\Big)\prod_i p_i(x_i)\,dx_i$$
Calculating the Fourier transform $\hat p(k)$ yields:
$$\hat p(k)=\int e^{-ikx}p(x)\,dx=\int e^{-ik\sum_i x_i/n}\prod_i p_i(x_i)\,dx_i=\prod_i\int e^{-ikx_i/n}p_i(x_i)\,dx_i=\prod_i\hat p_i(k/n)$$
The original $p(x)$ can now be recovered using the inverse Fourier transform:
$$p(x)=\frac{1}{2\pi}\int e^{ikx}\hat p(k)\,dk=\frac{1}{2\pi}\int e^{ikx}\prod_i\hat p_i(k/n)\,dk=\frac{1}{2\pi}\int e^{ikx}\big(\hat p(k/n)\big)^n\,dk$$
Where the last equality holds for identically distributed random variables.
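As a quick numerical sanity check (a sketch assuming NumPy; the uniform example and the grid sizes are arbitrary choices, not part of the derivation), the density of the average can be recovered by inverting $\prod_i\hat p_i(k/n)$ on a grid:

```python
import numpy as np

# Sketch: density of the average of n i.i.d. uniform(0,1) variables,
# obtained by numerically inverting phat(k/n)**n and compared with a
# Monte Carlo histogram.
rng = np.random.default_rng(0)
n = 5
samples = rng.uniform(0, 1, size=(100_000, n)).mean(axis=1)
hist, edges = np.histogram(samples, bins=50, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])

# For uniform(0,1): phat(k) = (1 - e^{-ik})/(ik) = e^{-ik/2} sin(k/2)/(k/2).
k = np.linspace(-400, 400, 8001)
phat_n = (np.exp(-1j * k / (2 * n)) * np.sinc(k / (2 * np.pi * n))) ** n

# p(x) = (1/2pi) * integral of e^{ikx} phat(k/n)^n dk, as a Riemann sum.
dk = k[1] - k[0]
p = (np.exp(1j * np.outer(centers, k)) @ phat_n).real * dk / (2 * np.pi)
print(np.max(np.abs(p - hist)))  # small, up to Monte Carlo noise
```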
Normal distribution
Taylor expanding each transformed density $\hat p_i(k/n)$ around $k=0$ yields:
$$\hat p_i(k/n)=\sum_m\frac{\partial_k^m\hat p_i(0)}{m!}\Big(\frac{k}{n}\Big)^m=\sum_m\frac{E[x_i^m]}{m!}\Big(-\frac{ik}{n}\Big)^m=1-\frac{ik}{n}E[x_i]+\frac{i^2k^2}{2n^2}E[x_i^2]+\ldots$$
Keeping terms up to first order in $\frac{1}{n}$ one gets:
$$\big(\hat p(k/n)\big)^n=\Big(1-\frac{ik}{n}E[x_i]+\frac{i^2k^2}{2n^2}E[x_i^2]+\ldots\Big)^n=e^{-ikE[x_i]}\,e^{i^2k^2(E[x_i^2]-E[x_i]^2)/2n}+O\Big(\frac{1}{n^2}\Big)$$
Writing $\mu\equiv E[x_i]$ and $\sigma^2\equiv E[x_i^2]-E[x_i]^2$, the original $p(x)$ can now be recovered using the inverse Fourier transform:
$$p(x)\xrightarrow{\,n\to\infty\,}\frac{1}{2\pi}\int e^{ikx}e^{-ik\mu}e^{i^2k^2\sigma^2/2n}\,dk=\frac{1}{\sqrt{2\pi\sigma^2/n}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2/n}}$$
For large $n$, the average of $n$ identically distributed random variables thus becomes normally distributed with the same mean $\mu$ but a lower variance $\sigma^2/n$ than the original distribution.
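A minimal Monte Carlo sketch of this statement (assuming NumPy; the exponential distribution with $\mu=\sigma^2=1$ is just an example choice):

```python
import numpy as np

# Averages of n i.i.d. exponential(1) variables (mu = 1, sigma^2 = 1)
# should be approximately N(mu, sigma^2 / n) for large n.
rng = np.random.default_rng(1)
n, trials = 100, 200_000
averages = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
print(averages.mean())     # close to mu = 1
print(averages.var() * n)  # close to sigma^2 = 1
```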
χ distribution
The probability of finding the square root $\chi=\sqrt{\sum_i x_i^2}$ of the sum of squares of $n$ standard-normal distributed random variables is proportional to the surface area of an $n$-dimensional sphere:
$$p(\chi)=\int\delta\Big(\chi-\sqrt{\sum_i x_i^2}\Big)\prod_i\frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\,dx_i=\int dA\int\delta(\chi-r)\frac{e^{-r^2/2}}{\sqrt{2\pi}^{\,n}}\,r^{n-1}\,dr=\frac{e^{-\chi^2/2}\,\chi^{n-1}}{2^{n/2-1}\,\Gamma(n/2)}$$
The surface area of the n-dimensional sphere can be calculated from:
$$1=\int\prod_i\frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\,dx_i=\int dA\int\frac{e^{-r^2/2}}{\sqrt{2\pi}^{\,n}}\,r^{n-1}\,dr=\int dA\int\frac{e^{-t}\,t^{n/2-1}}{2\pi^{n/2}}\,dt\equiv\frac{\Gamma(n/2)}{2\pi^{n/2}}\int dA$$
Where $t\equiv r^2/2$ and the gamma function is defined as $\Gamma(x)=\int_0^\infty e^{-t}t^{x-1}\,dt$. Solving for the surface area gives $\int dA=2\pi^{n/2}/\Gamma(n/2)$.
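As an illustrative check (a sketch assuming SciPy, whose `chi` distribution implements this density), sampling $\chi=\sqrt{\sum_i x_i^2}$ directly reproduces for example its mean:

```python
import numpy as np
from scipy.stats import chi

# chi = sqrt of the sum of n squared standard normals, compared against
# scipy's chi distribution with df = n degrees of freedom.
rng = np.random.default_rng(2)
n = 4
samples = np.sqrt((rng.standard_normal((100_000, n)) ** 2).sum(axis=1))
print(samples.mean(), chi(df=n).mean())  # the two nearly agree
```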
χ² distribution
A change of variables yields the distribution for the sum of squares:
$$p(\chi^2)=p(\chi)\,\frac{d\chi}{d\chi^2}=\frac{e^{-\chi^2/2}\,\chi^{n-2}}{2^{n/2}\,\Gamma(n/2)}$$
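Since $\chi^{n-2}=(\chi^2)^{n/2-1}$, this is the usual $\chi^2$ density in the variable $y=\chi^2$; a sketch comparing it against SciPy's implementation (an assumption on my part, not part of the derivation):

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gamma

# Evaluate p(chi^2) = e^(-y/2) y^(n/2-1) / (2^(n/2) Gamma(n/2)) with
# y = chi^2 and compare with scipy's chi-squared density, df = n.
n = 5
y = np.linspace(0.1, 10, 50)
p = np.exp(-y / 2) * y ** (n / 2 - 1) / (2 ** (n / 2) * gamma(n / 2))
print(np.allclose(p, chi2.pdf(y, df=n)))  # True
```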
Student t distribution
The estimated mean $x=\frac{1}{n}\sum_i x_i$ deviates from the true mean $\mu$ with a standard deviation of $\frac{\sigma}{\sqrt{n}}$. Since $\sigma$ is most likely unknown, one has to use the estimated standard deviation $s$ instead. Since $\frac{x-\mu}{\sigma/\sqrt{n}}$ is standard-normal distributed and $\frac{s^2}{\sigma^2}$ is $\chi^2$ distributed, the ratio $\frac{x-\mu}{s/\sqrt{n}}$ is independent of $\sigma$ and follows a Student t distribution:
$$\begin{aligned}
p(t)&=\int\delta\Big(t-\frac{x}{\sqrt{\chi^2/n}}\Big)\,\frac{e^{-x^2/2}}{\sqrt{2\pi}}\,\frac{e^{-\chi^2/2}\chi^{n-2}}{2^{n/2}\Gamma(n/2)}\,d\chi^2\,dx\\
&=\int\frac{e^{-(t\chi)^2/2n}}{\sqrt{2\pi n}}\,\frac{e^{-\chi^2/2}\chi^{n-2}}{2^{n/2}\Gamma(n/2)}\,\chi\,d\chi^2\\
&=\int\frac{e^{-\chi^2(1+t^2/n)/2}}{\sqrt{2^{n+1}\pi n}\,\Gamma(n/2)}\,(\chi^2)^{\frac{n-1}{2}}\,d\chi^2\\
&=\frac{\big(1+\frac{t^2}{n}\big)^{-\frac{n+1}{2}}}{\sqrt{\pi n}\,\Gamma(n/2)}\int e^{-u}\,u^{\frac{n+1}{2}-1}\,du\\
&=\frac{\Gamma\big(\frac{n+1}{2}\big)}{\sqrt{\pi n}\,\Gamma\big(\frac{n}{2}\big)}\Big(1+\frac{t^2}{n}\Big)^{-\frac{n+1}{2}}
\end{aligned}$$
Where $u\equiv\frac{\chi^2}{2}\big(1+\frac{t^2}{n}\big)$. The variance of $\frac{x-\mu}{s/\sqrt{n}}$ is no longer $1$ but $\frac{n}{n-2}$ instead.
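A short check of the final formula and of the variance claim (a sketch assuming SciPy's Student t implementation):

```python
import numpy as np
from scipy.stats import t as student_t
from scipy.special import gamma

# Evaluate the derived density and compare with scipy's Student t with
# df = n; also check that its variance equals n / (n - 2).
n = 7
x = np.linspace(-5, 5, 101)
p = (gamma((n + 1) / 2) / (np.sqrt(np.pi * n) * gamma(n / 2))
     * (1 + x**2 / n) ** (-(n + 1) / 2))
print(np.allclose(p, student_t.pdf(x, df=n)))  # True
print(student_t(df=n).var(), n / (n - 2))      # both 1.4
```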
Multinomial coefficient
Suppose you want to arrange $k_i$ copies of $i$ different things into $n=\sum_i k_i$ bins. How many distinct ways $\binom{n}{k_1,\ldots,k_i}$ are there to do this?
Starting with the case that each thing is unique (all $k_i=1$), the first bin can be filled with one of $n$ things, the second with one of the $(n-1)$ remaining things, and so on, yielding $n!$ distinct possibilities:
$$\binom{n}{k_1=1,\ldots,k_i=1}=n(n-1)(n-2)\cdots=n!$$
Grouping $k_i$ of the previously unique things together into a category $i$ reduces the count by a factor of $k_i!$, since there are now $k_i$ equivalent ways to select the first thing out of category $i$, $k_i-1$ ways to select the second, and so on, yielding the overall result:
$$\binom{n}{k_1,\ldots,k_i}=\frac{n!}{k_1!\cdots k_i!}=\frac{n!}{\prod_i k_i!}$$
What if multiple things can go into one bin? This problem can be reduced to the previous one by adding $n-1$ separators to the number of bins and treating the separators as an additional category of thing, each placed in a singly-occupied bin.
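Both counts are easy to verify on small inputs; a sketch (the `multinomial` helper is my own illustration):

```python
from math import comb, factorial
from itertools import permutations

def multinomial(ks):
    """n! / (k_1! ... k_i!) with n = sum(ks)."""
    out = factorial(sum(ks))
    for k in ks:
        out //= factorial(k)
    return out

# Brute force: distinct arrangements of "aab", i.e. k = (2, 1).
print(multinomial((2, 1)), len(set(permutations("aab"))))  # 3 3

# Stars and bars: distributing n identical things over b bins, where a
# bin may hold several things, is a choice of b - 1 separator positions.
n, b = 5, 3
print(comb(n + b - 1, b - 1))  # 21
```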
Multinomial distribution
Now what is the probability $P(k_1,\ldots,k_i)$, when measuring $n=\sum_i k_i$ outcomes, that there will be $k_i$ outcomes of each type $i$? With the probability $p_i$ of measuring outcome $i$ in a single observation, the total probability $P$ is then the number of possibilities realizing this outcome multiplied by the product of all the $p_i^{k_i}$:
$$P(k_1,\ldots,k_i)=\binom{n}{k_1,\ldots,k_i}\prod_i p_i^{k_i}=n!\prod_i\frac{p_i^{k_i}}{k_i!}$$
For two possible outcomes with $k_1+k_2=n$ and $p_1+p_2=1$ one recovers the binomial distribution:
$$P(n,k)=\frac{n!}{k!\,(n-k)!}\,p^k(1-p)^{n-k}$$
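A quick empirical check of the binomial case (a sketch using NumPy's sampler; the parameters are arbitrary):

```python
import numpy as np
from math import comb

# Compare the relative frequency of k successes in n trials against
# n!/(k!(n-k)!) p^k (1-p)^(n-k).
rng = np.random.default_rng(3)
n, p, k = 10, 0.3, 4
ks = rng.binomial(n, p, size=200_000)
print((ks == k).mean(), comb(n, k) * p**k * (1 - p) ** (n - k))
```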
Poisson distribution
Now taking the limit $n\to\infty$ and $p_i\to0$ while keeping the expectation values $\lambda_i\equiv n\cdot p_i$ constant yields the Poisson distribution. To keep the probabilities normalised we set $p_0\equiv1-\sum_i p_i=1-\sum_i\frac{\lambda_i}{n}$ as the probability of nothing happening during one of the $n$ observations, and $k_0\equiv n-\sum_i k_i\equiv n-k$ as the number of times this happens. The probability of observing each event $k_i$ times is then given by:
$$P(k_1,\ldots,k_i)=\frac{n!}{k_0!}\,p_0^{k_0}\prod_i\frac{p_i^{k_i}}{k_i!}=\underbrace{\frac{n!}{(n-k)!\,n^k}}_{\to\,1}\;\underbrace{\Big(1-\frac{\sum_i\lambda_i}{n}\Big)^{n}}_{\to\,\exp(-\sum_i\lambda_i)}\;\underbrace{\Big(1-\frac{\sum_i\lambda_i}{n}\Big)^{-k}}_{\to\,1}\;\prod_i\frac{\lambda_i^{k_i}}{k_i!}\;\xrightarrow{\,n\to\infty\,}\;\prod_i\frac{\lambda_i^{k_i}}{k_i!}\,e^{-\lambda_i}$$
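The convergence is easy to see numerically; a sketch for a single event type, holding $\lambda=np$ fixed while $n$ grows:

```python
from math import comb, exp, factorial

# Binomial probabilities approach the Poisson limit as n grows with
# lambda = n * p held constant.
lam, k = 3.0, 5
for n in (10, 100, 10_000):
    p = lam / n
    print(n, comb(n, k) * p**k * (1 - p) ** (n - k))
print("Poisson:", lam**k * exp(-lam) / factorial(k))
```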
Geometric distribution
The probability of having the first success in a binomial setup occur after exactly $k$ failed trials is simply given by:
$$P(k)=(1-p)^k\,p$$
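A sanity check by simulation (assuming NumPy, whose `geometric` counts trials rather than failures, hence the shift by one):

```python
import numpy as np

# Empirical distribution of the number of failures before the first
# success, compared with (1 - p)^k p.
rng = np.random.default_rng(4)
p = 0.2
fails = rng.geometric(p, size=200_000) - 1  # numpy counts trials, not failures
for k in range(4):
    print(k, (fails == k).mean(), (1 - p) ** k * p)
```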
Exponential distribution
Writing $x=\frac{k}{n}$ and $p=\frac{\lambda}{n}$ and taking the limit $n\to\infty$ yields the waiting-time distribution of a Poisson process:
$$p(x)=\Big(1-\frac{\lambda}{n}\Big)^{xn}\,\frac{\lambda}{n}\;\xrightarrow{\,n\to\infty\,}\;e^{-\lambda x}\,\lambda\,dx$$
Here the remaining factor of $\frac{1}{n}$ becomes the infinitesimal $dx$, normalising the density to $1$.
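The limit can be checked term by term (a sketch; the density at $x$ is approximated by the geometric probability divided by the trial spacing $1/n$):

```python
import numpy as np

# Geometric waiting probabilities, rescaled by n, approach the
# exponential density lambda * exp(-lambda * x) as n grows.
lam, x = 1.5, 2.0
for n in (10, 100, 10_000):
    p = lam / n
    k = int(x * n)  # number of failed trials before time x
    print(n, (1 - p) ** k * p * n)
print("limit:", lam * np.exp(-lam * x))
```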
Erlang distribution
The cumulative waiting time for $n$ events is the sum of $n$ exponential distributions. Since the exponential distribution is a special case of the $\chi^2$ distribution with $k=2$ (via $\chi^2=2\lambda x$), the sum of $n$ exponential distributions is simply a $\chi^2$ distribution with $k=2n$:
$$p(\chi^2=2\lambda x,\,k=2n)=\frac{e^{-\lambda x}\,(2\lambda x)^{n-1}}{2^{n}\,\Gamma(n)}\,\frac{d\chi^2}{dx}=\frac{\lambda^{n}x^{n-1}e^{-\lambda x}}{(n-1)!}$$
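Finally, a simulation sketch (assuming SciPy; the Erlang distribution with shape $n$ and rate $\lambda$ is the gamma distribution with shape $n$ and scale $1/\lambda$):

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

# The sum of n exponential(lambda) waiting times follows an Erlang
# distribution, i.e. a gamma distribution with shape n and scale 1/lambda.
rng = np.random.default_rng(5)
lam, n = 2.0, 4
waits = rng.exponential(1 / lam, size=(100_000, n)).sum(axis=1)
print(waits.mean(), gamma_dist(a=n, scale=1 / lam).mean())  # both ~ n/lam = 2.0
```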