Probability of finding a sum of random variables
The probability density $p(x)$ of finding the average $x = \frac{1}{n}\sum_i x_i$ of $n$ random variables can be written using the delta function as:

$$p(x) = \int \delta\Bigl(x - \tfrac{1}{n}\textstyle\sum_i x_i\Bigr)\, p_1(x_1)\cdots p_n(x_n)\, dx_1 \cdots dx_n = \int \delta\Bigl(x - \tfrac{1}{n}\textstyle\sum_i x_i\Bigr) \prod_i p_i(x_i)\, dx_i$$
Calculating the Fourier transform $\hat p(k)$ yields:

$$\hat p(k) = \int e^{-ikx}\, p(x)\, dx = \int e^{-ik\sum_i x_i/n} \prod_i p_i(x_i)\, dx_i = \prod_i \int e^{-ikx_i/n}\, p_i(x_i)\, dx_i = \prod_i \hat p_i(k/n)$$
The original $p(x)$ can now be recovered using the inverse Fourier transform:

$$p(x) = \frac{1}{2\pi}\int e^{ikx}\, \hat p(k)\, dk = \frac{1}{2\pi}\int e^{ikx} \prod_i \hat p_i(k/n)\, dk = \frac{1}{2\pi}\int e^{ikx} \bigl(\hat p_i(k/n)\bigr)^n\, dk$$
Where the last equality holds for identically distributed random variables.
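As a quick numerical sanity check of this factorization (a minimal sketch assuming NumPy; the exponential distribution, the value of $k$ and the sample sizes are arbitrary choices), both sides can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 200_000                        # variables per average, MC samples
xs = rng.exponential(1.0, size=(N, n))   # any common distribution will do
xbar = xs.mean(axis=1)                   # the average of n variables

k = 2.0
# left-hand side: p̂(k) of the average, estimated as E[exp(-ik·xbar)]
lhs = np.exp(-1j * k * xbar).mean()
# right-hand side: product of the individual transforms at k/n (iid case)
ys = rng.exponential(1.0, size=N)
rhs = np.exp(-1j * (k / n) * ys).mean() ** n
print(lhs, rhs)                          # the two estimates agree up to noise
```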
Normal distribution
Taylor expanding each transformed density $\hat p_i(k/n)$ around $k=0$ yields:

$$\hat p_i(k/n) = \sum_m \frac{\partial_k^m \hat p_i(0)}{m!} \left(\frac{k}{n}\right)^m = \sum_m \frac{E[x_i^m]}{m!} \left(\frac{-ik}{n}\right)^m = 1 - \frac{ik}{n} E[x_i] - \frac{k^2}{2n^2} E[x_i^2] + \ldots$$
Raising this to the $n$-th power and keeping terms up to first order in $\frac{1}{n}$ in the exponent, one gets:

$$\bigl(\hat p(k/n)\bigr)^n = \left(1 - \frac{ik}{n} E[x_i] - \frac{k^2}{2n^2} E[x_i^2] + \ldots\right)^n = e^{-ik E[x_i]}\, e^{-k^2 \left(E[x_i^2] - E[x_i]^2\right)/2n} + O\!\left(\frac{1}{n^2}\right)$$
Writing $\mu \equiv E[x_i]$ and $\sigma^2 \equiv E[x_i^2] - E[x_i]^2$, the original $p(x)$ can now be recovered using the inverse Fourier transform:

$$p(x) \xrightarrow{\,n\to\infty\,} \frac{1}{2\pi}\int e^{ikx}\, e^{-ik\mu}\, e^{-k^2\sigma^2/2n}\, dk = \frac{1}{\sqrt{2\pi\sigma^2/n}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2/n}}$$
The average of $n$ identically distributed random variables thus becomes, for large $n$, normally distributed with the same mean $\mu$ as the original distribution but a lower variance $\sigma^2/n$.
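A minimal simulation of this statement (assuming NumPy; uniform variables with $\mu = 1/2$ and $\sigma^2 = 1/12$ are just one convenient choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 50, 100_000
xbar = rng.uniform(0.0, 1.0, size=(N, n)).mean(axis=1)

print(xbar.mean(), 0.5)            # sample mean of the averages  ~ mu
print(xbar.var(), (1 / 12) / n)    # their variance               ~ sigma^2 / n
# a histogram of xbar closely matches a normal density with these parameters
```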
χ distribution
The probability density of the root of the sum of squares of $n$ standard-normal distributed random variables is proportional to the surface area of an $n$-dimensional sphere:

$$p(\chi) = \int \delta\Bigl(\chi - \sqrt{\textstyle\sum_i x_i^2}\Bigr) \prod_i \frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\, dx_i = \int dA \int \delta(\chi - r)\, \frac{e^{-r^2/2}}{\sqrt{2\pi}^{\,n}}\, r^{n-1}\, dr = \frac{e^{-\chi^2/2}\, \chi^{n-1}}{2^{n/2-1}\, \Gamma(n/2)}$$
The surface area of the $n$-dimensional sphere can be calculated from:

$$1 = \prod_i \int \frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\, dx_i = \int dA \int \frac{e^{-r^2/2}}{\sqrt{2\pi}^{\,n}}\, r^{n-1}\, dr = \int dA \int \frac{e^{-t}\, t^{n/2-1}}{2\pi^{n/2}}\, dt \equiv \frac{\Gamma(n/2)}{2\pi^{n/2}} \int dA$$

Where the gamma function is defined as $\Gamma(x) \equiv \int_0^\infty e^{-t}\, t^{x-1}\, dt$.
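The density derived above can be checked against samples (again a NumPy sketch; $n=3$ is an arbitrary choice):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(2)
n, N = 3, 200_000
chi = np.sqrt((rng.standard_normal((N, n)) ** 2).sum(axis=1))

hist, edges = np.histogram(chi, bins=60, range=(0.0, 4.0), density=True)
c = 0.5 * (edges[:-1] + edges[1:])                      # bin centres
pdf = np.exp(-c**2 / 2) * c**(n - 1) / (2**(n / 2 - 1) * gamma(n / 2))
print(np.abs(hist - pdf).max())                         # small deviation
```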
χ² distribution
A change of variables yields the distribution for the sum of squares, called the $\chi^2$ distribution:
$$p(\chi^2) = p(\chi)\, \frac{d\chi}{d\chi^2} = \frac{e^{-\chi^2/2}\, \chi^{n-2}}{2^{n/2}\, \Gamma(n/2)}$$
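The same kind of check after the change of variables, now in terms of $\chi^2$ directly (sketch; $n=4$ arbitrary):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(3)
n, N = 4, 200_000
chi2 = (rng.standard_normal((N, n)) ** 2).sum(axis=1)

hist, edges = np.histogram(chi2, bins=60, range=(0.0, 12.0), density=True)
c = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-c / 2) * c**(n / 2 - 1) / (2**(n / 2) * gamma(n / 2))
print(np.abs(hist - pdf).max())    # small: note chi^(n-2) = (chi^2)^(n/2-1)
```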
Student-t distribution
When estimating the true mean $\mu$ of a normally-distributed random variable $x$ by the sample mean $\bar x \equiv \frac{1}{n}\sum_i x_i$, the deviation $\bar x - \mu$ is again normally distributed, with a standard deviation of $\sigma/\sqrt{n}$, making $\delta \equiv \frac{\bar x - \mu}{\sigma/\sqrt{n}}$ standard-normal distributed. By Cochran's theorem the ratio $\chi^2 \equiv \frac{(n-1)\,s^2}{\sigma^2}$ between the sample variance $s^2 \equiv \frac{1}{n-1}\sum_i (x_i - \bar x)^2$ and the true variance $\sigma^2$ follows a $\chi^2$-distribution of degree $n-1$. The deviation $t \equiv \frac{\bar x - \mu}{s/\sqrt{n}} = \frac{\delta}{\chi/\sqrt{n-1}}$ will then depend only on the measurable quantities $\bar x$ and $s^2$, following a so-called Student-t distribution of degree $n-1$:

$$\begin{aligned}
p(t) &= \iint \delta\Bigl(t - \frac{\delta}{\chi/\sqrt{n-1}}\Bigr)\, \underbrace{\frac{e^{-\delta^2/2}}{\sqrt{2\pi}}}_{p(\delta)}\, \underbrace{\frac{e^{-\chi^2/2}\, \chi^{n-3}}{2^{\frac{n-1}{2}}\, \Gamma\bigl(\frac{n-1}{2}\bigr)}}_{p(\chi^2)}\, d\delta\, d\chi^2 = \int \frac{e^{-(t\chi)^2/2(n-1)}}{\sqrt{2\pi}}\, \frac{e^{-\chi^2/2}\, \chi^{n-3}}{2^{\frac{n-1}{2}}\, \Gamma\bigl(\frac{n-1}{2}\bigr)}\, \frac{\chi}{\sqrt{n-1}}\, d\chi^2 \\
&= \int \frac{e^{-\chi^2\left(t^2/(n-1)+1\right)/2}\, (\chi^2)^{\frac{n-2}{2}}}{2^{\frac{n}{2}}\sqrt{\pi(n-1)}\, \Gamma\bigl(\frac{n-1}{2}\bigr)}\, d\chi^2 = \int \frac{e^{-u}\, u^{\frac{n}{2}-1}}{2^{\frac{n}{2}}\sqrt{\pi(n-1)}\, \Gamma\bigl(\frac{n-1}{2}\bigr)} \left(\frac{2}{t^2/(n-1)+1}\right)^{\frac{n}{2}} du = \frac{\Gamma\bigl(\frac{n}{2}\bigr)}{\sqrt{\pi(n-1)}\, \Gamma\bigl(\frac{n-1}{2}\bigr)} \left(1 + \frac{t^2}{n-1}\right)^{-\frac{n}{2}}
\end{aligned}$$
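To see the result in action, one can form the $t$ statistic from many small normal samples and compare against the final density (a sketch assuming NumPy; the chosen $\mu$, $\sigma$ and $n$ are arbitrary, and $\sigma$ indeed drops out of $t$):

```python
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(4)
n, N, mu = 5, 200_000, 3.0
xs = rng.normal(mu, 2.0, size=(N, n))    # true sigma = 2 cancels out of t
t = (xs.mean(axis=1) - mu) / (xs.std(axis=1, ddof=1) / sqrt(n))

hist, edges = np.histogram(t, bins=80, range=(-5.0, 5.0), density=True)
c = 0.5 * (edges[:-1] + edges[1:])
pdf = (gamma(n / 2) / (sqrt(pi * (n - 1)) * gamma((n - 1) / 2))
       * (1 + c**2 / (n - 1)) ** (-n / 2))
print(np.abs(hist - pdf).max())          # t indeed follows Student-t(n-1)
```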
Multinomial coefficient
Suppose you want to arrange $k_i$ copies of $i$ different things into $n = \sum_i k_i$ bins. How many distinct ways $\binom{n}{k_1,\ldots,k_i}$ are there to do this?
Starting with the case that each thing is unique (all $k_i = 1$), the first bin can be filled with one of $n$ things, the second with one of the $n-1$ remaining things, and so on, yielding $n!$ distinct possibilities:
$$\binom{n}{k_1=1,\ldots,k_i=1} = n(n-1)(n-2)\cdots = n!$$
Grouping together $k_i$ of the previously unique things into a category $i$ reduces the count by a factor of $k_i!$, since there are $k_i$ ways to select the first thing out of the category $i$, $k_i - 1$ ways to select the second, and so on, yielding the overall result:
$$\binom{n}{k_1,\ldots,k_i} = \frac{n!}{k_1!\cdots k_i!} = \frac{n!}{\prod_i k_i!}$$
What if multiple things can go into one bin? This problem can be reduced to the previous one by adding $n-1$ separators to the bins and treating the separators as an additional category of things that can be placed in a singly-occupied bin.
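A brute-force check of this counting argument for a small multiset (sketch in Python; the multiset AABBB is an arbitrary example):

```python
from math import factorial
from itertools import permutations

def multinomial(*ks):
    """n! / (k1! * ... * ki!) with n = sum(ks)."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)
    return result

# distinct arrangements of the multiset AABBB, counted directly
assert multinomial(2, 3) == len(set(permutations("AABBB"))) == 10
```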
Multinomial distribution
Now what is the probability $P(k_1,\ldots,k_i)$, when measuring $n = \sum_i k_i$ outcomes, that there will be $k_i$ outcomes of each type $i$? With the probabilities $p_i$ of measuring outcome $i$ in a single observation, the total probability $P$ is the number of possibilities realizing this outcome multiplied by the product of all the $p_i$:
$$P(k_1,\ldots,k_i) = \binom{n}{k_1,\ldots,k_i} \prod_i p_i^{k_i} = \frac{n!}{\prod_i k_i!} \prod_i p_i^{k_i}$$
For two possible outcomes with $k_1 + k_2 = n$ and $p_1 + p_2 = 1$ one recovers the binomial distribution:
$$P(n,k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$
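A small sketch confirming that this pmf is normalized (the parameters are arbitrary):

```python
from math import factorial

def binom_pmf(n, k, p):
    """n! / (k! (n-k)!) * p^k * (1-p)^(n-k)."""
    return factorial(n) // (factorial(k) * factorial(n - k)) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
print(sum(binom_pmf(n, k, p) for k in range(n + 1)))   # 1.0 up to rounding
```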
Poisson distribution
Now taking the limit $n\to\infty$ and $p_i\to 0$ while keeping the expectation values $\lambda_i \equiv n\, p_i$ constant yields the Poisson distribution. To satisfy the normalization requirement of the $p_i$, we set $p_0 \equiv 1 - \sum_i p_i = 1 - \sum_i \frac{\lambda_i}{n}$ as the probability of nothing happening during one of the $n$ observations and $k_0 \equiv n - \sum_i k_i \equiv n - k$ as the number of times this happens. The probability of observing each event $k_i$ times is then given by:
$$P(k_1,\ldots,k_i) = n!\, \frac{p_0^{k_0}}{k_0!} \prod_i \frac{p_i^{k_i}}{k_i!} = \frac{n!}{(n-k)!}\, p_0^{n-k}\, \frac{1}{n^k} \prod_i \frac{\lambda_i^{k_i}}{k_i!} = \underbrace{\frac{n!}{(n-k)!\; n^k}}_{\to\, 1}\; \underbrace{\Bigl(1 - \frac{\sum_i \lambda_i}{n}\Bigr)^{\!n}}_{\to\, \exp(-\sum_i \lambda_i)}\; \Bigl(\underbrace{1 - \frac{\sum_i \lambda_i}{n}}_{\to\, 1}\Bigr)^{\!-k} \prod_i \frac{\lambda_i^{k_i}}{k_i!} \xrightarrow{\,n\to\infty\,} \prod_i \frac{\lambda_i^{k_i}}{k_i!}\, e^{-\lambda_i}$$
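Numerically, the binomial pmf indeed approaches the Poisson pmf as $n$ grows with $\lambda = n\,p$ held fixed (sketch; $\lambda = 4$ and $k = 3$ are arbitrary):

```python
from math import comb, exp, factorial

lam, k = 4.0, 3
for n in (10, 100, 10_000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k * exp(-lam) / factorial(k)
    print(n, binom, poisson)   # the binomial value approaches the Poisson one
```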
Geometric distribution
The probability of having the first success in a binomial setup after exactly $k$ failed trials is simply given by:
$$P(k) = (1-p)^k\, p$$
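Simulating the number of failures before the first success reproduces this pmf (sketch; note that NumPy's geometric sampler counts trials rather than failures):

```python
import numpy as np

rng = np.random.default_rng(5)
p, N = 0.3, 200_000
k = rng.geometric(p, size=N) - 1    # NumPy counts trials; we count failures

for kk in range(5):
    print(kk, (k == kk).mean(), (1 - p)**kk * p)   # empirical vs (1-p)^k p
```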
Exponential distribution
Writing $x = \frac{k}{n}$, $p = \frac{\lambda}{n}$, $\frac{1}{n} = dx$ and taking the limit $n\to\infty$ yields the waiting time distribution in a Poisson process:
$$P(x) = \Bigl(1 - \frac{\lambda}{n}\Bigr)^{\!xn}\, \frac{\lambda}{n} \xrightarrow{\,n\to\infty\,} e^{-\lambda x}\, \lambda\, dx$$
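A direct discretized simulation of this limit, with one Bernoulli trial of probability $\lambda/n$ per time step of width $1/n$ (sketch; the rate $\lambda = 2$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n, N = 2.0, 1_000, 100_000    # rate, steps per unit time, repetitions

# waiting time = (number of trials until first success) / n, with p = lam/n
waits = rng.geometric(lam / n, size=N) / n
print(waits.mean(), 1 / lam)       # exponential mean     1/lambda
print(waits.var(), 1 / lam**2)     # exponential variance 1/lambda^2
```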
Erlang distribution
The cumulative waiting time for $n$ events is the sum of $n$ exponentially distributed waiting times. Since the exponential distribution is a special case of the $\chi^2$ distribution with $k=2$, the sum of $n$ exponential distributions is obtained simply as a $\chi^2$ distribution with $k=2n$:
$$p(\chi^2{=}2\lambda x,\, k{=}2n) = \frac{(2\lambda x)^{n-1}\, e^{-\lambda x}}{2^n\, \Gamma(n)}\, \frac{d\chi^2}{dx} = \frac{\lambda^n\, x^{n-1}\, e^{-\lambda x}}{(n-1)!}$$
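Summing $n$ exponential waiting times and comparing with the derived Erlang density (sketch; the parameters are arbitrary):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(7)
lam, n, N = 2.0, 4, 200_000
total = rng.exponential(1 / lam, size=(N, n)).sum(axis=1)  # n waiting times

hist, edges = np.histogram(total, bins=60, range=(0.0, 6.0), density=True)
x = 0.5 * (edges[:-1] + edges[1:])
pdf = lam**n * x**(n - 1) * np.exp(-lam * x) / factorial(n - 1)
print(np.abs(hist - pdf).max())    # small: the sum is Erlang(n, lambda)
```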