Niklas Buschmann

Common probability distributions

Table of contents

- Probability of finding a sum of random variables
- Normal distribution
- $\chi$ distribution
- $\chi^2$ distribution
- Student t distribution
- Multinomial coefficient
- Multinomial distribution
- Poisson distribution
- Geometric distribution
- Exponential distribution
- Erlang distribution

Probability of finding a sum of random variables

The probability density $\textrm{p}(\overline{x})$ of finding the average $\overline{x} = \frac{\sum_i x_i}{n}$ of $n$ random variables can be written using the delta function as:

$$\textrm{p}(\overline{x}) = \int \delta\left(\overline{x}-\sum_i \frac{x_i}{n}\right)\textrm{p}_1(x_1) \dots \textrm{p}_n(x_n)\,\textrm{d}x_1 \dots \textrm{d}x_n = \int \delta\left(\overline{x}-\sum_i\frac{x_i}{n}\right)\prod_i \textrm{p}_i(x_i)\, \textrm{d}x_i$$

Calculating the Fourier transform $\hat{\textrm{p}}(k)$ yields:

$$\hat{\textrm{p}}(k) = \int e^{-ik\overline{x}}\textrm{p}(\overline{x})\,\textrm{d}\overline{x} = \int e^{-ik \sum_i x_i/n}\prod_i \textrm{p}_i(x_i)\, \textrm{d}x_i = \prod_i \int e^{-ikx_i/n} \textrm{p}_i(x_i)\, \textrm{d}x_i = \prod_i \hat{\textrm{p}}_i(k/n)$$

The original $\textrm{p}(\overline{x})$ can now be recovered using the inverse Fourier transform:

$$\textrm{p}(\overline{x}) = \frac{1}{2\pi} \int e^{ik\overline{x}}\hat{\textrm{p}}(k)\,\textrm{d}k = \frac{1}{2\pi} \int e^{ik\overline{x}}\prod_i \hat{\textrm{p}}_i(k/n)\,\textrm{d}k = \frac{1}{2\pi} \int e^{ik\overline{x}} \left(\hat{\textrm{p}}_i(k/n)\right)^n \textrm{d}k$$

Where the last equality holds for identically distributed random variables.

Normal distribution

Taylor expanding any probability density $\hat{\textrm{p}}_i(k/n)$ around $k = 0$ yields:

$$\hat{\textrm{p}}_i(k/n) = \sum_m \frac{\partial_k^m \hat{\textrm{p}}_i(0)}{m!}\left(\frac{k}{n}\right)^m = \sum_m \frac{\mathbb{E}[x_i^m]}{m!}\left(-\frac{ik}{n}\right)^m = 1 - \frac{ik \mathbb{E}[x_i]}{n} + \frac{i^2k^2\mathbb{E}[x_i^2]}{2n^2} + \dots$$

Keeping terms up to first order in $\frac{1}{n}$ one gets:

$$(\hat{\textrm{p}}(k/n))^n = \left(1 - \frac{ik \mathbb{E}[x_i]}{n} + \frac{i^2k^2\mathbb{E}[x_i^2]}{2n^2} + \dots \right)^n = e^{-ik \mathbb{E}[x_i]}e^{i^2k^2(\mathbb{E}[x_i^2]-\mathbb{E}[x_i]^2)/2n} + O\left(\frac{1}{n^2}\right)$$

Writing $\mu \equiv \mathbb{E}[x_i]$ and $\sigma^2 \equiv \mathbb{E}[x_i^2]-\mathbb{E}[x_i]^2$, the original $\textrm{p}(\overline{x})$ can now be recovered using the inverse Fourier transform:

$$\textrm{p}(\overline{x}) \overset{n\rightarrow\infty}{\rightarrow} \frac{1}{2\pi} \int e^{ik\overline{x}}e^{-ik \mu}e^{i^2k^2\sigma^2/2n}\,\textrm{d}k = \frac{1}{\sqrt{2\pi\sigma^2/n}}e^{-\frac{(\overline{x}-\mu)^2}{2\sigma^2/n}}$$

For large $n$, the average of $n$ identically distributed random variables thus becomes normally distributed with the same mean $\mu$ but a lower variance $\sigma^2/n$ compared to the original distribution.
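
This statement can be illustrated with a small Monte Carlo sketch in Python (the uniform distribution, sample sizes, and seed are illustrative choices, not from the text): averages of $n$ uniform variables, with $\mu = 1/2$ and $\sigma^2 = 1/12$, should have mean $\mu$ and variance $\sigma^2/n$.

```python
import numpy as np

# Averages of n = 100 uniform random variables on [0, 1]:
# the CLT predicts mean 1/2 and variance (1/12)/100.
rng = np.random.default_rng(0)
n, trials = 100, 50_000
averages = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)

mean, var = averages.mean(), averages.var()
print(mean, var)
```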

$\chi$ distribution

The probability of finding the root of the sum of squares of $n$ standard-normally distributed random variables is proportional to the surface area of an $n$-dimensional sphere:

$$\textrm{p}(\chi) = \int\delta\left(\chi-\sqrt{\sum_ix_i^2}\right)\prod_i \frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\textrm{d}x_i = \int\textrm{d}A\int\delta(\chi-r)\frac{e^{-r^2/2}}{\sqrt{2\pi}^n}r^{n-1}\textrm{d}r = \frac{e^{-\chi^2/2}\chi^{n-1}}{\sqrt{2^{n-2}}\,\Gamma\left(\frac{n}{2}\right)}$$

The surface area $\int \textrm{d}A$ of the $n$-dimensional sphere can be calculated using the substitution $t = r^2/2$:

$$1=\int\prod_i \frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\textrm{d}x_i = \int \textrm{d}A \int \frac{e^{-r^2/2}r^{n-1}}{\sqrt{2\pi}^n}\textrm{d}r = \int \textrm{d}A \int \frac{e^{-t}t^{\frac{n}{2}-1}}{2\sqrt{\pi^n}}\textrm{d}t = \frac{\Gamma\left(\frac{n}{2}\right)}{2\sqrt{\pi^n}} \int \textrm{d}A$$

Where the gamma function is defined as $\Gamma(x) = \int_0^\infty e^{-t}t^{x-1}\,\textrm{d}t$.
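
As a numerical sanity check (a sketch, with $n = 3$ chosen for illustration), the derived density $\textrm{p}(\chi) = e^{-\chi^2/2}\chi^{n-1}/(2^{(n-2)/2}\Gamma(n/2))$ should integrate to 1, and for $n = 3$ (the Maxwell case) have mean $2\sqrt{2/\pi}$:

```python
import numpy as np
from math import gamma, pi

# Chi density as derived above.
def chi_pdf(x, n):
    return np.exp(-x**2 / 2) * x**(n - 1) / (2**((n - 2) / 2) * gamma(n / 2))

# Midpoint-rule integration on a fine grid.
dx = 1e-4
x = np.arange(0.0, 40.0, dx) + dx / 2
norm = (chi_pdf(x, 3) * dx).sum()
mean = (x * chi_pdf(x, 3) * dx).sum()
print(norm, mean)
```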

$\chi^2$ distribution

A change of variables yields the distribution for the sum of squares:

$$\textrm{p}(\chi^2) = \textrm{p}(\chi)\frac{\textrm{d} \chi}{\textrm{d} \chi^2} = \frac{e^{-\chi^2/2}\chi^{n-2}}{\sqrt{2^n}\,\Gamma\left(\frac{n}{2}\right)}$$
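
A quick numerical check of this density ($n = 5$ is an illustrative choice): writing $y = \chi^2$, the density $e^{-y/2}y^{n/2-1}/(2^{n/2}\Gamma(n/2))$ should have mean $n$ and variance $2n$:

```python
import numpy as np
from math import gamma

# Chi-squared density in the variable y = chi^2.
def chi2_pdf(y, n):
    return np.exp(-y / 2) * y**(n / 2 - 1) / (2**(n / 2) * gamma(n / 2))

# Midpoint-rule integration; mean should be n, variance 2n.
n, dy = 5, 1e-4
y = np.arange(0.0, 200.0, dy) + dy / 2
mean = (y * chi2_pdf(y, n) * dy).sum()
var = ((y - mean)**2 * chi2_pdf(y, n) * dy).sum()
print(mean, var)
```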

Student t distribution

The estimated mean $\overline{x} = \frac{\sum_i x_i}{n}$ deviates from the true mean $\mu$ with a standard deviation of $\frac{\sigma}{\sqrt{n}}$. Since $\sigma$ is most likely unknown, one has to use the estimated standard deviation $s$ instead. Since $\frac{\overline{x}-\mu}{\sigma/\sqrt{n}}$ is standard-normally distributed and $\frac{s^2}{\sigma^2}$ is $\chi^2$ distributed, $\frac{\overline{x}-\mu}{s/\sqrt{n}}$ is independent of $\sigma$ and follows a Student t distribution:

$$\begin{aligned} p(t) &= \int\delta\left(t-\frac{x}{\sqrt{\chi^2/n}}\right)\frac{e^{-x^2/2}}{\sqrt{2\pi}}\frac{e^{-\chi^2/2}\chi^{n-2}}{\sqrt{2^n}\Gamma\left(\frac{n}{2}\right)}\textrm{d}\chi^2\textrm{d}x \\ &= \int\frac{e^{-(t\chi)^2/2n}}{\sqrt{2\pi n}}\frac{e^{-\chi^2/2}\chi^{n-2}}{\sqrt{2^n}\Gamma\left(\frac{n}{2}\right)}\chi\textrm{d}\chi^2 \\ &= \int\frac{e^{-\chi^2(1+t^2/n)/2}}{\sqrt{2^{n+1}\pi n }}\frac{(\chi^2)^{\frac{n-1}{2}}}{\Gamma\left(\frac{n}{2}\right)}\textrm{d}\chi^2 \\ &= \frac{\int e^{-u}u^{\frac{n-1}{2}}\textrm{d}u }{\sqrt{\pi n }\Gamma\left(\frac{n}{2}\right)\left(1+\frac{t^2}{n}\right)^{\frac{n+1}{2}}} \\ &= \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n }\Gamma\left(\frac{n}{2}\right)}\left(1+\frac{t^2}{n}\right)^{-\frac{n+1}{2}} \end{aligned}$$

Where $u \equiv \frac{\chi^2(1+t^2/n)}{2}$. The variance of $\frac{\overline{x}-\mu}{s/\sqrt{n}}$ is no longer 1 but $\frac{n}{n-2}$ instead.
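
Both properties can be checked numerically (a sketch, with $n = 5$ as an illustrative choice): the final density should integrate to 1 and have variance $n/(n-2) = 5/3$:

```python
import numpy as np
from math import gamma, pi, sqrt

# Student t density as derived above.
def t_pdf(t, n):
    norm_const = gamma((n + 1) / 2) / (sqrt(pi * n) * gamma(n / 2))
    return norm_const * (1 + t**2 / n)**(-(n + 1) / 2)

# Midpoint-rule integration over a wide symmetric range.
n, dt = 5, 1e-3
t = np.arange(-2000.0, 2000.0, dt) + dt / 2
norm = (t_pdf(t, n) * dt).sum()
var = (t**2 * t_pdf(t, n) * dt).sum()
print(norm, var)
```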

Multinomial coefficient

Suppose you want to arrange $k_i$ copies of $i$ different things into $n = \sum_i k_i$ bins. How many distinct ways $\binom{n}{k_1,\dots,k_i}$ are there to do this?

Starting with the case that each thing is unique (all $k_i = 1$), the first bin can be filled with one of $n$ things, the second with one of the $(n-1)$ remaining things, and so on, yielding $n!$ distinct possibilities:

$$\binom{n}{k_1=1,\dots,k_i=1} = n(n-1)(n-2)(\dots) = n!$$

Grouping together $k_i$ of the previously unique things into a category $i$ reduces the count by a factor of $k_i!$, since there are now $k_i$ interchangeable ways to select the first thing out of category $i$, $k_i - 1$ ways to select the second, and so on, yielding the overall result:

$$\binom{n}{k_1,\dots,k_i} = \frac{n!}{k_1!\cdots k_i!} = \frac{n!}{\prod_i k_i!}$$
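
The formula can be verified by brute force on a small example (the word "aabbc" is an illustrative choice): the number of distinct arrangements of a multiset should equal $n!/\prod_i k_i!$:

```python
from itertools import permutations
from math import factorial, prod

# Count distinct arrangements of the multiset "aabbc" two ways:
# by the multinomial formula, and by deduplicating all permutations.
word = "aabbc"
counts = [word.count(c) for c in set(word)]
formula = factorial(len(word)) // prod(factorial(k) for k in counts)
brute = len(set(permutations(word)))
print(formula, brute)  # both 30 = 5! / (2! * 2! * 1!)
```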

What if multiple things can go into one bin? This problem can be reduced to the previous one by adding $n-1$ separators and treating the separators as an additional category of thing, each placed in a single-occupied bin.

Multinomial distribution

Now what is the probability $\textrm{P}(k_1,\dots,k_i)$, when measuring $n = \sum_i k_i$ outcomes, that there will be $k_i$ outcomes of each type $i$? With the probabilities $p_i$ of measuring outcome $i$ in a single observation, the total probability $\textrm{P}$ is the number of possibilities realizing this outcome multiplied by the probability $\prod_i p_i^{k_i}$ of any one such sequence:

$$\textrm{P}(k_1,\dots,k_i) = \binom{n}{k_1,\dots,k_i} \prod_i p_i^{k_i} = n!\prod_i \frac{p_i^{k_i}}{k_i!}$$

For two possible outcomes with $k_1 + k_2 = n$ and $p_1 + p_2 = 1$ one recovers the binomial distribution:

$$\textrm{P}(n, k) = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$$

Poisson distribution

Now taking the limit $n \rightarrow \infty$ and $p_i \rightarrow 0$ while keeping the expectation values $\lambda_i \equiv n \cdot p_i$ constant yields the Poisson distribution. To satisfy the normalization $p_0 + \sum_i p_i = 1$ we set $p_0 \equiv 1-\sum_i p_i = 1-\sum_i \frac{\lambda_i}{n}$ as the probability of nothing happening during one of the $n$ observations and $k_0 \equiv n-\sum_i k_i \equiv n-k$ as the number of times this happens. The probability of observing each event type $k_i$ times is then given by:

$$\begin{aligned} \textrm{P}(k_1,\dots,k_i) &= n!\frac{p_0^{k_0}}{k_0!}\prod_i \frac{p_i^{k_i}}{k_i!} \\ &= \frac{n!}{(n-k)!}p_0^{n-k}\frac{1}{n^k}\prod_i \frac{\lambda_i^{k_i}}{k_i!} \\ &= \underbrace{\frac{n!}{(n-k)!}\frac{1}{n^k}}_{\rightarrow\ 1}\underbrace{\left(1-\frac{\sum_i \lambda_i}{n}\right)^n}_{\rightarrow\ \exp\left(-\sum_i \lambda_i\right)} \bigg({\underbrace{1-\frac{\sum_i \lambda_i}{n}}_{\rightarrow\ 1}} \bigg)^{-k}\prod_i \frac{\lambda_i^{k_i}}{k_i!} \\ &\overset{n\rightarrow\infty}{\rightarrow} \prod_i \frac{\lambda_i^{k_i}}{k_i!}e^{-\lambda_i} \end{aligned}$$
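
For a single event type this limit can be checked numerically (a sketch; $\lambda$ and $k$ are illustrative): the binomial probability with $p = \lambda/n$ approaches the Poisson probability $\frac{\lambda^k}{k!}e^{-\lambda}$ as $n$ grows:

```python
from math import comb, exp, factorial

# Poisson probability for lambda = 3, k = 4.
lam, k = 3.0, 4
poisson = lam**k / factorial(k) * exp(-lam)

# Binomial probability with n large and p = lambda / n held via lambda.
n = 1_000_000
p = lam / n
binomial = comb(n, k) * p**k * (1 - p)**(n - k)
print(binomial, poisson)
```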

Geometric distribution

The probability of having the first success in a binomial setup after exactly $k$ failed trials is simply given by:

$$\textrm{P}(k) = (1-p)^k p$$
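
A quick check of these probabilities (the value of $p$ is illustrative): summing $\textrm{P}(k)$ over $k = 0, 1, 2, \dots$ gives 1, and the expected number of failures before the first success comes out as $\frac{1-p}{p}$ (a standard fact, not derived in the text):

```python
# Geometric probabilities P(k) = (1-p)^k * p, truncated where
# the terms are far below floating-point precision.
p = 0.3
terms = [(1 - p)**k * p for k in range(2000)]
total = sum(terms)
mean = sum(k * t for k, t in enumerate(terms))
print(total, mean)
```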

Exponential distribution

Writing $x = \frac{k}{n}$ and $p = \frac{\lambda}{n}$ and taking the limit $n \rightarrow \infty$ yields the waiting time distribution in a Poisson process:

$$\textrm{p}(x) = \left(1-\frac{\lambda}{n}\right)^{xn} \frac{\lambda}{n} \overset{n\rightarrow\infty}{\rightarrow} e^{-\lambda x}\lambda\,\textrm{d}x$$

Here the remaining factor $\frac{1}{n}$ is removed by the normalisation of the density to 1.
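
This limit can be checked numerically (a sketch; $\lambda$ and $x$ are illustrative): the geometric probability of waiting $x = k/n$ with $p = \lambda/n$ approaches $\lambda e^{-\lambda x}\,\textrm{d}x$, where $\textrm{d}x = \frac{1}{n}$ is one time step:

```python
from math import exp

# Geometric waiting probability vs. its exponential limit.
lam, x, n = 2.0, 1.5, 1_000_000
k = round(x * n)
geometric = (1 - lam / n)**k * (lam / n)
exponential = lam * exp(-lam * x) / n   # lambda * exp(-lambda x) * dx
print(geometric, exponential)
```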

Erlang distribution

The cumulative waiting time for $n$ events is the sum of $n$ exponential distributions. Since the exponential distribution is a special case of a $\chi^2$ distribution with $k=2$, we can get the sum of $n$ exponential distributions simply as a $\chi^2$ distribution with $k=2n$:

$$\textrm{p}(\chi^2=2\lambda x, k=2n) = \frac{(2\lambda x)^{n-1}}{2^n\Gamma(n)}e^{-\lambda x}\frac{\textrm{d}\chi^2}{\textrm{d}x} = \frac{\lambda^n x^{n-1}}{(n-1)!}e^{-\lambda x}$$
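
A Monte Carlo sketch of this result (seed and parameters are illustrative): sums of $n$ exponential waiting times should have the Erlang mean $\frac{n}{\lambda}$ and variance $\frac{n}{\lambda^2}$ implied by the density above:

```python
import numpy as np

# Sum n = 4 exponential waiting times with rate lambda = 2
# (numpy's exponential takes the scale 1/lambda).
rng = np.random.default_rng(1)
n, lam, trials = 4, 2.0, 500_000
sums = rng.exponential(scale=1 / lam, size=(trials, n)).sum(axis=1)

mean, var = sums.mean(), sums.var()
print(mean, var)  # near n/lam = 2 and n/lam^2 = 1
```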