Niklas Buschmann

Common probability distributions

Probability of finding a sum of random variables

The probability density $\textrm{p}(\overline{x})$ of finding the average $\overline{x} = \frac{\sum_i x_i}{n}$ of $n$ random variables can be written using the Dirac delta function as:

$$\textrm{p}(\overline{x}) = \int \delta\left(\overline{x}-\sum_i \frac{x_i}{n}\right)\textrm{p}_1(x_1) \dots \textrm{p}_n(x_n)\textrm{d}x_1 \dots \textrm{d}x_n = \int \delta\left(\overline{x}-\sum_i\frac{x_i}{n}\right)\prod_i \textrm{p}_i(x_i) \textrm{d}x_i$$

Calculating the Fourier transform $\hat{\textrm{p}}(k)$ yields:

$$\hat{\textrm{p}}(k) = \int e^{-ik\overline{x}}\textrm{p}(\overline{x})\textrm{d}\overline{x} = \int e^{-ik \sum_i x_i/n}\prod_i \textrm{p}_i(x_i) \textrm{d}x_i = \prod_i \int e^{-ikx_i/n} \textrm{p}_i(x_i) \textrm{d}x_i = \prod_i \hat{\textrm{p}}_i(k/n)$$

The original $\textrm{p}(\overline{x})$ can now be recovered using the inverse Fourier transform:

$$\textrm{p}(\overline{x}) = \frac{1}{2\pi} \int e^{ik\overline{x}}\hat{\textrm{p}}(k)\textrm{d}k = \frac{1}{2\pi} \int e^{ik\overline{x}}\prod_i \hat{\textrm{p}}_i(k/n)\textrm{d}k = \frac{1}{2\pi} \int e^{ik\overline{x}} \left(\hat{\textrm{p}}_i(k/n)\right)^n \textrm{d}k$$

Where the last equality holds for identically distributed random variables.
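
As a quick numerical sanity check of this product rule, the following sketch (assuming numpy is available; the exponential distribution and all parameter values are arbitrary illustrations) compares $\hat{\textrm{p}}(k)$ of the average against $(\hat{\textrm{p}}_i(k/n))^n$ by Monte Carlo:

```python
# Monte Carlo check that the Fourier transform (characteristic function) of
# the average factorizes: E[exp(-ik*xbar)] = (E[exp(-ik*x/n)])^n for iid x_i.
import numpy as np

rng = np.random.default_rng(0)
n, samples = 5, 200_000
x = rng.exponential(scale=1.0, size=(samples, n))  # any distribution works
xbar = x.mean(axis=1)

k = 2.0                                              # a single Fourier mode
lhs = np.mean(np.exp(-1j * k * xbar))                # transform of the average
rhs = np.mean(np.exp(-1j * (k / n) * x[:, 0])) ** n  # product of single-variable transforms
print(abs(lhs - rhs))                                # small up to sampling noise
```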

Normal distribution

Taylor expanding the Fourier transform $\hat{\textrm{p}}_i(k/n)$ of any probability density around $k = 0$ yields:

$$\hat{\textrm{p}}_i(k/n) = \sum_m \frac{\partial_k^m \hat{\textrm{p}}_i(0)}{m!}\left(\frac{k}{n}\right)^m = \sum_m \frac{\mathbb{E}[x_i^m]}{m!}\left(-\frac{ik}{n}\right)^m = 1 - \frac{ik \mathbb{E}[x_i]}{n} + \frac{i^2k^2\mathbb{E}[x_i^2]}{2n^2} + \dots$$

Comparing up to first order in $\frac{1}{n}$ one gets:

$$(\hat{\textrm{p}}_i(k/n))^n = \left(1 - \frac{ik \mathbb{E}[x_i]}{n} + \frac{i^2k^2\mathbb{E}[x_i^2]}{2n^2} + \dots \right)^n = e^{-ik \mathbb{E}[x_i]}e^{i^2k^2(\mathbb{E}[x_i^2]-\mathbb{E}[x_i]^2)/2n} + O\left(\frac{1}{n^2}\right)$$

Writing $\mu \equiv \mathbb{E}[x_i]$ and $\sigma^2 \equiv \mathbb{E}[x_i^2]-\mathbb{E}[x_i]^2$, the original $\textrm{p}(\overline{x})$ can now be recovered using the inverse Fourier transform:

$$\textrm{p}(\overline{x}) \overset{n\rightarrow\infty}{\rightarrow} \frac{1}{2\pi} \int e^{ik\overline{x}}e^{-ik \mu}e^{i^2k^2\sigma^2/2n}\textrm{d}k = \frac{1}{\sqrt{2\pi\sigma^2/n}}e^{-\frac{(\overline{x}-\mu)^2}{2\sigma^2/n}}$$

The average of $n$ identically distributed random variables will for large $n$ become normally distributed with the same mean $\mu$ and a lower variance $\sigma^2/n$ compared to the original distribution.
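
A short Monte Carlo illustration of this limit (a sketch assuming numpy; uniform variables and all parameter values are an arbitrary choice) compares a histogram of sample averages with the normal density of mean $\mu$ and variance $\sigma^2/n$:

```python
# Averages of n iid uniform(0, 1) variables versus the normal density with
# the same mean mu = 1/2 and reduced variance sigma^2/n, sigma^2 = 1/12.
import numpy as np

rng = np.random.default_rng(0)
n, samples = 50, 100_000
xbar = rng.uniform(0, 1, size=(samples, n)).mean(axis=1)

mu, var = 0.5, (1 / 12) / n
hist, edges = np.histogram(xbar, bins=60, density=True)
mid = (edges[:-1] + edges[1:]) / 2
normal = np.exp(-(mid - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.max(np.abs(hist - normal) / normal.max()))  # small relative deviation
```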

$\chi$ distribution

The probability density of finding the square root of the sum of squares of $n$ standard-normal distributed random variables is proportional to the surface area of an $n$-dimensional sphere:

$$\textrm{p}(\chi) = \int\delta\left(\chi-\sqrt{\sum_ix_i^2}\right)\prod_i \frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\textrm{d}x_i = \int\textrm{d}A\int\delta(\chi-r)\frac{e^{-r^2/2}}{\sqrt{2\pi}^n}r^{n-1}\textrm{d}r = \frac{e^{-\chi^2/2}\chi^{n-1}}{\sqrt{2^{n-2}}\Gamma\left(\frac{n}{2}\right)}$$

The surface area of the $n$-dimensional sphere can be calculated by substituting $t = r^2/2$ in:

$$1=\prod_i \int\frac{e^{-x_i^2/2}}{\sqrt{2\pi}}\textrm{d}x_i = \int \textrm{d}A \int \frac{e^{-r^2/2}r^{n-1}}{\sqrt{2\pi}^n}\textrm{d}r = \int \textrm{d}A \int \frac{e^{-t}t^{\frac{n}{2}-1}}{2\sqrt{\pi^n}}\textrm{d}t \equiv \frac{\Gamma\left(\frac{n}{2}\right)}{2\sqrt{\pi^n}} \int \textrm{d}A$$

Where the gamma function is defined as $\Gamma(x) \equiv \int_0^\infty e^{-t}t^{x-1}\textrm{d}t$.
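
The derived density can be checked by sampling (a minimal sketch assuming numpy; $n = 4$ is arbitrary):

```python
# Norms of n standard-normal variables versus the chi density
# exp(-chi^2/2) chi^(n-1) / (sqrt(2^(n-2)) Gamma(n/2)) derived above.
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(0)
n, samples = 4, 200_000
chi = np.sqrt((rng.standard_normal((samples, n)) ** 2).sum(axis=1))

hist, edges = np.histogram(chi, bins=60, density=True)
mid = (edges[:-1] + edges[1:]) / 2
pdf = np.exp(-mid**2 / 2) * mid**(n - 1) / (sqrt(2.0**(n - 2)) * gamma(n / 2))
print(np.max(np.abs(hist - pdf)))  # small up to binning and sampling error
```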

$\chi^2$ distribution

A change of variables (using $\frac{\textrm{d} \chi}{\textrm{d} \chi^2} = \frac{1}{2\chi}$) yields the distribution for the sum of squares, called the $\chi^2$ distribution:

$$\textrm{p}(\chi^2) = \textrm{p}(\chi)\frac{\textrm{d} \chi}{\textrm{d} \chi^2} = \frac{e^{-\chi^2/2}\chi^{n-2}}{\sqrt{2^n}\Gamma\left(\frac{n}{2}\right)}$$
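
The same sampling check works for the sum of squares directly (again a sketch assuming numpy; parameters are illustrative):

```python
# Sums of squares of n standard-normal variables versus the chi^2 density
# exp(-chi^2/2) chi^(n-2) / (sqrt(2^n) Gamma(n/2)), with chi = sqrt(chi^2).
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(0)
n, samples = 4, 200_000
chi2 = (rng.standard_normal((samples, n)) ** 2).sum(axis=1)

hist, edges = np.histogram(chi2, bins=60, density=True)
mid = (edges[:-1] + edges[1:]) / 2
pdf = np.exp(-mid / 2) * mid**((n - 2) / 2) / (sqrt(2.0**n) * gamma(n / 2))
print(np.max(np.abs(hist - pdf)))  # small up to binning and sampling error
```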

Student-t distribution

When estimating the true mean $\mu$ of a normally-distributed random variable $x$ by the sample mean $\overline{x} \equiv \frac{\sum_i x_i}{n}$, the deviation $\overline{x}-\mu$ is again normally-distributed with a standard deviation of $\frac{\sigma}{\sqrt{n}}$, making $\delta \equiv \frac{\overline{x}-\mu}{\sigma/\sqrt{n}}$ standard-normal distributed. By Cochran’s theorem the ratio $\chi^2 \equiv \frac{(n-1)s^2}{\sigma^2}$ between the sample variance $s^2 \equiv \frac{\sum_i (x_i-\overline{x})^2}{n-1}$ and the true variance $\sigma^2$ follows a $\chi^2$-distribution of degree $n-1$, independently of $\delta$. The deviation $t \equiv \frac{\overline{x}-\mu}{s/\sqrt{n}} = \frac{\delta}{\chi/\sqrt{n-1}}$ will then depend only on the measurable quantities $\overline{x}$ and $s^2$, following a so-called Student-t distribution of degree $n-1$:

$$\begin{aligned} p(t) &= \iint\delta\left(t-\frac{\delta}{\chi/\sqrt{n-1}}\right)\underbrace{\frac{e^{-\delta^2/2}}{\sqrt{2\pi}}}_{p(\delta)}\underbrace{\frac{e^{-\chi^2/2}\chi^{n-3}}{\sqrt{2^{n-1}}\Gamma\left(\frac{n-1}{2}\right)}}_{p(\chi^2)}\textrm{d}\delta\,\textrm{d}\chi^2 \\
&= \int\frac{e^{-(t\chi)^2/2(n-1)}}{\sqrt{2\pi}}\frac{e^{-\chi^2/2}\chi^{n-3}}{\sqrt{2^{n-1}}\Gamma\left(\frac{n-1}{2}\right)}\frac{\chi}{\sqrt{n-1}}\textrm{d}\chi^2 \\
&= \int\frac{e^{-\chi^2(t^2/(n-1)+1)/2}}{\sqrt{2^n\pi(n-1)}}\frac{(\chi^2)^{\frac{n-2}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\textrm{d}\chi^2 \\
&= \int\frac{e^{-u}u^{\frac{n}{2}-1}}{\sqrt{2^n\pi(n-1)}}\frac{\left(\frac{2}{t^2/(n-1)+1}\right)^{\frac{n}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\textrm{d}u \\
&= \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi (n-1)}\Gamma\left(\frac{n-1}{2}\right)}\left(1+\frac{t^2}{n-1}\right)^{-\frac{n}{2}} \end{aligned}$$
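
A simulation of the $t$ statistic from normal samples can be compared against this density (a sketch assuming numpy; $\mu$, $\sigma$ and $n$ are arbitrary):

```python
# The statistic t = (xbar - mu)/(s/sqrt(n)) from normal samples versus the
# Student-t density of degree n-1 derived above.
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(0)
n, samples, mu, sigma = 5, 200_000, 3.0, 2.0
x = rng.normal(mu, sigma, size=(samples, n))
t = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / sqrt(n))

hist, edges = np.histogram(t, bins=np.linspace(-6, 6, 81), density=True)
mid = (edges[:-1] + edges[1:]) / 2
pdf = gamma(n / 2) / (sqrt(pi * (n - 1)) * gamma((n - 1) / 2)) \
    * (1 + mid**2 / (n - 1)) ** (-n / 2)
print(np.max(np.abs(hist - pdf)))  # small up to binning and sampling error
```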

Multinomial coefficient

Suppose you want to arrange $k_i$ copies of $i$ different things into $n = \sum_i k_i$ bins. How many distinct ways $\binom{n}{k_1,\dots,k_i}$ are there to do this?

Starting with the case that each thing is unique (all $k_i = 1$), the first bin can be filled with one of $n$ things, the second with one of the $(n-1)$ remaining things, and so on, yielding $n!$ distinct possibilities:

$$\binom{n}{k_1=1,\dots,k_i=1} = n(n-1)(n-2)(\dots) = n!$$

Grouping together $k_i$ of the previously unique things into a category $i$ reduces the count by a factor of $k_i!$, since there are now $k_i$ ways to select the first thing out of the category $i$, $k_i - 1$ ways to select the second, and so on, yielding an overall result of:

$$\binom{n}{k_1,\dots,k_i} = \frac{n!}{k_1! \cdots k_i!} = \frac{n!}{\prod_i k_i!}$$
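
For a small case the formula can be verified by brute-force enumeration (a sketch using only the Python standard library; the multiset is an arbitrary example):

```python
# Brute-force check of n!/prod(k_i!): count the distinct arrangements of the
# multiset (a, a, b, b, b) directly and compare with the formula.
from itertools import permutations
from math import factorial

things = ('a', 'a', 'b', 'b', 'b')          # k_a = 2, k_b = 3, n = 5
distinct = len(set(permutations(things)))   # enumerate, deduplicate, count
formula = factorial(5) // (factorial(2) * factorial(3))
print(distinct, formula)                    # both print 10
```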

What if multiple things can go into one bin? This problem can be reduced to the previous one by adding $n-1$ separators and treating the separators as an additional category of thing that is placed in a singly-occupied bin.
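
A small check of this stars-and-bars reduction (standard library only; the bin and thing counts are arbitrary):

```python
# Stars and bars: distributing k identical things over n bins with multiple
# occupancy allowed matches arranging k things plus n-1 separators,
# i.e. (k + n - 1)! / (k! (n - 1)!).
from itertools import combinations_with_replacement
from math import comb

n, k = 4, 3                                  # 4 bins, 3 identical things
direct = sum(1 for _ in combinations_with_replacement(range(n), k))
print(direct, comb(k + n - 1, k))            # both print 20
```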

Multinomial distribution

Now what is the probability $\textrm{P}(k_1,\dots,k_i)$ when measuring $n = \sum_i k_i$ outcomes that there will be $k_i$ outcomes of each type $i$? With the probabilities $p_i$ of measuring outcome $i$ in a single observation, the total probability $\textrm{P}$ is then the number of possibilities of realizing this outcome multiplied by the product of all the $p_i^{k_i}$:

$$\textrm{P}(k_1,\dots,k_i) = \binom{n}{k_1,\dots,k_i} \prod_i p_i^{k_i} = n!\prod_i \frac{p_i^{k_i}}{k_i!}$$
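
A Monte Carlo check of the formula (a sketch assuming numpy; the probabilities and target counts are arbitrary):

```python
# Monte Carlo estimate of a multinomial probability versus the closed form
# n! prod(p_i^k_i / k_i!) for a three-sided die.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
p, target = np.array([0.5, 0.3, 0.2]), (3, 2, 1)   # n = 6 observations
draws = rng.multinomial(6, p, size=500_000)
estimate = np.mean(np.all(draws == target, axis=1))

formula = factorial(6) * np.prod([q**k / factorial(k) for q, k in zip(p, target)])
print(estimate, formula)                           # agree up to sampling noise
```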

For two possible outcomes with $k_1 + k_2 = n$ and $p_1 + p_2 = 1$ one recovers the binomial distribution:

$$\textrm{P}(n, k) = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$$

Poisson distribution

Now taking the limit $n \rightarrow \infty$ and $p_i \rightarrow 0$ while keeping the expectation values $\lambda_i \equiv n \cdot p_i$ constant yields the Poisson distribution. Since all probabilities, including that of no event, must sum to one, we set $p_0 \equiv 1-\sum_i p_i = 1-\sum_i \frac{\lambda_i}{n}$ as the probability of nothing happening during one of the $n$ observations and $k_0 \equiv n-\sum_i k_i \equiv n-k$ as the number of times this happens. The probability of observing each event $k_i$ times is then given by:

$$\begin{aligned} \textrm{P}(k_1,\dots,k_i) &= n!\frac{p_0^{k_0}}{k_0!}\prod_i \frac{p_i^{k_i}}{k_i!} \\
&= \frac{n!}{(n-k)!}p_0^{n-k}\frac{1}{n^k}\prod_i \frac{\lambda_i^{k_i}}{k_i!} \\
&= \underbrace{\frac{n!}{(n-k)!}\frac{1}{n^k}}_{\rightarrow\ 1}\underbrace{\left(1-\frac{\sum_i \lambda_i}{n}\right)^n}_{\rightarrow\ \exp\left(-\sum_i \lambda_i\right)} \bigg({\underbrace{1-\frac{\sum_i \lambda_i}{n}}_{\rightarrow\ 1}} \bigg)^{-k}\prod_i \frac{\lambda_i^{k_i}}{k_i!} \\
&\overset{n\rightarrow\infty}{\rightarrow} \prod_i \frac{\lambda_i^{k_i}}{k_i!}e^{-\lambda_i} \end{aligned}$$
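
The convergence to the Poisson limit can be seen numerically for a single event type (standard library only; $\lambda$ and $k$ are arbitrary):

```python
# Binomial probabilities with p = lambda/n approach the Poisson limit
# lambda^k exp(-lambda) / k! as n grows.
from math import comb, exp, factorial

lam, k = 2.0, 3
for n in (10, 100, 1000, 10_000):
    p = lam / n
    print(n, comb(n, k) * p**k * (1 - p) ** (n - k))
print("poisson", lam**k * exp(-lam) / factorial(k))
```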

Geometric distribution

The probability of having the first success in a binomial setup after exactly $k$ failed trials (i.e. on trial $k+1$) is simply given by:

$$\textrm{P}(k) = (1-p)^k p$$
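
An empirical check (a sketch assuming numpy; note that numpy's geometric sampler counts the trials up to and including the first success, so one is subtracted to count failures):

```python
# Empirical frequencies of k failures before the first success versus (1-p)^k p.
import numpy as np

rng = np.random.default_rng(0)
p, samples = 0.3, 200_000
failures = rng.geometric(p, size=samples) - 1   # trials minus the success

for k in range(5):
    print(k, np.mean(failures == k), (1 - p) ** k * p)
```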

Exponential distribution

Writing $x = \frac{k}{n}$, $p = \frac{\lambda}{n}$, $\frac{1}{n} = \textrm{d}x$ and taking the limit $n \rightarrow \infty$ yields the waiting time distribution in a Poisson process:

$$\textrm{P}(x) = \left(1-\frac{\lambda}{n}\right)^{xn} \frac{\lambda}{n} \overset{n\rightarrow\infty}{\rightarrow} e^{-\lambda x}\lambda\,\textrm{d}x$$
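
The limit can be evaluated numerically (standard library only; $\lambda$ and $x$ are arbitrary):

```python
# The discrete waiting-time probability (1 - lambda/n)^(xn) * lambda/n,
# divided by dx = 1/n, approaches the exponential density lambda * exp(-lambda*x).
from math import exp

lam, x = 1.5, 2.0
for n in (10, 100, 10_000):
    print(n, (1 - lam / n) ** (x * n) * lam)   # (lambda/n) / (1/n) = lambda
print("limit", lam * exp(-lam * x))
```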

Erlang distribution

The cumulative waiting time for $n$ events is the sum of $n$ exponential distributions. Since the exponential distribution is a special case of the $\chi^2$ distribution with $k=2$, we can get the sum of $n$ exponential distributions simply as a $\chi^2$ distribution with $k=2n$:

$$\textrm{p}(\chi^2=2\lambda x, k=2n) = \frac{(2\lambda x)^{n-1}}{2^n\Gamma(n)}e^{-\lambda x}\frac{\textrm{d}\chi^2}{\textrm{d}x} = \frac{\lambda^n x^{n-1}}{(n-1)!}e^{-\lambda x}$$
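
A final sampling check of this Erlang density (a sketch assuming numpy; parameters are arbitrary):

```python
# Sums of n exponential waiting times versus the Erlang density
# lambda^n x^(n-1) exp(-lambda x) / (n-1)!.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, lam, samples = 3, 2.0, 200_000
total = rng.exponential(1 / lam, size=(samples, n)).sum(axis=1)

hist, edges = np.histogram(total, bins=60, density=True)
mid = (edges[:-1] + edges[1:]) / 2
pdf = lam**n * mid**(n - 1) * np.exp(-lam * mid) / factorial(n - 1)
print(np.max(np.abs(hist - pdf)))  # small up to binning and sampling error
```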