Niklas Buschmann

Least squares and least absolute deviations

When doing a linear regression, the result can be computed by requiring the estimated parameters $\theta_j$ to minimize the sum of squared differences between the measured values $y_i$ and the predicted values $f_{\theta}(x_i)$:

$$S_2 \equiv \sum_i |f_{\theta}(x_i) - y_i|^2 \equiv \sum_i |\Delta_i|^2$$

But this requirement seems kind of arbitrary - why not minimize the sum of absolute differences instead?

$$S_1 \equiv \sum_i |f_{\theta}(x_i) - y_i|^1 \equiv \sum_i |\Delta_i|^1$$

Or one could even go a step lower and minimize the following sum:

$$S_0 \equiv \sum_i |f_{\theta}(x_i) - y_i|^0 \equiv \sum_i |\Delta_i|^0$$

For the sum $S_n$ to be minimal, the derivatives must vanish: $\partial_{\theta_j} S_n = 0$. If the model function $f$ includes a constant $\alpha$ so that $f_{\theta}(x) = g_{\theta}(x) + \alpha$, then $\frac{\partial}{\partial \alpha} S_n = \frac{\partial}{\partial f(x)} S_n$ must be zero.

For the least squares approach, optimizing $\alpha$ ensures that the mean difference between predicted and measured values is zero, since:

$$\frac{\partial}{\partial \alpha} S_2 = \frac{\partial}{\partial f} \sum_i |\Delta_i|^2 = \sum_i \Delta_i \overset{!}{=} 0$$
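
This can be checked numerically. A minimal sketch with made-up data, for the simplest case of a constant model $f(x) = \alpha$: the least-squares minimizer should be the sample mean.

```python
# Sketch with made-up data: for a constant model f(x) = alpha,
# S_2(alpha) = sum_i (alpha - y_i)^2 is minimized by the sample mean.
y = [1.0, 2.0, 4.0, 7.0]

def s2(alpha, ys):
    return sum((alpha - yi) ** 2 for yi in ys)

# Brute-force grid search over candidate values of alpha
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=lambda a: s2(a, y))

mean = sum(y) / len(y)
print(best, mean)  # → 3.5 3.5
```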

With the least absolute deviations approach, optimizing $\alpha$ instead ensures that the median difference between predicted and measured values is zero:

$$\frac{\partial}{\partial \alpha} S_1 = \frac{\partial}{\partial f} \sum_i |\Delta_i|^1 = \sum_i \frac{\Delta_i}{|\Delta_i|} = n(\Delta_i > 0) - n(\Delta_i < 0) \overset{!}{=} 0$$
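
The same grid-search sketch, again with made-up data (including a large value at 100), recovers the sample median as the least-absolute-deviations minimizer:

```python
# Sketch with made-up data: for a constant model f(x) = alpha,
# S_1(alpha) = sum_i |alpha - y_i| is minimized by the sample median.
y = [1.0, 2.0, 4.0, 7.0, 100.0]

def s1(alpha, ys):
    return sum(abs(alpha - yi) for yi in ys)

candidates = [i / 100 for i in range(0, 10001)]
best = min(candidates, key=lambda a: s1(a, y))

median = sorted(y)[len(y) // 2]
print(best, median)  # → 4.0 4.0
```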

Optimizing $S_0$ will ensure that the difference has a mode at zero:

$$\frac{\partial}{\partial \alpha} S_0 = \frac{\partial}{\partial f} \sum_i |\Delta_i|^0 = -\partial_{\alpha}\, n(\Delta_i = 0) \overset{!}{=} 0$$

This explains why the least squares approach is usually chosen: intuitively, one would expect the model to be on average equal to the measurement. But when outliers are present, the least absolute deviations approach can be more useful, since the median is more robust to outliers than the mean.
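
To illustrate that robustness with made-up data: a single outlier drags the mean (the least-squares constant fit) far from the bulk of the data, while the median (the least-absolute-deviations fit) barely moves.

```python
# Sketch with made-up data: one outlier at 100 shifts the mean
# dramatically, while the median stays with the bulk of the data.
data = [2.0, 2.1, 1.9, 2.0, 100.0]

mean = sum(data) / len(data)
median = sorted(data)[len(data) // 2]
print(mean, median)  # mean ≈ 21.6, median = 2.0
```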

Another comparison between the two approaches can be made using the maximum likelihood method, where least squares corresponds to normally distributed errors and least absolute deviations corresponds to Laplace (double exponentially) distributed errors.
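
To spell out this correspondence (a standard maximum-likelihood computation, not from the original post): taking the errors $\Delta_i$ as independent with the given density, the negative log-likelihood reduces to $S_2$ or $S_1$ up to additive and multiplicative constants:

```latex
% Normally distributed errors:
-\ln L = -\sum_i \ln\!\left( \frac{1}{\sqrt{2\pi}\,\sigma}
         \, e^{-\Delta_i^2 / 2\sigma^2} \right)
       = \frac{1}{2\sigma^2} \sum_i \Delta_i^2 + \mathrm{const}
       \;\propto\; S_2

% Laplace (double exponentially) distributed errors:
-\ln L = -\sum_i \ln\!\left( \frac{1}{2b} \, e^{-|\Delta_i| / b} \right)
       = \frac{1}{b} \sum_i |\Delta_i| + \mathrm{const}
       \;\propto\; S_1
```

Maximizing the likelihood is therefore equivalent to minimizing $S_2$ under a Gaussian error model and $S_1$ under a Laplace error model.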

See also Modes, Medians and Means: A Unifying Perspective, which this post is based on.