When doing linear regression, the parameters $\theta_j$ are usually estimated by requiring them to minimize the sum of squared differences between the measured values $y_i$ and the predicted values $f_\theta(x_i)$:
$$S_1 = \sum_i |\Delta_i|^2 = \sum_i |y_i - f_\theta(x_i)|^2$$
But this requirement always seemed kind of arbitrary to me: why not minimize, for example, the sum of absolute differences?
$$S_2 = \sum_i |\Delta_i| = \sum_i |y_i - f_\theta(x_i)|$$
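To make the two criteria concrete, here is a minimal Python sketch that evaluates $S_1$ and $S_2$ for a given parameter vector. The straight-line model `line`, the data and the parameter values are hypothetical choices for illustration only.

```python
# Minimal sketch: the loss functions S1 (least squares) and S2 (least
# absolute deviations) for an arbitrary model f(theta, x).
# The model, data and parameters below are hypothetical examples.

def s1(f, theta, xs, ys):
    """Sum of squared differences |y_i - f_theta(x_i)|^2."""
    return sum((y - f(theta, x)) ** 2 for x, y in zip(xs, ys))


def s2(f, theta, xs, ys):
    """Sum of absolute differences |y_i - f_theta(x_i)|."""
    return sum(abs(y - f(theta, x)) for x, y in zip(xs, ys))


def line(theta, x):
    """Hypothetical model f_theta(x) = a*x + c with theta = (a, c)."""
    a, c = theta
    return a * x + c


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 1.0, 3.0, 3.5]
theta = (1.0, 0.0)
print(s1(line, theta, xs, ys), s2(line, theta, xs, ys))
```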
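In both cases, the fact that the sum is minimized implies that its derivative with respect to any parameter $\theta_j$ must vanish (for $S_2$ this holds wherever the derivative exists, since $|\Delta_i|$ is not differentiable at $\Delta_i = 0$):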
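$$
\begin{aligned}
\frac{\partial S_1}{\partial \theta_j} &= \sum_i \frac{\partial |\Delta_i|^2}{\partial \theta_j} = -2 \sum_i \Delta_i \, \frac{\partial f_\theta(x_i)}{\partial \theta_j} = 0 \\
\frac{\partial S_2}{\partial \theta_j} &= \sum_i \frac{\partial |\Delta_i|}{\partial \theta_j} = -\sum_i \frac{\Delta_i}{|\Delta_i|} \, \frac{\partial f_\theta(x_i)}{\partial \theta_j} = 0
\end{aligned}
$$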
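The two derivative formulas can be checked numerically. The following sketch, assuming the same hypothetical straight-line model $f = a x + c$ and made-up data, compares them with central finite differences at a parameter point where no $\Delta_i$ is zero, so that the derivative of $S_2$ exists.

```python
# Sketch: compare the analytic derivatives of S1 and S2 with central finite
# differences for the hypothetical straight-line model f = a*x + c.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 1.0, 3.0, 3.5]


def f(theta, x):
    a, c = theta
    return a * x + c


def df(theta, x):
    # Partial derivatives of f: df/da = x, df/dc = 1
    return [x, 1.0]


def s1(theta):
    return sum((y - f(theta, x)) ** 2 for x, y in zip(xs, ys))


def s2(theta):
    return sum(abs(y - f(theta, x)) for x, y in zip(xs, ys))


def grad_s1(theta):
    # dS1/dtheta_j = -2 * sum_i Delta_i * df(x_i)/dtheta_j
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        delta = y - f(theta, x)
        for j, w in enumerate(df(theta, x)):
            g[j] -= 2.0 * delta * w
    return g


def grad_s2(theta):
    # dS2/dtheta_j = -sum_i (Delta_i / |Delta_i|) * df(x_i)/dtheta_j
    # (only defined when no Delta_i is exactly zero)
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        delta = y - f(theta, x)
        for j, w in enumerate(df(theta, x)):
            g[j] -= (delta / abs(delta)) * w
    return g


def numeric_grad(s, theta, h=1e-6):
    # Central finite differences, one parameter at a time
    g = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += h
        down = list(theta)
        down[j] -= h
        g.append((s(up) - s(down)) / (2 * h))
    return g


theta = [1.2, 0.0]  # chosen so that no difference Delta_i is zero
print(grad_s1(theta), numeric_grad(s1, theta))
print(grad_s2(theta), numeric_grad(s2, theta))
```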
Let us now assume that the model function $f$ contains an additive constant $c$, so that $f_\theta(x_i) = g_\theta(x_i) + c$ with $g_\theta$ independent of $c$. Since $\partial f_\theta / \partial c = 1$, this implies:
$$\frac{\partial S_1}{\partial c} = -2 \sum_i \Delta_i = 0 \quad\Rightarrow\quad \frac{1}{n} \sum_i \Delta_i = 0$$
So with least squares, the constant $c$ ensures that the mean of the differences $\Delta_i$ between measured and predicted values is zero.
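As a numerical illustration, this sketch fits the hypothetical model $f(x) = a x + c$ to made-up data by plain gradient descent on $S_1$ (a deliberately simple optimizer, not a recommended one; the step size and iteration count are arbitrary choices) and then checks that the mean difference is numerically zero.

```python
# Sketch: fit f(x) = a*x + c by plain gradient descent on S1 and check that
# the mean difference is (numerically) zero. Data, step size and iteration
# count are arbitrary choices for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 1.0, 3.0, 3.5]


def residuals(a, c):
    # Delta_i = y_i - f(x_i)
    return [y - (a * x + c) for x, y in zip(xs, ys)]


def fit_least_squares(steps=2000, lr=0.02):
    a, c = 0.0, 0.0
    for _ in range(steps):
        d = residuals(a, c)
        # dS1/da = -2 * sum(Delta_i * x_i), dS1/dc = -2 * sum(Delta_i)
        grad_a = -2.0 * sum(di * x for di, x in zip(d, xs))
        grad_c = -2.0 * sum(d)
        a -= lr * grad_a
        c -= lr * grad_c
    return a, c


a, c = fit_least_squares()
d = residuals(a, c)
print(a, c, sum(d) / len(d))  # the last value is the mean difference
```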
Doing the same for the least absolute deviations approach yields:
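$$\frac{\partial S_2}{\partial c} = -\sum_i \frac{\Delta_i}{|\Delta_i|} = 0 \quad\Rightarrow\quad n(\Delta_i > 0) = n(\Delta_i < 0)$$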
With least absolute deviations, the constant $c$ instead ensures that the median difference between measured and predicted values is zero: since $\Delta_i / |\Delta_i| = \pm 1$, depending on whether the difference is positive or negative, the number of positive and negative differences must be the same.
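A minimal check for the simplest case, a model consisting of the constant alone ($g_\theta = 0$), with hypothetical data: because $S_2$ is piecewise linear in $c$ with kinks at the $y_i$, a minimizer is always found among the data values, so trying each of them suffices here.

```python
import statistics

# Sketch: for the constant-only model f(x) = c, minimize S2 by trying each
# data value as a candidate for c. Data are hypothetical.
ys = [2.0, 7.0, 1.0, 4.0, 3.0]


def s2(c):
    return sum(abs(y - c) for y in ys)


c_lad = min(ys, key=s2)
n_pos = sum(1 for y in ys if y - c_lad > 0)  # differences Delta_i > 0
n_neg = sum(1 for y in ys if y - c_lad < 0)  # differences Delta_i < 0
print(c_lad, statistics.median(ys), n_pos, n_neg)
```

With an odd number of points, one difference is exactly zero (the data value at the median), which is where $|\Delta_i|$ has its kink.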
The same reasoning applies to all other parameters: least squares drives a mean of the differences to zero and least absolute deviations a median, except that each difference (or, for least absolute deviations, each sign) is now weighted by the factor $\partial f_\theta(x_i) / \partial \theta_j$.
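For least squares, the weighted condition can be verified exactly. This sketch, using made-up data and exact rational arithmetic via `fractions.Fraction`, computes the textbook closed-form least-squares line $f(x) = a x + c$ and evaluates both stationarity sums. The condition for $c$ is built into the closed form, so the interesting one is the slope condition with weight $\partial f / \partial a = x_i$.

```python
from fractions import Fraction

# Sketch: textbook closed-form least-squares line f(x) = a*x + c on
# hypothetical data, using exact fractions so the checks are exact.
xs = [Fraction(0), Fraction(1), Fraction(2), Fraction(3)]
ys = [Fraction(1, 2), Fraction(1), Fraction(3), Fraction(7, 2)]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
c = y_mean - a * x_mean

deltas = [y - (a * x + c) for x, y in zip(xs, ys)]
# Stationarity conditions: sum_i Delta_i * df(x_i)/dtheta_j = 0
cond_c = sum(deltas)                             # weight df/dc = 1
cond_a = sum(d * x for d, x in zip(deltas, xs))  # weight df/da = x_i
print(a, c, cond_c, cond_a)
```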
The mean-versus-median view may explain why least squares is the usual choice: intuitively, one expects the model to agree with the measurements on average. When outliers are present, however, least absolute deviations can be more useful, since the median is much less sensitive to outliers than the mean.
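A quick illustration with the constant-only model, where by the results above least squares estimates $c$ as the mean and least absolute deviations as the median of the $y_i$. The numbers are made up, and in the second set one value is replaced by an outlier.

```python
import statistics

# Hypothetical measurements; the second set replaces the last value by an outlier.
clean = [2.0, 3.0, 4.0, 5.0, 6.0]
with_outlier = clean[:-1] + [60.0]

for label, ys in (("clean", clean), ("with outlier", with_outlier)):
    c_ls = statistics.mean(ys)     # least-squares estimate of c
    c_lad = statistics.median(ys)  # least-absolute-deviations estimate of c
    print(f"{label}: least squares c = {c_ls}, LAD c = {c_lad}")
```

The single outlier moves the least-squares estimate from 4.0 to 14.8, while the least-absolute-deviations estimate stays at 4.0.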
Another comparison between the two approaches comes from maximum likelihood estimation: least squares is the maximum likelihood estimate for normally distributed errors, and least absolute deviations is the maximum likelihood estimate for double exponentially (Laplace) distributed errors.
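As a short worked derivation of these standard results, assume independent errors $\Delta_i$ with a fixed scale ($\sigma$ for the normal distribution, $b$ for the Laplace distribution). The negative log-likelihoods are then:

$$
\begin{aligned}
-\ln L_\text{normal}(\theta) &= -\sum_i \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma} e^{-\Delta_i^2/(2\sigma^2)}\right) = \frac{S_1}{2\sigma^2} + n \ln\left(\sqrt{2\pi}\,\sigma\right) \\
-\ln L_\text{Laplace}(\theta) &= -\sum_i \ln\left(\frac{1}{2b} e^{-|\Delta_i|/b}\right) = \frac{S_2}{b} + n \ln(2b)
\end{aligned}
$$

Since the additive terms do not depend on $\theta$, maximizing the likelihood is equivalent to minimizing $S_1$ in the normal case and $S_2$ in the Laplace case.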
See also Modes, Medians and Means: A Unifying Perspective, which served as the foundation of this post.