Laplace Distribution
A sharp-peaked, fat-tailed alternative to the Normal Distribution.
The Laplace distribution is a continuous probability distribution that is notable for its sharper peak at the mean and its "fatter" tails compared to the Normal distribution. This means it assigns higher probability to values near the mean and also to extreme outlier events.
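To make the comparison concrete, here is a minimal numerical sketch (assuming NumPy and SciPy are installed) that matches a Laplace and a Normal distribution on mean and variance, then compares their densities at the peak and in the tail:

```python
import numpy as np
from scipy.stats import laplace, norm

# Match the two distributions on mean (0) and variance (1).
# For the Laplace, Var = 2*b^2, so b = 1/sqrt(2) gives unit variance.
b = 1 / np.sqrt(2)
lap = laplace(loc=0, scale=b)
gauss = norm(loc=0, scale=1)

print(lap.pdf(0), gauss.pdf(0))  # ~0.707 vs ~0.399: sharper Laplace peak
print(lap.pdf(4), gauss.pdf(4))  # ~2.5e-3 vs ~1.3e-4: fatter Laplace tail
```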
In finance and machine learning, this heavy-tailed behavior makes the Laplace distribution a valuable tool. It can model financial returns that are more prone to extreme events than a normal model would suggest. It is also intrinsically linked to LASSO (L1) regularization, a popular technique in regression for feature selection, because the Laplace prior's sharp peak at zero naturally encourages some coefficients to be exactly zero.
Core Concepts
The probability density function (PDF) of the Laplace distribution is:

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

- $\mu$ (mu) is the location parameter, which is also the mean, median, and mode.
- $b > 0$ is the scale parameter, which controls the spread or "width" of the distribution.
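The PDF is simple enough to implement directly. A minimal sketch, assuming NumPy and SciPy are available, with scipy.stats.laplace used only as a cross-check:

```python
import numpy as np
from scipy.stats import laplace

def laplace_pdf(x, mu=0.0, b=1.0):
    """Density of the Laplace distribution: (1 / 2b) * exp(-|x - mu| / b)."""
    return np.exp(-np.abs(x - mu) / b) / (2 * b)

x = np.linspace(-5, 5, 11)
assert np.allclose(laplace_pdf(x, mu=1.0, b=2.0),
                   laplace.pdf(x, loc=1.0, scale=2.0))
```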
Expected Value (Mean)

$$E[X] = \mu$$

Variance

$$\operatorname{Var}(X) = 2b^2$$
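Both results are derived step by step below. As a quick numerical confirmation (a minimal sketch, assuming SciPy is installed):

```python
from scipy.stats import laplace

mu, b = 1.5, 0.75
dist = laplace(loc=mu, scale=b)
print(dist.mean())  # 1.5   -> E[X] = mu
print(dist.var())   # 1.125 -> Var(X) = 2 * b^2 = 2 * 0.75^2
```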
Key Derivations
Deriving the Expected Value (Mean)
Step 1: Use Symmetry
The Laplace PDF is symmetric around $x = \mu$. The function $(x - \mu)f(x)$ is an odd function with respect to $x = \mu$. The integral of an odd function over a symmetric interval is zero:

$$E[X - \mu] = \int_{-\infty}^{\infty} (x - \mu)\,\frac{1}{2b} e^{-|x - \mu|/b}\,dx = 0$$
Step 2: Solve for E[X]
Using the linearity of expectation, $E[X] = E[(X - \mu) + \mu] = E[X - \mu] + \mu$.
Therefore, $E[X] = 0 + \mu = \mu$, which leads to our result.
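The symmetry argument can be verified by numerically integrating $x\,f(x)$; a minimal sketch assuming SciPy's quad:

```python
import numpy as np
from scipy.integrate import quad

mu, b = 2.0, 0.5
pdf = lambda x: np.exp(-abs(x - mu) / b) / (2 * b)
integrand = lambda x: x * pdf(x)

# Split the integral at the kink x = mu so quad converges cleanly.
left, _ = quad(integrand, -np.inf, mu)
right, _ = quad(integrand, mu, np.inf)
print(left + right)  # ~2.0, matching E[X] = mu
```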
Deriving the Variance
We need to solve $\operatorname{Var}(X) = E[(X - \mu)^2]$. Let $Y = X - \mu$ and use the PDF for $Y$, $f(y) = \frac{1}{2b} e^{-|y|/b}$, for simplicity.
Step 1: Set up the Integral for E[Y²]

$$E[Y^2] = \int_{-\infty}^{\infty} y^2\,\frac{1}{2b} e^{-|y|/b}\,dy$$

Since the integrand is an even function, we can simplify this to:

$$E[Y^2] = \frac{1}{b} \int_0^{\infty} y^2 e^{-y/b}\,dy$$
Step 2: Apply Integration by Parts (First Pass)
Let $u = y^2$ and $dv = e^{-y/b}\,dy$. Then $du = 2y\,dy$ and $v = -b e^{-y/b}$. Using $\int u\,dv = uv - \int v\,du$:

$$E[Y^2] = \frac{1}{b}\left(\Big[-b y^2 e^{-y/b}\Big]_0^{\infty} + 2b \int_0^{\infty} y e^{-y/b}\,dy\right)$$

The first term evaluates to 0, since exponential decay dominates polynomial growth. We are left with:

$$E[Y^2] = 2 \int_0^{\infty} y e^{-y/b}\,dy$$
Step 3: Apply Integration by Parts (Second Pass)
We integrate $\int_0^{\infty} y e^{-y/b}\,dy$. Let $u = y$ and $dv = e^{-y/b}\,dy$. Then $du = dy$ and $v = -b e^{-y/b}$.

$$\int_0^{\infty} y e^{-y/b}\,dy = \Big[-b y e^{-y/b}\Big]_0^{\infty} + b \int_0^{\infty} e^{-y/b}\,dy$$

The first term is 0. The second term is:

$$b\Big[-b e^{-y/b}\Big]_0^{\infty} = b\,(0 - (-b)) = b^2$$
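The intermediate integrals from both passes can be spot-checked numerically; a minimal sketch assuming SciPy's quad:

```python
import numpy as np
from scipy.integrate import quad

b = 0.8
i1, _ = quad(lambda y: y * np.exp(-y / b), 0, np.inf)     # Step 3: should be b^2
i2, _ = quad(lambda y: y**2 * np.exp(-y / b), 0, np.inf)  # Step 2: should be 2*b^3
print(i1, b**2)      # ~0.64 vs 0.64
print(i2, 2 * b**3)  # ~1.024 vs 1.024
```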
Step 4: Combine Results
Substituting the result from Step 3 back into the end of Step 2:

$$E[Y^2] = 2 \cdot b^2 = 2b^2$$

Since $\operatorname{Var}(X) = E[(X - \mu)^2]$ and $Y = X - \mu$, we have $\operatorname{Var}(X) = E[Y^2] = 2b^2$.
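The final result is also easy to verify by simulation; a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, b = 0.0, 1.3
samples = rng.laplace(loc=mu, scale=b, size=1_000_000)
print(samples.var(), 2 * b**2)  # ~3.38 in both cases
```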
Applications
Machine Learning: LASSO Regression
In Bayesian statistics, placing a Laplace prior on regression coefficients makes the maximum a posteriori (MAP) estimate equivalent to LASSO (L1) regularization. The sharp peak at zero in the Laplace prior "encourages" the coefficients of irrelevant features to be exactly zero, effectively performing feature selection and creating a simpler, more interpretable model.
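Concretely, the negative log-density of a Laplace prior on a coefficient $w$ is $|w|/b$ plus a constant, which is exactly an L1 penalty with strength proportional to $1/b$. The sketch below (assuming scikit-learn is installed; the data are synthetic and purely illustrative) shows LASSO driving irrelevant coefficients to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0] + [0.0] * 8)  # only the first 2 features matter
y = X @ true_w + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_.round(2))  # irrelevant coefficients land at exactly 0.0
```

A smaller scale $b$ (a prior concentrated more tightly at zero) corresponds to a larger alpha, i.e. stronger shrinkage and sparser coefficients.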