
My Notes on MAE vs MSE Error Metrics 🚀

by Sengul Karaderili, March 11th, 2022

Too Long; Didn't Read

We will focus on MSE and MAE, two model evaluation metrics frequently used in regression models. MAE is the average absolute distance between the real data and the predicted data, but it fails to punish large errors in prediction. MSE measures the average squared difference between the estimated values and the actual values. L1 and L2 regularization are techniques used to reduce the complexity of the model; they do this by adding a penalty term to the loss function.


This post contains my notes on error metrics.

Contents:

  • Linear Regression Summary
  • MAE
  • MSE
  • Compare MAE vs MSE
  • Bonus: L1 and L2 Regularization
  • Experiment Lab
  • Bonus! If we want to compare MAE and RMSE
  • Sources


    We will focus on MSE and MAE metrics, which are frequently used model evaluation metrics in regression models.


    Linear Regression Summary

    In linear regression:

y' = b + w1x1, where:

  • y' is the predicted label (the desired output)
  • b is the bias (the y-intercept)
  • w1 is the weight of the feature
  • x1 is a feature (a known input)
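As a minimal sketch, the prediction formula above can be written as a one-line function. The bias and weight values below are made-up illustrative numbers, not fitted parameters.

```python
# Sketch of a one-feature linear regression prediction: y' = b + w1 * x1.
# b (bias) and w1 (weight) are illustrative, made-up values.
def predict(x1, b=2.0, w1=0.5):
    """Return the predicted label y' for a single feature value x1."""
    return b + w1 * x1

print(predict(10.0))  # bias 2.0 plus weight 0.5 times feature 10.0 -> 7.0
```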


    Assumptions of Linear Regression 💫


    1. Normal distribution of residuals

The residuals should be normally distributed.

    2. Linearity of residuals

The regression model is linear in parameters, and the mean of residuals is zero.

    Independence of residuals

There are basically two classes of dependencies:

  • Residuals correlate with another variable. Multicollinearity is a fancy way of saying that your independent variables are highly correlated with each other.
  • Residuals correlate with other (close) residuals (autocorrelation). There should be no autocorrelation of residuals. This is especially applicable for time series data, since autocorrelation is the correlation of a time series with lags of itself.

    3. Equal variance of residuals

    Homoscedasticity is present when the noise of your model can be described as random and the same throughout all independent variables. Again, the mean of residuals is zero.
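Two of the assumptions above (zero-mean residuals, and residuals uncorrelated with the predictor) are easy to check numerically. This is an illustrative sketch on synthetic, made-up data, using NumPy's least-squares fit:

```python
import numpy as np

# Made-up synthetic data: a linear relationship plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

# Fit a simple linear regression via least squares (slope first, then intercept).
w1, b = np.polyfit(x, y, deg=1)
residuals = y - (b + w1 * x)

# For an ordinary least-squares fit, the residual mean is ~0 and the
# residuals show no remaining linear trend against the predictor.
print(abs(residuals.mean()) < 1e-9)
print(abs(np.corrcoef(x, residuals)[0, 1]) < 1e-9)
```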

    Mean Absolute Error (MAE)

MAE is the average of all absolute errors: the average absolute distance between the real data and the predicted data. It fails to punish large errors in prediction.

    Steps of MAE:

  1. Find all of your absolute errors, |xi – x|, where xi is a measurement and x is the true value.
  2. Add them all up.
  3. Divide by the number of errors. For example, if you had 10 measurements, divide by 10.
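The steps above can be sketched in plain Python. The actual and predicted values are made-up illustrative numbers:

```python
# Sketch of the MAE steps: absolute errors, sum them, divide by the count.
def mean_absolute_error(actual, predicted):
    errors = [abs(a - p) for a, p in zip(actual, predicted)]  # step 1
    return sum(errors) / len(errors)                          # steps 2-3

actual = [3.0, -0.5, 2.0, 7.0]     # made-up true values
predicted = [2.5, 0.0, 2.0, 8.0]   # made-up predictions
print(mean_absolute_error(actual, predicted))  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
```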
    Mean Squared Error (MSE)

MSE measures the average of the squares of the errors: the average squared difference between the estimated values and the actual values.

    It is always non-negative, and values closer to zero are better.

    Steps of MSE:

    1. Calculate the residuals for every data point.
    2. Calculate the squared value of the residuals.
    3. Calculate the average of the squared residuals from step 2.
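The MSE steps can be sketched the same way, again with made-up numbers:

```python
# Sketch of the MSE steps: residuals, square them, average the squares.
def mean_squared_error(actual, predicted):
    residuals = [a - p for a, p in zip(actual, predicted)]  # step 1
    squared = [r ** 2 for r in residuals]                   # step 2
    return sum(squared) / len(squared)                      # step 3

actual = [3.0, -0.5, 2.0, 7.0]     # made-up true values
predicted = [2.5, 0.0, 2.0, 8.0]   # made-up predictions
print(mean_squared_error(actual, predicted))  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
```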
    Compare Them

      MAE:

      • The idea behind the absolute error is to avoid mutual cancellation of the positive and negative errors.
      • An absolute error has only non-negative values.
      • By the same token, avoiding the potential of mutual cancellations has its price – skewness (bias) cannot be determined.
      • Absolute error preserves the same units of measurement as the data under analysis and gives all individual errors the same weights (as compared to squared error).
      • This distance is easily interpretable and when aggregated over a dataset using arithmetic mean has a meaning of the average error.
      • The use of absolute value might present difficulties in the gradient calculation of model parameters. This distance is used in such popular metrics as MAE, MdAE, etc.

      MSE:

      • The squared error follows the same idea as the absolute error – avoid negative error values and mutual cancellation of errors.
      • Due to the square, large errors are emphasized and have a relatively greater effect on the value of the performance metric. At the same time, the effect of relatively small errors will be even smaller. Sometimes this property of the squared error is referred to as penalizing extreme errors or being susceptible to outliers. Based on the application, this property may be considered positive or negative. For example, emphasizing large errors may be a desirable discriminating measure in evaluating models.
      • In the case of data outliers, MSE will become much larger than MAE.
      • In MSE, the error increases quadratically, while in MAE it increases proportionally.
      • In MSE, since the errors are squared, any large prediction error is heavily penalized.
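The difference in how the two metrics treat outliers can be seen with a small sketch. The values are made-up: all predictions match except one, which is off by 10:

```python
# Sketch: how a single outlier affects MAE vs MSE (made-up numbers).
def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual  = [1.0, 2.0, 3.0, 4.0]
outlier = [1.0, 2.0, 3.0, 14.0]  # one prediction off by 10

print(mae(actual, outlier))  # 10 / 4 = 2.5   -- grows linearly with the error
print(mse(actual, outlier))  # 100 / 4 = 25.0 -- grows quadratically
```

The same absolute error of 10 contributes 10 to the MAE sum but 100 to the MSE sum, which is exactly the "penalizing extreme errors" property described above.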

    Ref: https://arxiv.org/pdf/1809.03006.pdf