This post contains my notes on error metrics.

Contents:

- Linear Regression Summary
- MAE
- MSE
- Compare MAE vs MSE
- Bonus: L1 and L2 Regularization
- Experiment Lab
- Bonus! If we want to compare MAE and RMSE
- Sources

We will focus on MAE and MSE, which are frequently used evaluation metrics for regression models.

## Linear Regression Summary

In linear regression:

y' = b + w1x1

- y' is the predicted label (a desired output)
- b is the bias (the y-intercept)
- w1 is the weight of feature 1
- x1 is a feature (a known input)

### Assumptions of Linear Regression 💫

1. Normal distribution of residuals: the residuals should be normally distributed.

2. Linearity of residuals: the regression model is linear in parameters, and the mean of the residuals is zero.

3. Independence of residuals. There are basically 2 classes of dependencies:
   - Residuals correlate with another variable. Multicollinearity is a fancy way of saying that your independent variables are highly correlated with each other.
   - Residuals correlate with other (close) residuals (autocorrelation). There should be no autocorrelation of residuals. This applies especially to time series data; autocorrelation is the correlation of a time series with lags of itself.

4. Equal variance of residuals: homoscedasticity is present when the noise of your model can be described as random and the same across all independent variables. Again, the mean of the residuals is zero.

## Mean Absolute Error (MAE)

MAE is the average of all absolute errors: the average absolute distance between the real data and the predicted data. It fails to punish large errors in prediction.

Steps of MAE:

1. Find all of your absolute errors, |xi - x|.
2. Add them all up.
3. Divide by the number of errors. For example, if you had 10 measurements, divide by 10.

## Mean Square Error (MSE)

MSE measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. It is always non-negative, and values closer to zero are better.

Steps of MSE:

1. Calculate the residuals for every data point.
2. Square each residual.
3. Calculate the average of the squared residuals from step 2.

## Compare Them

MAE: The idea behind the absolute error is to avoid mutual cancellation of the positive and negative errors. An absolute error has only non-negative values. By the same token, avoiding the potential of mutual cancellations has its price: skewness (bias) cannot be determined. Absolute error preserves the same units of measurement as the data under analysis and gives all individual errors the same weight (as compared to squared error). This distance is easily interpretable and, when aggregated over a dataset using the arithmetic mean, has the meaning of the average error. The use of the absolute value might present difficulties in the gradient calculation of model parameters. This distance is used in such popular metrics as MAE, MdAE, etc.

MSE: The squared error follows the same idea as the absolute error: avoid negative error values and mutual cancellation of errors. Due to the square, large errors are emphasized and have a relatively greater effect on the value of the performance metric, while the effect of relatively small errors is even smaller. Sometimes this property of the squared error is referred to as penalizing extreme errors or being susceptible to outliers. Depending on the application, this property may be considered positive or negative. For example, emphasizing large errors may be a desirable discriminating measure in evaluating models; in case of data outliers, MSE will become much larger compared to MAE.

In MSE the error grows in a quadratic fashion, while in MAE it grows in a proportional fashion: because the error is squared, MSE heavily penalizes any large prediction error (the short sketch below makes this concrete).

Ref: https://arxiv.org/pdf/1809.03006.pdf
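To make the quadratic-versus-proportional difference concrete, here is a minimal sketch of my own (the residual values are made up purely for illustration, not taken from the paper above):

```python
import numpy as np

# A single error's contribution: |e| grows linearly, e**2 quadratically.
for err in [2, 10, 50]:
    print(f"error={err:3d}  |error|={abs(err):3d}  error^2={err**2:5d}")

# Aggregated over a few residuals, one large error dominates MSE
# while only nudging MAE.
residuals = np.array([1.0, -2.0, 1.5, 20.0])
print("MAE:", np.mean(np.abs(residuals)))  # 6.125
print("MSE:", np.mean(residuals ** 2))     # 101.8125
```

The single residual of 20 contributes 20/4 = 5 to the MAE but 400/4 = 100 to the MSE, which is exactly why MSE is described as susceptible to outliers.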
```python
# Code Comparison
# true: Array of true target variable
# pred: Array of predictions
import numpy as np

def calculateMAE(true, pred):
    # Mean (not sum) of the absolute differences.
    return np.mean(np.abs(true - pred))

def calculateMSE(true, pred):
    # Mean (not sum) of the squared differences.
    return np.mean((true - pred) ** 2)
```

## MAE and MSE with Different Models

We can look at these examples to compare models, keeping in mind that it may not make sense to compare metric values across different models:

- https://dergipark.org.tr/tr/download/article-file/764199
- https://www.kaggle.com/faressayah/linear-regression-house-price-prediction

## Bonus: L1 and L2 Regularization

Regularization is a technique used to reduce the complexity of the model. It does this by penalizing the loss function.

L1 or Manhattan Norm: A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights. In models relying on sparse features, L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0, which removes those features from the model. L1 loss is less sensitive to outliers than L2 loss.

L2 or Euclidean Norm: A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights (those with high positive or low negative values) closer to 0, but not quite to 0. L2 regularization always improves generalization in linear models.

L2 and L1 penalize weights differently: L2 penalizes weight², L1 penalizes |weight|.

## Experiment Lab ⚗️🧪🌡📊📉📈🔍

Let's see how the metrics work on outlier and non-outlier data.

```python
# Import part.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
import statsmodels.api as sm
```

I will generate synthetic data in a certain range with NumPy. I produced the pred data without running a model, because I don't want to focus on building models; I want to look at how the metrics change on different data.

```python
# First, the actual values.
actual = np.random.randint(low=50, high=101, size=(50))
# Second, my random pred data.
pred = np.random.randint(low=50, high=101, size=(50))
print("Actual data (Random):", actual)
print("Pred data (Random):", pred)
```

Out[]:

```
Actual data (Random): [ 53  95  63  78  88  59  96  86  52  71  78  89  77  60  97  79  71  87
  55  92  69  76  80  66  80  88  89  68  69  98 100  57  83  72  82  72
  52  78  94  76  69  59  73  70  99  97 100  63  73  94]
Pred data (Random): [ 66  69  65  75  99 100  88  92  83  77  80  58  85  91  78  80  63 100
  55  84  64  85  67  87  79  83  59  81  76  85  96  86  87  99  91  84
  81  50  96  98  76  99  55  63  67  74  51 100  55  75]
```

```python
import matplotlib.pyplot as plt

# Create scatter plot.
plt.plot(actual, pred, 'o')

# m = slope, b = intercept.
m, b = np.polyfit(actual, pred, 1)
plt.plot(actual, m*actual + b)
```

Out[]: (scatter plot of actual vs. pred with the fitted line)

```python
mse = mean_squared_error(actual, pred)
mae = mean_absolute_error(actual, pred)
print("MAE without outliers:", mae)
print("MSE without outliers:", mse)
```

Out[]:

```
MAE without outliers: 16.02
MSE without outliers: 408.1
```
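Since the same metric block is repeated for the outlier run below, here is a small optional helper, purely my own sketch: the name `report_metrics` and the seed value 42 are made up and not part of the original experiment. Note also that the code above sets no random seed, so the exact numbers will vary between runs; seeding the generator first makes the draws reproducible.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def report_metrics(actual, pred, label):
    # Hypothetical convenience wrapper: prints both metrics side by side
    # so the runs with and without outliers are easy to compare.
    print(f"MAE {label}:", mean_absolute_error(actual, pred))
    print(f"MSE {label}:", mean_squared_error(actual, pred))

# Seeding before the random draws makes the experiment reproducible.
np.random.seed(42)
actual = np.random.randint(low=50, high=101, size=50)
pred = np.random.randint(low=50, high=101, size=50)
report_metrics(actual, pred, "without outliers")
```

With a fixed seed, the exact MAE/MSE values will differ from the ones above but stay stable across runs.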
I added some outliers to the pred values.

```python
pred[[4, 8, 15, 45]] = pred[[4, 8, 15, 45]] + 50

# Create scatter plot.
plt.plot(actual, pred, 'o')

# m = slope, b = intercept.
m, b = np.polyfit(actual, pred, 1)
plt.plot(actual, m*actual + b)
```

Out[]: (scatter plot; the four shifted points now sit far above the fitted line)

```python
mse = mean_squared_error(actual, pred)
mae = mean_absolute_error(actual, pred)
print("MAE with outliers:", mae)
print("MSE with outliers:", mse)
```

Out[]:

```
MAE with outliers: 19.1
MSE with outliers: 648.1
```

The observed increase in MAE was about 3, while MSE increased by about 240. This means that if we have outlier data, MSE will produce a much more sensitive result; MAE is far less affected by the outlier values.

## Bonus! If we want to compare MAE and RMSE

RMSE: it represents the sample standard deviation of the differences between predicted values and observed values (called residuals).

Case 1: Actual Values = [2, 4, 6, 8], Predicted Values = [4, 6, 8, 10]
MAE for case 1 = 2.0, RMSE for case 1 = 2.0

Case 2: Actual Values = [2, 4, 6, 8], Predicted Values = [4, 6, 8, 12]
MAE for case 2 = 2.5, RMSE for case 2 = 2.65

From the above example, we can see that RMSE penalizes the last value's prediction more heavily than MAE.

This article was previously published on Medium.

Contact me if you have questions or feedback: sengulkaraderili@gmail.com 👩‍💻

## Sources

- https://towardsdatascience.com/when-your-regression-models-errors-contain-two-peaks-13d835686ca
- https://www.dataquest.io/blog/understanding-regression-error-metrics/
- statisticshowto: https://www.statisticshowto.com/absolute-error/
- statisticshowto: https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/mean-squared-error/
- MLCC: https://developers.google.com/machine-learning/crash-course/descending-into-ml/linear-regression
- https://www.youtube.com/watch?v=KzHJXdFJSIQ
- Choosing the right metric: https://medium.com/usf-msds/choosing-the-right-metric-for-machine-learning-models-part-1-a99d7d7414e4
- https://arxiv.org/pdf/1809.03006.pdf
- Stack Exchange: https://stats.stackexchange.com/questions/48267/mean-absolute-error-or-root-mean-squared-error
- Makine Öğrenmesi Algoritmaları ile Hava Kirliliği Tahmini Üzerine Karşılaştırmalı Bir Değerlendirme (A Comparative Evaluation of Air Pollution Prediction with Machine Learning Algorithms): https://dergipark.org.tr/tr/download/article-file/764199
- Multiple LR: https://www.reneshbedre.com/blog/multiple-linear-regression.html
- MLCC: https://developers.google.com/machine-learning/glossary#L2_regularization
- http://r-statistics.co/Assumptions-of-Linear-Regression.html
- https://www.hackdeploy.com/assumptions-of-linear-regression-with-python/