Mean Absolute Error (MAE)

One of the most commonly used metrics for assessing model accuracy in regression problems is the Mean Absolute Error (MAE). MAE provides a simple and intuitive measure of how far off a model’s predictions are from the actual values.

This metric is frequently used in any problem where the objective is to predict continuous numerical outcomes. By calculating the absolute difference between predicted and actual values, MAE gives an immediate sense of the model's average error size, making it a go-to choice for tasks requiring straightforward, easily interpretable performance evaluations.

Definition and Intuition

The Mean Absolute Error (MAE) is a metric used to evaluate the performance of regression models. It measures the average magnitude of errors between predicted values and actual values, without considering whether the predictions are overestimations or underestimations. This is what gives it a straightforward, intuitive nature.

MAE simply calculates the average of the absolute differences between each predicted value and its corresponding true value. The "absolute" aspect means that all errors are treated as positive numbers, regardless of whether the prediction was above or below the actual value. This ensures that large positive and negative errors are treated equally in terms of their magnitude, making the metric robust to the direction of the errors.

In simple terms, if you think of a model trying to predict house prices, MAE will tell you, on average, how far off the model's predictions are from the actual prices. A lower MAE means the model is making more accurate predictions, whereas a higher MAE suggests that the model’s predictions are farther from the real values.

Model Performance Insights

Low MAE

A low MAE indicates that your model is generally making good predictions, with small errors across the board. It suggests that your model has a relatively small discrepancy between predicted and actual values.

High MAE

A high MAE suggests that your model is making larger errors on average. This could mean your model is struggling to fit the data properly, or there might be other factors influencing the model's accuracy (e.g., noise, outliers, etc.).

Core Concepts

Simplicity

One of the strengths of MAE is its simplicity. It is very easy to interpret, as it directly tells you the average size of errors in the same units as the predicted values.

Penalty for Errors

MAE penalizes all errors equally (i.e., it does not take into account whether the errors are positive or negative). This is particularly useful when you want to treat overestimations and underestimations with equal importance.

Interpretability

Since MAE is measured in the same units as the predicted variable, it’s easy to understand and apply in real-world contexts. For example, in a housing price prediction problem, if the MAE is $10,000, it means the model’s predictions, on average, are off by $10,000.

However, this simplicity can also be a limitation in certain scenarios, especially when the model needs to prioritize reducing the impact of large errors (which is where other metrics like Mean Squared Error (MSE) might come into play).

Mathematical Formulation

The Mathematical Formulation of Mean Absolute Error (MAE) is simple yet powerful. It’s computed by averaging the absolute differences between the predicted values and the actual values.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
where:
  • \(n\) is the number of data points (or observations)
  • \(y_i\) is the actual value for the \(i\)-th observation
  • \(\hat{y}_i\) is the predicted value for the \(i\)-th observation
  • \(|y_i - \hat{y}_i|\) is the absolute error between the predicted and actual value for the \(i\)-th observation

Calculation Procedure

  1. Calculate the error for each data point: For every observation, compute the absolute difference between the actual value and the predicted value.
    $$|y_i - \hat{y}_i|$$
  2. Sum the absolute errors: Once we have the absolute errors for each observation, sum them up.
    $$\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
  3. Average the sum: Finally, divide the total sum of the absolute errors by the number of data points (\(n\)) to find the Mean Absolute Error.
    $$\frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
The MAE is expressed in the same units as the target variable. For example, if you're predicting prices in dollars, the MAE will also be in dollars. Unlike Mean Squared Error (MSE), which has squared units (e.g., square dollars), MAE keeps the error in the same scale, making it easier to interpret.

Example

Let's imagine that we predict the prices of 5 houses and the actual prices are:

  • House 1: 300,000
  • House 2: 350,000
  • House 3: 400,000
  • House 4: 450,000
  • House 5: 500,000

If our model predicts prices as follows:

  • Predicted House 1: 310,000
  • Predicted House 2: 340,000
  • Predicted House 3: 380,000
  • Predicted House 4: 460,000
  • Predicted House 5: 495,000

We can calculate the error for each house by finding the absolute difference between the estimated and actual prices:

  • House 1: \(|310,000 - 300,000| = 10,000\)
  • House 2: \(|340,000 - 350,000| = 10,000\)
  • House 3: \(|380,000 - 400,000| = 20,000\)
  • House 4: \(|460,000 - 450,000| = 10,000\)
  • House 5: \(|495,000 - 500,000| = 5,000\)

Now, the Mean Absolute Error (MAE) is the average of these errors:

$$\text{MAE} = \frac{10,000 + 10,000 + 20,000 + 10,000 + 5,000}{5} = \frac{55,000}{5} = 11,000$$

So the MAE in this case is 11,000, meaning that, on average, the model's predictions are off by 11,000.

Properties and Behavior

The Mean Absolute Error (MAE) is a straightforward and intuitive metric, but understanding its properties and behavior is essential for interpreting its results and making informed decisions about model performance.

Typical Value Ranges

Good Performance (Low MAE):

A low MAE indicates that the model's predictions are close to the actual values. The lower the MAE, the better the model’s accuracy. For example, if we are predicting house prices, an MAE of 5,000 would suggest that, on average, the model's predictions are off by just 5,000, which might be acceptable in many cases.

Poor Performance (High MAE):

A high MAE means that the model is making larger errors on average. If the MAE is large (e.g., 50,000 in the house price example), it suggests that the model is struggling to predict accurately, and there may be significant issues with the data or the model itself.

While MAE gives you a general sense of error magnitude, its interpretation depends heavily on the context of the task. For some applications (e.g., predicting product prices), a small MAE might be critical, whereas in others (e.g., predicting weather temperatures), a slightly higher MAE might still be acceptable.

Sensitivity to Outliers or Noise

Outliers:

MAE is generally less sensitive to outliers than other metrics like Mean Squared Error (MSE). This is because MAE treats all errors equally, regardless of how large they are. In contrast, MSE penalizes large errors more heavily due to its squaring of differences. So, if there are a few extreme outliers in our data, MAE will not disproportionately inflate the error, which can make it more robust in some cases.
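To make this concrete, here is a small Rust sketch (with made-up numbers) comparing how the two metrics react when one prediction becomes an extreme outlier:

```rust
// Compare how MAE and MSE react to a single extreme outlier.
// The datasets below are illustrative, not real data.

fn mae(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).abs())
        .sum::<f64>() / actual.len() as f64
}

fn mse(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).powi(2))
        .sum::<f64>() / actual.len() as f64
}

fn main() {
    let actual = [100.0, 100.0, 100.0, 100.0];
    let clean = [110.0, 90.0, 105.0, 95.0];    // errors: 10, 10, 5, 5
    let outlier = [110.0, 90.0, 105.0, 195.0]; // last error jumps to 95

    // MAE grows linearly with the outlier...
    println!("MAE clean:   {}", mae(&actual, &clean));   // 7.5
    println!("MAE outlier: {}", mae(&actual, &outlier)); // 30
    // ...while MSE grows quadratically.
    println!("MSE clean:   {}", mse(&actual, &clean));   // 62.5
    println!("MSE outlier: {}", mse(&actual, &outlier)); // 2312.5
}
```

A single 95-unit error quadruples the MAE but multiplies the MSE by roughly 37, which is exactly the "disproportionate inflation" described above.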

Differentiability and Role in Optimization

One limitation of MAE is that it is not differentiable at zero (i.e., when the predicted value exactly matches the actual value). This non-differentiability makes it challenging to directly use MAE as an objective function for gradient-based optimization methods, which are common in many machine learning algorithms.

However, many machine learning algorithms (such as those based on decision trees) can still optimize for MAE without requiring a smooth gradient. In gradient-based training, subgradient methods are typically used to work around the non-differentiable point.
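As a sketch of how this works in practice: the subgradient of \(|y - c|\) with respect to a constant prediction \(c\) is \(-\text{sign}(y - c)\) (with 0 allowed at the kink), and stepping against it drives \(c\) toward the MAE-optimal value, which is the median of the data. The function below is our own illustration, not part of any library:

```rust
// Fit a single constant prediction to data by subgradient descent on MAE.
// At points where y == c the absolute value has no derivative; any value
// in [-1, 1] is a valid subgradient there, and we simply use 0.
fn fit_constant_mae(data: &[f64], lr: f64, iters: usize) -> f64 {
    let mut c = 0.0;
    for _ in 0..iters {
        // Subgradient of mean |y - c| with respect to c
        let g: f64 = data.iter()
            .map(|&y| if y > c { -1.0 } else if y < c { 1.0 } else { 0.0 })
            .sum::<f64>() / data.len() as f64;
        c -= lr * g;
    }
    c
}

fn main() {
    // The outlier drags the mean up to 22.0, but the MAE-optimal
    // constant is the median of the data.
    let data = [1.0, 2.0, 3.0, 4.0, 100.0];
    let c = fit_constant_mae(&data, 0.1, 10_000);
    println!("fitted constant: {:.1}", c); // converges near the median, 3.0
}
```

With a fixed learning rate the iterate oscillates slightly around the median rather than settling exactly on it, which is a known trait of subgradient methods.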

Assumptions and Limitations

Assumption of Equal Weighting:

MAE assumes that all errors are equally important, regardless of their size. This assumption may not hold true in all cases. In some situations, large errors may be far more detrimental than small ones, which would make metrics like Mean Squared Error (MSE) a better choice because it penalizes large errors more heavily.

Lack of Sensitivity to Error Direction:

MAE treats positive and negative errors the same. In some situations, you may want to give more weight to either overestimations or underestimations, in which case metrics like Mean Squared Error (MSE) or Root Mean Squared Error (RMSE) could be more suitable.

Code Example

This example uses a simple dataset of predicted and actual values and implements the MAE calculation from scratch in Rust.

fn calculate_mae(actual: &[f64], predicted: &[f64]) -> f64 {
    // Ensure the actual and predicted arrays have the same length
    if actual.len() != predicted.len() {
        panic!("The actual and predicted arrays must have the same length");
    }

    // Calculate the sum of absolute errors
    let sum_of_errors: f64 = actual.iter()
        .zip(predicted.iter())
        .map(|(a, p)| (a - p).abs()) // Calculate absolute error for each pair
        .sum();

    // Return the mean of the absolute errors
    sum_of_errors / actual.len() as f64
}

fn main() {
    // Example data
    let actual_values = vec![300000.0, 350000.0, 400000.0, 450000.0, 500000.0];
    let predicted_values = vec![310000.0, 340000.0, 380000.0, 460000.0, 495000.0];

    // Calculate MAE
    let mae = calculate_mae(&actual_values, &predicted_values);

    // Output the result
    println!("Mean Absolute Error (MAE): {:.2}", mae);
}

Explanation

Function calculate_mae:

  • This function takes two slices &[f64] as inputs: one for the actual values and one for the predicted values.
  • It first checks if the lengths of the two slices are the same. If not, the program panics and gives an error message.
  • It then iterates through both slices, calculating the absolute difference between each actual value and its corresponding predicted value.
  • The results are summed up, and the sum is divided by the number of observations to get the mean absolute error.

Main Function:

  • We define example data for actual_values and predicted_values, both as Vec<f64>, which is a growable list in Rust.
  • We call the calculate_mae function with these values, and it returns the MAE, which is printed to the console.

Output

Mean Absolute Error (MAE): 11000.00

Alternative Metrics

While Mean Absolute Error (MAE) is an intuitive metric, there are several other metrics commonly used to evaluate the performance of regression models. Each of these metrics has its strengths and weaknesses, depending on the specific task at hand.

Mean Squared Error

MSE is particularly useful when we want to penalize larger errors more heavily. Unlike MAE, MSE squares the error terms, meaning large deviations between predicted and actual values contribute disproportionately to the final score. This makes MSE sensitive to outliers, so it is often preferred in contexts where large errors are undesirable.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

👉 A detailed explanation of MSE can be found in the section: MSE

Root Mean Squared Error

RMSE is similar to MSE but returns the error in the same units as the original data, making it easier to interpret. Like MSE, it gives more weight to larger errors due to the squaring, and is commonly used when we need a metric that emphasizes large errors but also requires a result with the same scale as the target variable.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

👉 A detailed explanation of RMSE can be found in the section: RMSE
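As a sketch, both metrics can be computed for the house-price example from earlier (the helper functions here are our own, not from a library). Note that the resulting RMSE (about 12,041.59) is larger than the MAE of 11,000, reflecting the extra weight given to the single 20,000 error:

```rust
// Compute MSE and RMSE for the house-price example used throughout the article.
fn mse(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).powi(2))
        .sum::<f64>() / actual.len() as f64
}

fn rmse(actual: &[f64], predicted: &[f64]) -> f64 {
    mse(actual, predicted).sqrt()
}

fn main() {
    let actual = [300000.0, 350000.0, 400000.0, 450000.0, 500000.0];
    let predicted = [310000.0, 340000.0, 380000.0, 460000.0, 495000.0];

    println!("MSE:  {:.2}", mse(&actual, &predicted));  // 145000000.00 (square dollars)
    println!("RMSE: {:.2}", rmse(&actual, &predicted)); // 12041.59 (dollars)
}
```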

R-squared

R² (coefficient of determination) measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is often used when we want to understand how well our model explains the variation in the data. An \(R^2\) value close to 1 indicates a good fit, while a value closer to 0 indicates a poor fit. However, R² can be misleading in some cases, especially with non-linear data or models that overfit.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

👉 A detailed explanation of R-squared can be found in the section: R-squared
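Applied to the same house-price example (again with a helper function of our own), R² works out to 0.971, since the model leaves only a small fraction of the price variance unexplained:

```rust
// Compute R² (coefficient of determination) for the house-price example.
fn r_squared(actual: &[f64], predicted: &[f64]) -> f64 {
    let n = actual.len() as f64;
    let mean = actual.iter().sum::<f64>() / n;
    // Residual sum of squares: unexplained error
    let ss_res: f64 = actual.iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).powi(2))
        .sum();
    // Total sum of squares: variance around the mean
    let ss_tot: f64 = actual.iter()
        .map(|a| (a - mean).powi(2))
        .sum();
    1.0 - ss_res / ss_tot
}

fn main() {
    let actual = [300000.0, 350000.0, 400000.0, 450000.0, 500000.0];
    let predicted = [310000.0, 340000.0, 380000.0, 460000.0, 495000.0];
    println!("R²: {:.4}", r_squared(&actual, &predicted)); // 0.9710
}
```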

Advantages and Disadvantages

Like any metric, Mean Absolute Error (MAE) comes with its own set of strengths and weaknesses. It is important to understand these in order to choose the right metric for a specific machine learning problem.

Advantages:

  • MAE is very simple to compute and understand. It directly represents the average magnitude of the errors, making it intuitive and easy to explain to stakeholders, even those without a deep technical background.
  • Since the error is measured in the same units as the target variable, the interpretation is straightforward.
  • Unlike metrics like Mean Squared Error (MSE), MAE does not disproportionately penalize large errors because it does not square the error terms. This makes MAE more robust to outliers, which is useful when the data contains extreme values that should not dominate the metric.
  • MAE directly measures how far off your predictions are from the actual values, without adding any complexity. This makes it a useful metric for evaluating models where we care about the absolute magnitude of errors, not their direction or relationship to other metrics.

Disadvantages:

  • One significant drawback of MAE is that it is non-differentiable at zero (i.e., when the predicted value exactly matches the actual value). This can make MAE challenging to optimize using gradient-based methods (like in deep learning or some regression models) without resorting to additional techniques or approximations.
  • Since MAE treats all errors equally, it does not penalize larger errors more heavily. This can be a disadvantage when large errors are particularly undesirable in a model's predictions. In situations where large errors are more costly, other metrics like Mean Squared Error (MSE) or Root Mean Squared Error (RMSE) might be preferred, as they give more weight to larger errors.
  • While MAE is a great measure of error magnitude, it does not provide insights into the overall fit of the model. Unlike R-squared (R²), which tells you how much of the variance in the data is explained by the model, MAE only tells you the average error, without any context about how well the model is capturing the underlying patterns.
  • MAE is sensitive to the scale of the data. If you are comparing the performance of two models on datasets with different scales, it can be difficult to make meaningful comparisons unless you normalize or standardize the data. For instance, predicting house prices in the millions will yield much higher MAE values than predicting the weight of a person in kilograms.
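One common workaround for the scale issue is to normalize MAE by the mean of the actual values, giving a unit-free "relative MAE" that can be compared across datasets. The sketch below uses illustrative numbers, and the `relative_mae` helper is our own:

```rust
// Relative MAE: MAE divided by the mean of the actual values,
// making errors on different scales directly comparable.
fn mae(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).abs())
        .sum::<f64>() / actual.len() as f64
}

fn relative_mae(actual: &[f64], predicted: &[f64]) -> f64 {
    let mean = actual.iter().sum::<f64>() / actual.len() as f64;
    mae(actual, predicted) / mean
}

fn main() {
    // House prices in dollars vs body weights in kilograms:
    let prices = [300000.0, 400000.0];
    let price_preds = [330000.0, 380000.0];
    let weights = [70.0, 80.0];
    let weight_preds = [77.0, 76.0];

    // Raw MAEs differ by orders of magnitude...
    println!("price MAE:  {}", mae(&prices, &price_preds));   // 25000
    println!("weight MAE: {}", mae(&weights, &weight_preds)); // 5.5
    // ...but relative MAE puts them on a comparable, unit-free scale.
    println!("price rel. MAE:  {:.3}", relative_mae(&prices, &price_preds));
    println!("weight rel. MAE: {:.3}", relative_mae(&weights, &weight_preds));
}
```

Here the two models turn out to be similarly accurate in relative terms (about 7% average error each), even though their raw MAEs differ by a factor of thousands.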

Conclusion

MAE remains a widely-used and practical metric in regression analysis due to its simplicity and interpretability. While it is not a perfect measure, it can be highly effective when used in the right contexts, especially when outliers need to be downplayed and the interpretation of error magnitude matters most.

MAE is an excellent choice when you need a clear and interpretable metric for how far off your predictions are from the actual values, and when you don't want large errors penalized disproportionately. Because it treats all errors equally rather than squaring them like MSE, it holds up well in the presence of extreme values or outliers. It is also indifferent to the direction of the error, which is useful in cases where overestimates and underestimates are equally costly (e.g., in financial predictions).
