Root Mean Squared Error (RMSE)

The Root Mean Squared Error (RMSE) is one of the most commonly used metrics for evaluating regression models. It is an extension of the Mean Squared Error (MSE), with one key difference: the RMSE takes the square root of the MSE. This seemingly small change in the formula has a significant impact on the interpretation of the error metric. Whereas MSE can be difficult to interpret directly due to its squared units, RMSE provides an error measure that is in the same units as the original data. This makes RMSE more intuitive and easier to understand.

Like MSE, RMSE is sensitive to large deviations from the true values. However, by taking the square root, RMSE provides a balance between penalizing large errors and maintaining interpretability in the same units as the target variable. For these reasons, RMSE is frequently used in various machine learning tasks, such as regression, time series forecasting, and any scenario where continuous predictions are being made.

Definition and Intuition

The Root Mean Squared Error (RMSE) is a metric used to measure the average magnitude of the errors in a set of predictions. It’s the square root of the Mean Squared Error (MSE), which is the average of the squared differences between predicted and actual values. By taking the square root of MSE, RMSE brings the error back to the original scale of the target variable, making it more interpretable.

RMSE quantifies how far off the predictions made by a model are from the actual values. However, unlike other metrics, RMSE places more emphasis on larger errors due to its squaring mechanism. This means that outliers or significant mistakes made by the model will contribute disproportionately to the RMSE, making it particularly useful when large errors are undesirable.

Model Performance Insights

Low RMSE

A low RMSE value indicates that the model is making predictions that are close to the actual values on average. The smaller the RMSE, the better the model is at predicting the target variable.

High RMSE

A high RMSE suggests that the model’s predictions are widely off from the actual values, with larger errors contributing more to the overall score. In this case, the model might be underperforming or might require further tuning.

Mathematical Formulation

The Root Mean Squared Error (RMSE) is a metric that measures the square root of the average of the squared differences between the predicted values and the actual values. It’s derived from the Mean Squared Error (MSE), but the square root is taken to bring the metric back to the same unit of measurement as the target variable, making it easier to interpret.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
  • \(n\) is the number of data points (or observations).
  • \(y_i\) is the actual value for the \(i\)-th observation.
  • \(\hat{y}_i\) is the predicted value for the \(i\)-th observation.
  • \((y_i - \hat{y}_i)^2\) is the squared error between the predicted and actual value for the \(i\)-th observation.

Calculation Procedure

  1. Calculate the error for each data point: For every observation, compute the squared difference between the actual value and the predicted value.
    $$(y_i - \hat{y}_i)^2$$
  2. Sum the squared errors: Once we have the squared errors for each observation, sum them up.
    $$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
  3. Average the sum of squared errors: Divide the total sum of squared errors by the number of data points (\(n\)) to find the Mean Squared Error (MSE).
    $$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
  4. Take the square root of the MSE: Finally, take the square root of the MSE to compute the Root Mean Squared Error (RMSE).
    $$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
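The four steps above map directly onto a short Rust function (a minimal sketch, assuming two non-empty slices of equal length):

```rust
/// Computes RMSE by following the four steps above.
fn rmse(actual: &[f64], predicted: &[f64]) -> f64 {
    assert_eq!(actual.len(), predicted.len());
    // Steps 1–2: squared error per data point, summed.
    let sum_sq: f64 = actual
        .iter()
        .zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum();
    // Step 3: average to obtain the MSE.
    let mse = sum_sq / actual.len() as f64;
    // Step 4: the square root brings the units back to those of y.
    mse.sqrt()
}

fn main() {
    let y = [3.0, 5.0, 7.0];
    let y_hat = [2.0, 5.0, 9.0];
    // Squared errors are 1, 0, 4; MSE = 5/3; RMSE = sqrt(5/3) ≈ 1.2910
    println!("RMSE = {:.4}", rmse(&y, &y_hat));
}
```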
Like Mean Absolute Error (MAE), RMSE is expressed in the same units as the target variable; unlike MAE, however, it magnifies the impact of larger errors due to the squaring. This makes it particularly sensitive to outliers.

Example

Let's imagine we predict the prices of 5 houses, and the actual prices are:

  • House 1: 300,000
  • House 2: 350,000
  • House 3: 400,000
  • House 4: 450,000
  • House 5: 500,000

If our model predicts the following prices:

  • Predicted House 1: 310,000
  • Predicted House 2: 340,000
  • Predicted House 3: 380,000
  • Predicted House 4: 460,000
  • Predicted House 5: 495,000

We can calculate the squared error for each house by finding the squared difference between the estimated and actual prices:

  • House 1: \((310,000 - 300,000)^2 = 100,000,000\)
  • House 2: \((340,000 - 350,000)^2 = 100,000,000\)
  • House 3: \((380,000 - 400,000)^2 = 400,000,000\)
  • House 4: \((460,000 - 450,000)^2 = 100,000,000\)
  • House 5: \((495,000 - 500,000)^2 = 25,000,000\)

Now, the Mean Squared Error (MSE) is the average of these squared errors:

$$\text{MSE} = \frac{100,000,000 + 100,000,000 + 400,000,000 + 100,000,000 + 25,000,000}{5} = \frac{725,000,000}{5} = 145,000,000$$

Finally, to calculate the Root Mean Squared Error (RMSE), we take the square root of the MSE:

$$\text{RMSE} = \sqrt{145,000,000} \approx 12,041.59$$

So, the RMSE in this case is approximately 12,041.59, meaning that, on average, the model’s predictions are off by around 12,000. RMSE gives more weight to large errors (due to squaring), which is why house 3’s 20,000 deviation from the actual price contributed most to the final score.

Properties and Behavior

The Root Mean Squared Error (RMSE) is a widely used metric for evaluating the accuracy of regression models. Understanding its properties and behavior is key to interpreting its results and knowing when to use it effectively.

Typical Value Ranges

Good Performance (Low RMSE):

A low RMSE indicates that the model’s predictions are close to the actual values. RMSE is especially useful when the magnitude of errors is crucial, as it gives a higher penalty to larger errors, making it a valuable metric when you want to minimize large deviations. For example, if you are predicting house prices, an RMSE of 12,000 suggests that, on average, the model's predictions are off by about 12,000, which could be acceptable depending on the context of the problem.

Poor Performance (High RMSE):

A high RMSE suggests that the model is making large errors on average, meaning its predictions are far from the actual values. If the RMSE is large, such as 50,000 for house prices, it indicates that the model is struggling to make accurate predictions. In this case, you might need to adjust the model or reconsider the features used for prediction.

While RMSE is helpful for comparing model performance, its interpretation depends on the context. A lower RMSE is always preferable, but what constitutes a "good" RMSE depends on the scale of the target variable and the specific task at hand.

Assumptions and Limitations

Although RMSE is a popular and widely used metric for regression tasks, it does come with a few assumptions and limitations that must be considered when interpreting its results and choosing it as an evaluation metric for a specific task.

Error Distribution:

RMSE assumes that the errors (the differences between the actual and predicted values) are normally distributed. This assumption aligns with the underlying model of many regression techniques, such as linear regression, where residuals (the differences between actual and predicted values) are expected to follow a normal distribution. If this assumption does not hold, RMSE may not be the most reliable metric.

Constant Scale:

RMSE assumes that the scale of the dependent variable (target variable) is the same across all data points. In situations where data involves different units or requires normalization, RMSE can be misleading unless proper preprocessing is done.

Not Robust to Non-Normal Errors:

As noted above, real-world errors do not always follow a normal distribution. If the errors are skewed or heavy-tailed (i.e., contain a significant number of extreme outliers), RMSE may not accurately reflect the model's performance. In such cases, it can be beneficial to use metrics that are less sensitive to such deviations, like MAE or quantile-based loss functions.

Sensitivity to Outliers:

As discussed earlier, one of the primary limitations of RMSE is its sensitivity to outliers. The squaring of errors means that RMSE disproportionately emphasizes larger errors, and a few extreme outliers can dominate the metric. If outliers are due to noise or errors in the data, RMSE might give a distorted view of model performance.
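This sensitivity is easy to demonstrate: introducing a single outlier into an otherwise well-predicted dataset moves RMSE far more than MAE. A small illustrative sketch (the data is made up for the comparison):

```rust
fn rmse(a: &[f64], p: &[f64]) -> f64 {
    let n = a.len() as f64;
    (a.iter().zip(p).map(|(x, y)| (x - y).powi(2)).sum::<f64>() / n).sqrt()
}

fn mae(a: &[f64], p: &[f64]) -> f64 {
    let n = a.len() as f64;
    a.iter().zip(p).map(|(x, y)| (x - y).abs()).sum::<f64>() / n
}

fn main() {
    let actual = [10.0, 12.0, 11.0, 13.0];
    let good = [11.0, 11.0, 12.0, 12.0];    // every error has magnitude 1
    let outlier = [11.0, 11.0, 12.0, 52.0]; // one error of magnitude 39

    // Without the outlier both metrics agree (1.0 each); with it,
    // MAE rises to 10.5 while RMSE jumps to sqrt(381) ≈ 19.52.
    println!("no outlier:   RMSE = {:.2}, MAE = {:.2}", rmse(&actual, &good), mae(&actual, &good));
    println!("with outlier: RMSE = {:.2}, MAE = {:.2}", rmse(&actual, &outlier), mae(&actual, &outlier));
}
```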

Code Example

fn calculate_rmse(actual: &[f64], predicted: &[f64]) -> f64 {
    // Ensure the actual and predicted arrays have the same length
    if actual.len() != predicted.len() {
        panic!("The actual and predicted arrays must have the same length");
    }

    // Calculate the sum of squared errors
    let sum_of_squared_errors: f64 = actual.iter()
        .zip(predicted.iter())
        .map(|(a, p)| (a - p).powi(2)) // Calculate squared error for each pair
        .sum();

    // Return the square root of the mean squared errors
    (sum_of_squared_errors / actual.len() as f64).sqrt()
}

fn main() {
    // Example data: Actual and predicted house prices
    let actual_values = vec![500000.0, 550000.0, 450000.0];
    let predicted_values = vec![510000.0, 540000.0, 1000000.0]; // Outlier in the last prediction

    // Calculate RMSE
    let rmse = calculate_rmse(&actual_values, &predicted_values);

    // Output the result
    println!("Root Mean Squared Error (RMSE): {:.2}", rmse);
}

Explanation

Function calculate_rmse:

  • It takes two slices: one for the actual values and one for the predicted values.
  • It first checks if the lengths of the two slices are the same. If not, it panics (throws an error).
  • It then calculates the sum of squared errors by iterating through both arrays and using .powi(2) to square the difference between each predicted and actual value.
  • Finally, the sum of squared errors is divided by the number of data points (n), and the square root is taken to obtain the RMSE.

Main function:

  • Defines a simple dataset of actual and predicted house prices, with an outlier in the predictions.
  • Calls the calculate_rmse function to compute the RMSE.
  • Prints the result, rounded to two decimal places.

Output

Root Mean Squared Error (RMSE): 317647.60

Alternative Metrics

While Root Mean Squared Error (RMSE) is a widely used metric, it's not the only one available for evaluating the performance of regression models.

Mean Squared Error

MSE is particularly useful when we want to penalize larger errors more heavily. Unlike MAE, MSE squares the error terms, meaning large deviations between predicted and actual values contribute disproportionately to the final score. This makes MSE sensitive to outliers, so it is often preferred in contexts where large errors are undesirable.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

👉 A detailed explanation of MSE can be found in the section: MSE

Mean Absolute Error

MAE is often preferred over MSE in situations where you want to treat all errors equally, regardless of size. Unlike MSE, it is less sensitive to outliers and provides a more robust metric in the presence of extreme values.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

👉 A detailed explanation of MAE can be found in the section: MAE

R-squared

R² measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is often used when we want to understand how well our model explains the variation in the data. An $R^2$ value close to 1 indicates a good fit, while a value closer to 0 indicates a poor fit. However, R² can be misleading in some cases, especially with non-linear data or models that overfit.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
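The formula translates into a short Rust function (a minimal sketch, assuming equal-length, non-empty slices and a non-constant target):

```rust
/// R²: one minus the ratio of residual to total sum of squares.
fn r_squared(actual: &[f64], predicted: &[f64]) -> f64 {
    let mean = actual.iter().sum::<f64>() / actual.len() as f64;
    // Residual sum of squares: errors of the model's predictions.
    let ss_res: f64 = actual.iter().zip(predicted).map(|(y, p)| (y - p).powi(2)).sum();
    // Total sum of squares: variance of the target around its mean.
    let ss_tot: f64 = actual.iter().map(|y| (y - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

fn main() {
    let y = [1.0, 2.0, 3.0, 4.0];
    let y_hat = [1.1, 1.9, 3.2, 3.9];
    println!("R² = {:.4}", r_squared(&y, &y_hat)); // 0.9860
}
```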

👉 A detailed explanation of R² can be found in the section: R²

Mean Absolute Percentage Error

Mean Absolute Percentage Error (MAPE) calculates the average of the absolute percentage differences between predicted and actual values. This metric is useful when we want to express errors as a percentage of the actual value, making it easy to understand and interpret. MAPE is ideal when comparing models predicting different quantities or when scale independence is required. However, MAPE has significant drawbacks. It becomes highly sensitive when actual values approach zero, potentially leading to very large errors or undefined results.

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100$$
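A direct implementation of the formula in Rust (a minimal sketch; note that the division makes it undefined when any actual value is zero, which is the drawback mentioned above):

```rust
/// MAPE: average absolute percentage error.
/// Undefined (division by zero) if any actual value is zero.
fn mape(actual: &[f64], predicted: &[f64]) -> f64 {
    let n = actual.len() as f64;
    actual
        .iter()
        .zip(predicted)
        .map(|(y, p)| ((y - p) / y).abs())
        .sum::<f64>()
        / n
        * 100.0
}

fn main() {
    let y = [100.0, 200.0, 400.0];
    let y_hat = [110.0, 190.0, 420.0];
    // Percentage errors are 10%, 5%, 5%; their average is ≈ 6.67%.
    println!("MAPE = {:.2}%", mape(&y, &y_hat));
}
```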

Huber Loss

Huber Loss is a combination of Mean Squared Error (MSE) and Mean Absolute Error (MAE), switching between them depending on the size of the error. For small errors, it behaves like MSE (squared loss), and for larger errors, it acts like MAE (linear loss), controlled by a threshold parameter δ. Huber Loss is effective when working with noisy datasets where we want to strike a balance between penalizing small errors and being less sensitive to extreme deviations. However, it requires tuning the δ parameter, and its computation can be more complex than traditional loss functions. Moreover, it can be harder to interpret compared to simpler metrics like MSE or MAE.

$$\text{Huber}(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \leq \delta \\ \delta \cdot |y - \hat{y}| - \frac{1}{2} \delta^2 & \text{for } |y - \hat{y}| > \delta \end{cases}$$
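The piecewise definition above can be sketched in Rust for a single prediction (a minimal illustration; the threshold δ is a free parameter you would tune for your data):

```rust
/// Huber loss for one prediction: quadratic below the threshold
/// `delta`, linear above it.
fn huber(y: f64, y_hat: f64, delta: f64) -> f64 {
    let r = (y - y_hat).abs();
    if r <= delta {
        0.5 * r * r // MSE-like regime for small residuals
    } else {
        delta * r - 0.5 * delta * delta // MAE-like regime for large residuals
    }
}

fn main() {
    let delta = 1.0;
    // Small residual (0.5): quadratic regime, 0.5 * 0.25 = 0.125
    println!("huber(3.0, 2.5) = {}", huber(3.0, 2.5, delta));
    // Large residual (5.0): linear regime, 1.0 * 5.0 - 0.5 = 4.5
    println!("huber(3.0, 8.0) = {}", huber(3.0, 8.0, delta));
}
```

Because the loss grows only linearly beyond δ, a single extreme residual cannot dominate the total loss the way it does with squared error.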

Advantages and Disadvantages

RMSE is a powerful metric, particularly suited for tasks where large errors need to be penalized more heavily. It’s differentiable and interpretable in terms of the target variable's units. However, it can be sensitive to outliers, and its interpretation becomes challenging when comparing models across different datasets or scales. It's important to choose RMSE when we are comfortable with its focus on large errors and are working with a dataset where outliers and extreme deviations are either expected or acceptable.

Advantages:

  • RMSE’s primary advantage lies in its ability to emphasize large errors due to the squaring of the residuals. In many real-world applications, larger errors are more detrimental, and RMSE helps prioritize their reduction.
  • RMSE is differentiable, which makes it suitable for optimization algorithms such as gradient descent. This is useful in machine learning algorithms (e.g., linear regression, neural networks) where continuous optimization of a loss function is required.
  • RMSE is sensitive to the scale of the data, which can be advantageous when working with variables that have a specific unit or magnitude. This feature allows RMSE to be interpreted directly in the context of the target variable’s scale, making it more meaningful in many cases.
  • Since RMSE is the square root of MSE, it is in the same units as the target variable, which makes it easier to interpret compared to metrics like MSE, which has squared units. For example, if predicting house prices in dollars, RMSE will also be in dollars, providing a direct sense of the model's prediction error.

Disadvantages:

  • One of the biggest disadvantages of RMSE is its sensitivity to outliers. Due to the squaring of residuals, extreme values or outliers have a disproportionate effect on the RMSE value, potentially skewing the model’s evaluation. This can be a significant drawback when dealing with noisy datasets or rare but extreme values.
  • While RMSE provides an absolute measure of error, it can be difficult to interpret in relative terms, especially when comparing different models with varying scales or magnitudes. For example, an RMSE of 100 might be very large in one context (e.g., predicting household income) but small in another (e.g., predicting global GDP).
  • RMSE doesn’t offer robustness against non-normal distributions or heteroscedasticity (i.e., varying levels of variance across data). In these situations, the RMSE might not accurately represent the true performance of the model and could be misleading.
  • RMSE does not differentiate between underestimations and overestimations. It treats positive and negative errors the same way by squaring the residuals. In certain applications where the direction of the error is important (e.g., underestimating a demand forecast versus overestimating it), RMSE may not be the ideal metric.
  • Since RMSE is dependent on the scale of the target variable, comparing the performance of models trained on different datasets or with different units can be challenging unless you normalize or standardize the data.

Conclusion

The Root Mean Squared Error (RMSE) is a widely used metric in regression tasks due to its ability to highlight large errors and provide a clear understanding of a model’s predictive performance. By penalizing larger errors more heavily, RMSE can be especially useful when the cost of large discrepancies in predictions is high, such as in areas like finance, healthcare, and engineering.

RMSE is differentiable, making it an excellent choice for optimization in machine learning algorithms that rely on gradient-based methods. Its interpretation in the same units as the target variable also adds to its practical appeal, as it directly connects the error to the scale of the problem at hand. However, RMSE’s sensitivity to outliers means that it might not always be the best choice for datasets with extreme values or noise. In such cases, metrics like Mean Absolute Error (MAE) might be preferred.

While RMSE is straightforward in its application, it has some limitations. Its inability to account for the direction of errors and its scale dependency can be a drawback in some contexts. Additionally, its emphasis on large errors means that it might overstate the importance of outliers in situations where the data is highly variable or contains extreme values.

Feedback

Found this helpful? Let me know what you think or suggest improvements 👉 Contact me.