
R-Squared (coefficient of determination)

R-squared (often written as \(R^2\)), also known as the coefficient of determination, is one of the most commonly used metrics in regression analysis. It measures how well the independent variables (predictors) in a regression model explain the variance of the dependent variable (target). In simple terms, \(R^2\) tells us what proportion of the variance in the dependent variable can be explained by the model.

The \(R^2\) value ranges from 0 to 1, with 0 meaning that the model does not explain any of the variance and 1 meaning that the model explains all the variance in the target variable. However, it’s important to note that \(R^2\) can also be negative in certain cases (for example, when the model fits worse than a simple mean model), though this is rare in practice.

Definition and Intuition

R-squared (\(R^2\)) is a statistical metric that quantifies how well a regression model explains the variability of the dependent variable. Specifically, it measures the proportion of the total variation in the target variable that is accounted for by the model’s predictions. In essence, it answers the question: "How much better is this model at predicting the outcome than just using the mean of the observed values?"

The value of \(R^2\) lies between 0 and 1 for most well-behaved models:

  • \(R^2=1\) means the model perfectly explains all the variability of the target variable.
  • \(R^2=0\) means the model does no better than simply predicting the mean of the target variable every time.
  • \(R^2 < 0\) (possible in models without an intercept or in poor fits) means the model is worse than predicting the mean.

Unlike metrics like MAE or RMSE that express error in the same units as the target variable, \(R^2\) is unitless. It provides a normalized indication of fit, which can be helpful for comparing models trained on different datasets or tasks, as long as context is taken into account.
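The mean-baseline interpretation can be checked directly: a "model" that predicts the mean for every observation has RSS equal to TSS, so \(R^2\) comes out to exactly 0. A minimal Rust sketch (the `r_squared` helper is a hypothetical standalone function, not part of any library):

```rust
// Hypothetical helper implementing R² = 1 - RSS/TSS.
fn r_squared(actual: &[f64], predicted: &[f64]) -> f64 {
    let mean = actual.iter().sum::<f64>() / actual.len() as f64;
    let tss: f64 = actual.iter().map(|y| (y - mean).powi(2)).sum();
    let rss: f64 = actual
        .iter()
        .zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum();
    1.0 - rss / tss
}

fn main() {
    let actual = [2.0, 4.0, 6.0, 8.0]; // mean is 5
    // Predicting the mean every time makes RSS equal TSS, so R² = 0.
    let mean_baseline = [5.0, 5.0, 5.0, 5.0];
    println!("{}", r_squared(&actual, &mean_baseline)); // prints 0
}
```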

Model Performance Insights

High \(R^2\) (closer to 1)

Indicates that the model captures a large portion of the variance in the target variable. This typically suggests a good fit, although it does not guarantee that the model is correct or generalizes well. High \(R^2\) is desirable when the goal is explanatory power or when variance reduction is important.

Low \(R^2\) (closer to 0)

Suggests that the model fails to explain much of the variability in the target data. This may point to model underfitting, poor feature selection, or non-linear relationships not captured by the current model.

Negative \(R^2\)

Means the model performs worse than a simple average-based prediction. This is often a red flag that the model is fundamentally mis-specified or inappropriate for the data.
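A tiny example makes this concrete: a model that stubbornly predicts 5 for a series whose mean is 3 accumulates more squared error than the mean baseline itself, so \(R^2\) goes negative (a minimal sketch; `r_squared` is a hypothetical helper implementing \(1 - \text{RSS}/\text{TSS}\)):

```rust
// Hypothetical helper implementing R² = 1 - RSS/TSS.
fn r_squared(actual: &[f64], predicted: &[f64]) -> f64 {
    let mean = actual.iter().sum::<f64>() / actual.len() as f64;
    let tss: f64 = actual.iter().map(|y| (y - mean).powi(2)).sum();
    let rss: f64 = actual
        .iter()
        .zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum();
    1.0 - rss / tss
}

fn main() {
    // Actual series has mean 3; the "model" predicts 5 every time.
    let actual = [1.0, 2.0, 3.0, 4.0, 5.0];
    let bad_model = [5.0, 5.0, 5.0, 5.0, 5.0];

    // RSS = 30 exceeds TSS = 10, so R² = 1 - 30/10 = -2.
    println!("{}", r_squared(&actual, &bad_model)); // prints -2
}
```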

Core Concepts

Explained vs. Total Variance

\(R^2\) compares the explained variance (how much of the variation the model captures) with the total variance (how much variation exists in the data overall). The better the model captures the data’s structure, the closer the explained variance gets to the total variance, resulting in a higher \(R^2\).

Relative Measure

Unlike absolute error metrics, \(R^2\) doesn’t tell you how far off the predictions are, but how relatively effective the model is compared to a baseline (typically, predicting the mean).

Not Always a Sign of Quality

A high \(R^2\) doesn’t always mean a model is good. It might be overfitting (fitting noise in the data), especially if the model includes too many variables. Conversely, a low \(R^2\) might still be acceptable in fields where data is inherently noisy or difficult to model (e.g., social sciences).

Overall, \(R^2\) is best used as one part of a broader model evaluation strategy. It provides insight into variance explanation, but not a complete picture of predictive performance or generalization ability.

Mathematical Formulation

The R-squared (\(R^2\)) metric is based on comparing two sources of variability: the variance explained by the model, and the total variance present in the target variable.

Mathematically, it is defined as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
  • \(n\) is the number of observations.
  • \(y_i\) is the actual value of the \(i\)-th observation.
  • \(\hat{y}_i\) is the predicted value for the \(i\)-th observation.
  • \(\bar{y}\) is the mean of all actual target values.
  • \(\sum (y_i - \hat{y}_i)^2\) is the residual sum of squares (RSS) — the unexplained variance.
  • \(\sum (y_i - \bar{y})^2\) is the total sum of squares (TSS) — the total variance in the data.

The ratio \(\frac{\text{RSS}}{\text{TSS}}\) measures the proportion of unexplained variance. Subtracting this from 1 gives the proportion of the variance that the model does explain.

Calculation Procedure

  1. Compute the mean of the actual values:
    $$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
  2. Calculate the residual sum of squares (RSS):
    $$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
  3. Calculate the total sum of squares (TSS):
    $$\text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
  4. Compute \(R^2\):
    $$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$$
If RSS is equal to TSS, then \(R^2 = 0\), meaning the model explains none of the variance. If RSS is zero (perfect prediction), then \(R^2 = 1\).

Example

Let’s use a small dataset of actual and predicted values for a regression problem.

Actual values:

\(y = [3, 5, 7, 9, 11]\)

Predicted values:

\(\hat{y} = [2.8, 4.9, 7.1, 9.3, 11.2]\)

Step 1: Mean of actual values

$$\bar{y} = \frac{3 + 5 + 7 + 9 + 11}{5} = \frac{35}{5} = 7$$

Step 2: Compute RSS (Residual Sum of Squares)

$$\begin{align*} (3 - 2.8)^2 &= 0.04 \\ (5 - 4.9)^2 &= 0.01 \\ (7 - 7.1)^2 &= 0.01 \\ (9 - 9.3)^2 &= 0.09 \\ (11 - 11.2)^2 &= 0.04 \\ \text{RSS} &= 0.04 + 0.01 + 0.01 + 0.09 + 0.04 = 0.19 \end{align*}$$

Step 3: Compute TSS (Total Sum of Squares)

$$\begin{align*} (3 - 7)^2 &= 16 \\ (5 - 7)^2 &= 4 \\ (7 - 7)^2 &= 0 \\ (9 - 7)^2 &= 4 \\ (11 - 7)^2 &= 16 \\ \text{TSS} &= 16 + 4 + 0 + 4 + 16 = 40 \end{align*}$$

Step 4: Compute \(R^2\)

$$R^2 = 1 - \frac{0.19}{40} = 1 - 0.00475 = 0.99525$$

Interpretation:

An \(R^2\) of approximately 0.995 indicates that 99.5% of the variance in the actual values is explained by the model — a near-perfect fit.

Properties and Behavior

Understanding how the R-squared metric behaves in different situations is critical to interpreting it correctly. While it is often used as a default indicator of model quality, it comes with specific strengths and limitations depending on the data and modeling context.

Typical Value Ranges

\(R^2 = 1\) (Perfect fit):

All the model’s predictions fall exactly on the actual data points. The model explains 100% of the variance in the target variable.

\(R^2 = 0\) (Baseline performance):

The model explains none of the variance and is equivalent to simply predicting the mean of the target values.

\(R^2 < 0\) (Worse than baseline):

The model’s predictions are worse than predicting the mean. This often occurs in models that are poorly fit or lack an intercept.

Generally, a higher \(R^2\) is preferred, but a high value does not guarantee a better model—especially if overfitting is involved.

Sensitivity to Outliers or Noise

R-squared is sensitive to extreme values and outliers. Since the metric is based on squared differences (like MSE), any large error will disproportionately reduce the \(R^2\) value.

  • With noise: \(R^2\) can drop significantly, especially in small datasets.
  • With outliers: A single outlier can distort both the model fit and the total variance, leading to misleading \(R^2\) values.

In practice, R-squared can look deceptively low in datasets where variance is inherently high, or deceptively high in cases of overfitting.
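The squared-error sensitivity is easy to demonstrate: in this hypothetical sketch, a model that predicts four of five points perfectly but badly misses one scores far worse than the mean baseline (`r_squared` is an ad hoc helper implementing \(1 - \text{RSS}/\text{TSS}\)):

```rust
// Hypothetical helper implementing R² = 1 - RSS/TSS.
fn r_squared(actual: &[f64], predicted: &[f64]) -> f64 {
    let mean = actual.iter().sum::<f64>() / actual.len() as f64;
    let tss: f64 = actual.iter().map(|y| (y - mean).powi(2)).sum();
    let rss: f64 = actual
        .iter()
        .zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum();
    1.0 - rss / tss
}

fn main() {
    let actual = [1.0, 2.0, 3.0, 4.0, 5.0];

    // Perfect predictions: R² = 1.
    println!("{}", r_squared(&actual, &actual)); // prints 1

    // Four perfect predictions plus one large miss (15 instead of 5):
    // RSS = 100 against TSS = 10, so R² = 1 - 10 = -9.
    let one_big_miss = [1.0, 2.0, 3.0, 4.0, 15.0];
    println!("{}", r_squared(&actual, &one_big_miss)); // prints -9
}
```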

Differentiability and Role in Optimization

R-squared is not typically used directly as a loss function during model training. On a fixed dataset, maximizing \(R^2\) is equivalent to minimizing MSE, since the total sum of squares is a constant; \(R^2\) also depends on the mean of the target variable and on the variance structure of the dataset, making it more of a descriptive statistic than a training objective.

Instead, loss functions such as MSE (Mean Squared Error) or MAE (Mean Absolute Error) are preferred for optimization purposes. These metrics can be minimized during training, and \(R^2\) is then computed afterward as a diagnostic.
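The connection between the two is direct: dividing both RSS and TSS by \(n\) gives \(R^2 = 1 - \text{MSE}/\text{Var}(y)\), where \(\text{Var}(y)\) is the population variance of the actual values. A minimal Rust sketch of this identity, using the worked example from earlier in the article (`r2_from_mse` is a hypothetical helper):

```rust
// R² expressed via MSE: R² = 1 - MSE / Var(y), where Var(y) = TSS / n
// is the population variance of the actual values.
fn r2_from_mse(actual: &[f64], predicted: &[f64]) -> f64 {
    let n = actual.len() as f64;
    let mean = actual.iter().sum::<f64>() / n;

    // MSE: the quantity typically minimized during training.
    let mse: f64 = actual
        .iter()
        .zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum::<f64>() / n;

    // Population variance of the target.
    let var: f64 = actual.iter().map(|y| (y - mean).powi(2)).sum::<f64>() / n;

    1.0 - mse / var
}

fn main() {
    let actual = [3.0, 5.0, 7.0, 9.0, 11.0];
    let predicted = [2.8, 4.9, 7.1, 9.3, 11.2];

    // Matches the worked example: MSE = 0.038, Var(y) = 8.
    println!("R-squared: {:.5}", r2_from_mse(&actual, &predicted)); // R-squared: 0.99525
}
```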

Assumptions and Limitations

While R-squared is widely used, it relies on several implicit assumptions:

  • Linearity: \(R^2\) assumes that the relationship between the predictors and the response variable is linear. Nonlinear models may yield misleading \(R^2\) values.
  • Same scale: It is dataset-dependent and can only be compared meaningfully across models trained on the same dataset or target variable.
  • No insight into bias or error: R-squared does not provide us with any information about the actual size of errors, bias, or prediction accuracy in units.
  • No generalization guarantee: A high \(R^2\) on the training data may indicate overfitting and does not guarantee good performance on unseen data.

R-squared is excellent for explaining variance but poor for measuring prediction error. It works best as a complementary metric rather than the sole criterion for model evaluation.

Code Example

This simple Rust implementation computes how well a model's predicted values explain the variance in the actual values.

fn calculate_r_squared(actual: &[f64], predicted: &[f64]) -> f64 {
    if actual.len() != predicted.len() {
        panic!("Length of actual and predicted values must be the same.");
    }

    let n = actual.len() as f64;

    // Compute the mean of actual values
    let mean_actual: f64 = actual.iter().sum::<f64>() / n;

    // Calculate Total Sum of Squares (TSS)
    let tss: f64 = actual.iter()
        .map(|y| (y - mean_actual).powi(2))
        .sum();

    // Calculate Residual Sum of Squares (RSS)
    let rss: f64 = actual.iter()
        .zip(predicted.iter())
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum();

    // Compute R-squared (undefined when TSS is zero, i.e. all actual values are equal)
    1.0 - (rss / tss)
}

fn main() {
    // Example data
    let actual_values = vec![3.0, 5.0, 7.0, 9.0, 11.0];
    let predicted_values = vec![2.8, 4.9, 7.1, 9.3, 11.2];

    // Calculate R²
    let r_squared = calculate_r_squared(&actual_values, &predicted_values);

    // Output the result
    println!("R-squared: {:.5}", r_squared);
}

Explanation

  • Input validation: The function checks that the actual and predicted slices have the same length to prevent mismatch errors.
  • Mean calculation: Calculates the mean of the actual values (\(\bar{y}\)).
  • TSS: Computes the total variance of the actual values relative to the mean.
  • RSS: Computes the squared errors between predicted and actual values.
  • R² computation: Applies the standard formula \(R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}\).

Output

R-squared: 0.99525

This means that approximately 99.5% of the variance in the actual values is explained by the model's predictions — a very good fit.

Alternative Metrics

Although R-squared (\(R^2\)) is useful for measuring the degree of variance explained by a model, it has its limitations, particularly in evaluating predictive accuracy and handling non-linear or biased models.

Mean Squared Error

MSE is particularly useful when we want to penalize larger errors more heavily. Unlike MAE, MSE squares the error terms, meaning large deviations between predicted and actual values contribute disproportionately to the final score. This makes MSE sensitive to outliers, so it is often preferred in contexts where large errors are undesirable.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

👉 A detailed explanation of MSE can be found in the section: MSE

Root Mean Squared Error

RMSE is similar to MSE but returns the error in the same units as the original data, making it easier to interpret. Like MSE, it gives more weight to larger errors due to the squaring, and is commonly used when we need a metric that emphasizes large errors but also requires a result with the same scale as the target variable.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

👉 A detailed explanation of RMSE can be found in the section: RMSE

Mean Absolute Error

MAE calculates the average of absolute differences between predicted and actual values.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

👉 A detailed explanation of MAE can be found in the section: MAE
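For comparison, the three error metrics above can be computed side by side on the article's example data. A minimal Rust sketch (the function names are ad hoc, not from any library):

```rust
// Mean Squared Error: average of squared residuals.
fn mse(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter().zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).powi(2))
        .sum::<f64>() / actual.len() as f64
}

// Root Mean Squared Error: MSE back in the target's units.
fn rmse(actual: &[f64], predicted: &[f64]) -> f64 {
    mse(actual, predicted).sqrt()
}

// Mean Absolute Error: average of absolute residuals.
fn mae(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter().zip(predicted)
        .map(|(y, y_hat)| (y - y_hat).abs())
        .sum::<f64>() / actual.len() as f64
}

fn main() {
    let actual = [3.0, 5.0, 7.0, 9.0, 11.0];
    let predicted = [2.8, 4.9, 7.1, 9.3, 11.2];

    // Unlike R², these are all in (or derived from) the target's own units.
    println!("MSE:  {:.3}", mse(&actual, &predicted));  // MSE:  0.038
    println!("RMSE: {:.3}", rmse(&actual, &predicted)); // RMSE: 0.195
    println!("MAE:  {:.3}", mae(&actual, &predicted));  // MAE:  0.180
}
```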

Adjusted R-squared

Adjusted R-squared modifies \(R^2\) to account for the number of predictors in the model.

$$\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)$$
  • \(n\) is the number of observations.
  • \(p\) is the number of predictors.

Unlike plain \(R^2\), adjusted \(R^2\) increases only when a new predictor improves the model more than would be expected by chance, which makes it the better choice when comparing models with different numbers of features.
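The adjustment can be sketched in a few lines of Rust, applied to the article's worked example (\(R^2 = 0.99525\), \(n = 5\) observations, \(p = 1\) predictor; the function name is ad hoc):

```rust
// Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1).
fn adjusted_r_squared(r2: f64, n: usize, p: usize) -> f64 {
    1.0 - (1.0 - r2) * (n as f64 - 1.0) / (n as f64 - p as f64 - 1.0)
}

fn main() {
    // R² from the worked example: n = 5 observations, p = 1 predictor.
    let adj = adjusted_r_squared(0.99525, 5, 1);

    // The penalty for the predictor pulls the score slightly below plain R².
    println!("Adjusted R-squared: {:.5}", adj); // Adjusted R-squared: 0.99367
}
```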

Advantages and Disadvantages

The R-squared metric, or the coefficient of determination, is widely used in regression analysis to measure how well a model explains the variability of the target variable. While it offers valuable insights, especially for linear models, it also comes with notable limitations.

Advantages:

  • R-squared provides a clear percentage-based score that indicates how much of the variance in the dependent variable is explained by the model.
  • When comparing multiple models on the same dataset, a higher \(R^2\) generally suggests a better fit (though not necessarily better predictive performance).
  • R-squared is a core metric in classical linear regression and is well-supported in most statistical software and modeling frameworks.
  • Since \(R^2\) is a ratio, it’s unitless and normalized between 0 and 1 (though it can go negative if the model is very poor), making it easy to compare across datasets or scales.

Disadvantages:

  • A high \(R^2\) does not mean the model will perform well on unseen data. It only reflects how well the model fits the current data, not its generalizability.
  • R-squared doesn’t detect systematic bias in predictions. A model can have high \(R^2\) while consistently under- or over-predicting.
  • \(R^2\) is most meaningful in linear regression. In non-linear or tree-based models, it may not offer a reliable view of model quality.
  • Adding more features (even irrelevant ones) can artificially increase \(R^2\). This is why adjusted \(R^2\) is often preferred when comparing models with different numbers of predictors.
  • Although often described as ranging from 0 to 1, \(R^2\) can be negative if the model fits worse than simply predicting the mean every time. This can be confusing for new practitioners.

Conclusion

R-squared, or the coefficient of determination, is a foundational metric in regression analysis that quantifies how well a model explains the variability in the target variable. It’s simple to understand, widely supported, and especially useful in the context of linear models for evaluating model fit and variance explanation.

However, while R-squared offers valuable insight into how well your model fits the training data, it should not be mistaken for a measure of predictive accuracy. It can be inflated by overfitting, misinterpreted in non-linear contexts, and fails to capture bias or the actual magnitude of errors. In practice, it’s best used alongside complementary metrics like RMSE, MAE, or Adjusted R-squared to make informed decisions about model quality.
