ARIMA

The ARIMA (AutoRegressive Integrated Moving Average) algorithm is one of the most widely used and well-established methods for time series forecasting. It is a classical statistical technique designed for datasets where observations are collected at regular time intervals. Unlike traditional regression models, which assume that data points are independent, ARIMA is built to capture temporal dependence, meaning that past values in the series have a direct influence on future values.
ARIMA is especially suitable for univariate time series forecasting, meaning it predicts future values based solely on past observations of the same variable, without relying on external predictors. It is particularly useful when the time series shows patterns such as trends or seasonal cycles that can be captured through statistical modeling.
How It Works
ARIMA is designed to forecast future values in a time series based on its past values. The process of using ARIMA can be broken down into three main stages: identifying the model, fitting the model, and making predictions.
Identifying the Model
ARIMA models consist of three key components: AutoRegressive (AR), Integrated (I), and Moving Average (MA). To choose the right model, we need to understand the characteristics of the time series data.
This involves checking for:
- Trend: Does the data show a general upward or downward pattern over time?
- Seasonality: Does the data fluctuate in a regular pattern at fixed intervals (e.g., monthly, yearly)?
- Stationarity: Do the data’s statistical properties (like mean and variance) remain constant over time?
For example, imagine you're forecasting monthly sales data for a retail store. You notice a trend where sales have been steadily increasing each year. You also see that sales fluctuate significantly during the holiday season. In this case, you would need to consider both the trend (increasing sales) and seasonality (holiday spikes) when fitting your ARIMA model.
To make the series stationary (i.e., remove the trend), we often use the Differencing step (the "I" component of ARIMA). This step involves subtracting the previous value from the current value to eliminate any long-term trends, leaving only the short-term fluctuations.
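The differencing step described above can be sketched in a few lines of Rust (a minimal illustration; the function name `difference` is chosen just for this example):

```rust
// First-order differencing: y'_t = y_t - y_{t-1}.
// Subtracting each value from its successor removes a linear trend.
fn difference(data: &[f64]) -> Vec<f64> {
    data.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    // A trending series: each value is 20-30 above the previous one
    let sales = [200.0, 220.0, 250.0, 270.0, 300.0, 320.0];
    let diff = difference(&sales);
    // The differenced series fluctuates around a constant level
    println!("{:?}", diff); // [20.0, 30.0, 20.0, 30.0, 20.0]
}
```

The differenced series no longer trends upward, which is exactly what the "I" component of ARIMA relies on.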
Fitting the Model
Once the data is prepared, ARIMA tries to fit the best combination of its three components (AR, I, MA) to your data. This involves:
AR (AutoRegressive) part: The model looks at the relationship between a current value and its previous values. For example, if sales in the previous month were high, the model might predict higher sales for the next month based on that relationship.
I (Integrated) part: If the series has a trend, ARIMA uses differencing to remove it. For example, if sales are consistently increasing, differencing removes the overall upward trend, making the data stationary.
MA (Moving Average) part: The model considers the relationship between the current value and past forecast errors. For example, if the model predicted lower sales than actually occurred in a given month, the MA component adjusts future predictions to correct for this error.
Making Predictions
Once the ARIMA model is trained, it can forecast future values. The model generates predictions by combining the historical values (AR part), the errors in previous predictions (MA part), and the trend (after differencing) to make an informed guess about future values.
Example
Let’s say you have the following sales data for the past 6 months:
Month | Sales |
---|---|
Jan | 200 |
Feb | 220 |
Mar | 250 |
Apr | 270 |
May | 300 |
Jun | 320 |
We want to predict sales for the next month (July).
AutoRegressive Component (AR): Looks at how sales in previous months (e.g., June and May) relate to the current month's sales.
Integrated Component (I): If there's a long-term upward trend in sales, differencing will adjust the data to make it stationary.
Moving Average Component (MA): Considers the forecast errors from previous months to correct any inaccuracies in the prediction.
By combining these components, the ARIMA model would generate a forecast for July based on patterns it identified in the data.
Mathematical Foundation
The ARIMA model is based on a combination of three key components: AutoRegressive (AR), Integrated (I), and Moving Average (MA). Each of these components has a mathematical representation, and understanding these will give us insights into how ARIMA forecasts future data points.
AR (AutoRegressive) Component
The AutoRegressive (AR) component models the relationship between the current value of the series and its previous values.
The equation for the AR model is:

\[
y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t
\]

where:
- \(y_t\) is the value of the time series at time \(t\).
- \(\phi_1, \phi_2, \dots, \phi_p\) are the parameters that represent the influence of previous values on the current value.
- \(p\) is the order of the AR model, i.e., how many past values the model considers.
- \(\epsilon_t\) is the error term (residual or noise), assumed to be normally distributed with mean zero.
The AR model assumes that past values influence the current value, so we predict \(y_t\) by linearly combining previous values.
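As a quick sketch, an AR(1) one-step forecast reduces to multiplying the last observation by the coefficient \(\phi_1\) (here the coefficient value is purely illustrative; in practice it is estimated from the data):

```rust
// AR(1): y_t = phi * y_{t-1} + e_t.
// Setting the noise term e_t to its expected value (zero),
// the one-step forecast is simply phi times the last value.
fn ar1_forecast(phi: f64, last_value: f64) -> f64 {
    phi * last_value
}

fn main() {
    let phi = 0.9; // illustrative coefficient, not estimated here
    let last_value = 100.0;
    println!("AR(1) forecast: {}", ar1_forecast(phi, last_value)); // 90
}
```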
I (Integrated) Component
The Integrated (I) component is used to make a time series stationary. A stationary series has constant mean, variance, and autocorrelation over time, which is a requirement for many time series models, including ARIMA.
Differencing is the most common method for achieving stationarity. It involves subtracting the previous value from the current value, as shown in the equation:

\[
y'_t = y_t - y_{t-1}
\]
The result of differencing is a transformed series \(y'_t\) that is stationary if the original series had a trend.
In general, if a series is not stationary after first differencing, we apply second- or higher-order differencing (i.e., differencing the already-differenced series again, and so on).
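Second-order differencing is just the first difference applied twice, as this small sketch shows (the `difference` helper is illustrative):

```rust
// First-order differencing: y'_t = y_t - y_{t-1}
fn difference(data: &[f64]) -> Vec<f64> {
    data.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    // A series with a quadratic trend (perfect squares):
    // first differences still trend, second differences are constant.
    let series = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0];
    let d1 = difference(&series); // [3, 5, 7, 9, 11] — still trending
    let d2 = difference(&d1);     // [2, 2, 2, 2]     — stationary
    println!("{:?}", d2);
}
```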
MA (Moving Average) Component
The Moving Average (MA) component models the relationship between the current value and past forecast errors. This part helps adjust predictions based on the error (or noise) of previous forecasts.
The equation for the MA model is:

\[
y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}
\]

where:
- \(\mu\) is the mean of the series (which is assumed to be constant).
- \(\epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-q}\) are past forecast errors.
- \(\theta_1, \theta_2, \dots, \theta_q\) are the coefficients of the moving average model.
- \(q\) is the order of the MA model, i.e., how many past forecast errors the model considers.
The MA model helps reduce the impact of random fluctuations in the data, making forecasts more stable and accurate.
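A minimal MA(1) sketch, assuming the coefficient \(\theta_1\) and the last forecast error are already known (both values below are illustrative):

```rust
// MA(1): y_t = mu + e_t + theta * e_{t-1}.
// The one-step forecast sets the current error e_t to zero and
// corrects the mean using the previous forecast error.
fn ma1_forecast(mu: f64, theta: f64, last_error: f64) -> f64 {
    mu + theta * last_error
}

fn main() {
    let mu = 250.0;        // series mean (illustrative)
    let theta = 0.5;       // MA coefficient (illustrative)
    let last_error = 10.0; // last month's actual minus predicted
    // We under-predicted last month, so the forecast is nudged upward.
    println!("MA(1) forecast: {}", ma1_forecast(mu, theta, last_error)); // 255
}
```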
Combining AR, I, and MA
The general ARIMA model is a combination of these three components (AR, I, MA) and is represented by the following equation:

\[
\Delta^d y_t = \phi_1 \Delta^d y_{t-1} + \dots + \phi_p \Delta^d y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}
\]

where:
- \(\Delta^d y_t\) represents the differenced series, applied \(d\) times to make the series stationary (the "I" component).
- \(p\) is the order of the AR component.
- \(q\) is the order of the MA component.
AR(p): Refers to the number of past values the model uses (i.e., the order of the autoregressive part).
I(d): Refers to the number of differencing steps applied to make the time series stationary.
MA(q): Refers to the number of past forecast errors the model uses (i.e., the order of the moving average part).
Estimating Parameters
The AR parameters \(\phi_1, \phi_2, \dots, \phi_p\) and MA parameters \(\theta_1, \theta_2, \dots, \theta_q\) are typically estimated using methods such as Maximum Likelihood Estimation (MLE) or Least Squares, while the differencing order \(d\) is usually chosen beforehand using stationarity tests (such as the Augmented Dickey-Fuller test). In practice, the orders are determined through a process called model selection, often guided by criteria like AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion), which balance model accuracy and complexity.
Code Example
We'll implement a basic ARIMA model in Rust to forecast a simple time series. For simplicity, we'll focus on the ARIMA(1,1,1) model, which means the model uses 1 lag of the autoregressive term, 1 differencing operation, and 1 lag of the moving average term. We'll use a small dataset of monthly sales to demonstrate the process.
We have a dataset of sales over the past six months:
Month | Sales |
---|---|
Jan | 200 |
Feb | 220 |
Mar | 250 |
Apr | 270 |
May | 300 |
Jun | 320 |
We'll use ARIMA to predict the sales for the 7th month (July).
Dependencies
To run this code, make sure you add the following dependency in your `Cargo.toml`:

```toml
[dependencies]
ndarray = "0.15"
```
Code
```rust
use ndarray::Array1;
use std::error::Error;

// Helper function to compute first-order differencing
fn difference(data: &Array1<f64>) -> Array1<f64> {
    let mut diff = Array1::zeros(data.len() - 1);
    for i in 1..data.len() {
        diff[i - 1] = data[i] - data[i - 1];
    }
    diff
}

// Simplified ARIMA(1,1,1) forecast
fn arima_model(data: &Array1<f64>) -> f64 {
    // Step 1: Differencing (I component)
    let differenced_data = difference(data);
    // Step 2: AR(1) – use the second-to-last differenced value as the AR term
    let ar_term = differenced_data[differenced_data.len() - 2];
    // Step 3: MA(1) – the error is the gap between the last differenced value
    // and the AR prediction (simplified: coefficients are implicitly 1.0)
    let error_term = differenced_data[differenced_data.len() - 1] - ar_term;
    // Combine AR and MA terms (no constant term, for simplicity)
    let prediction = ar_term + error_term;
    // Step 4: Reverse the differencing to return to the original scale
    let last_value = data[data.len() - 1];
    prediction + last_value
}

fn main() -> Result<(), Box<dyn Error>> {
    // Example data: monthly sales for 6 months
    let sales_data = Array1::from(vec![200.0, 220.0, 250.0, 270.0, 300.0, 320.0]);
    // Forecast the next value (7th month)
    let prediction = arima_model(&sales_data);
    println!("Predicted sales for the 7th month: {:.2}", prediction);
    Ok(())
}
```
Data Input:
We define the monthly sales data as an `Array1<f64>`, which is a 1D array type from the `ndarray` crate for storing numeric data.
Differencing (I component):
The `difference` function computes the first-order differencing to make the series stationary. In the ARIMA model, differencing helps remove trends and make the time series more predictable.
AR (AutoRegressive) Component:
The AR term is calculated by using the previous value in the differenced series. Here, we use AR(1), meaning only the immediate past value influences the prediction.
MA (Moving Average) Component:
The MA term uses the error (or residual) from the last prediction to adjust the current forecast. In this simplified example, we're assuming the error is the difference between the most recent differenced value and the AR prediction.
Prediction and Reverse Differencing:
After combining the AR and MA terms, we reverse the differencing process to return the forecast to its original scale (the scale of the actual sales data).
Main Function:
The main function initializes the sales data, passes it through the ARIMA model, and prints out the predicted sales for the 7th month.
Output
Running the above code will output the predicted sales for the 7th month, based on the ARIMA(1,1,1) model:

```text
Predicted sales for the 7th month: 340.00
```
Model Evaluation
It is always important to verify how accurate our model is. This helps determine whether ARIMA is even appropriate for the series. If the error is large, the underlying patterns might be non-linear, or the data may be too noisy. Evaluation can also help identify problematic data points or outliers.
To evaluate the model, we use various error metrics that quantify the difference between the actual values and the model’s predictions:
MAE (Mean Absolute Error)
Mean Absolute Error (MAE) is a metric used to evaluate regression models by measuring the average absolute difference between actual and predicted values. MAE treats all errors equally, making it more robust to outliers and noise in the data.
It represents the average absolute difference between the actual and predicted values:

\[
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\]
👉 A detailed explanation of MAE can be found in the section: Mean Absolute Error
MSE (Mean Squared Error)
MSE penalizes larger errors more than smaller ones (because the error is squared).
It represents the average squared difference between the actual and predicted values:

\[
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]
👉 A detailed explanation of MSE can be found in the section: Mean Squared Error
RMSE (Root Mean Squared Error)
RMSE has the same units as the original values, making it more intuitive to interpret.
It represents the square root of the average squared difference between the actual and predicted values:

\[
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\]
👉 A detailed explanation of RMSE can be found in the section: Root Mean Squared Error
Mean Absolute Percentage Error (MAPE)
The average percentage error between predicted and actual values. However, it's undefined if any actual values are zero and can be distorted by very small values.
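The four error metrics above can be sketched as plain Rust functions (a minimal illustration operating on matching slices of actual and predicted values):

```rust
// Mean Absolute Error: average of |actual - predicted|
fn mae(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter().zip(predicted)
        .map(|(a, p)| (a - p).abs())
        .sum::<f64>() / actual.len() as f64
}

// Mean Squared Error: average of (actual - predicted)^2
fn mse(actual: &[f64], predicted: &[f64]) -> f64 {
    actual.iter().zip(predicted)
        .map(|(a, p)| (a - p).powi(2))
        .sum::<f64>() / actual.len() as f64
}

// Root Mean Squared Error: square root of the MSE
fn rmse(actual: &[f64], predicted: &[f64]) -> f64 {
    mse(actual, predicted).sqrt()
}

// Mean Absolute Percentage Error; undefined if any actual value is zero
fn mape(actual: &[f64], predicted: &[f64]) -> f64 {
    100.0 * actual.iter().zip(predicted)
        .map(|(a, p)| ((a - p) / a).abs())
        .sum::<f64>() / actual.len() as f64
}

fn main() {
    let actual = [200.0, 220.0, 250.0];
    let predicted = [210.0, 215.0, 240.0];
    println!("MAE:  {:.2}", mae(&actual, &predicted));
    println!("MSE:  {:.2}", mse(&actual, &predicted));
    println!("RMSE: {:.2}", rmse(&actual, &predicted));
    println!("MAPE: {:.2}%", mape(&actual, &predicted));
}
```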
Akaike Information Criterion (AIC)
Balances model fit and complexity. A lower AIC indicates a better model. Used primarily for model selection, not for direct error measurement, it helps compare ARIMA models with different (p, d, q) combinations:

\[
\text{AIC} = 2k - 2\ln(L)
\]

where:
- \(k\) is the number of model parameters.
- \(L\) is the likelihood of the model given the data.
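For models fitted by least squares, a common shortcut computes AIC from the residual sum of squares as \(n \ln(\text{RSS}/n) + 2k\), which is equivalent up to a constant under Gaussian errors. A hedged sketch comparing two hypothetical candidate fits (the residual values below are purely illustrative):

```rust
// AIC for a least-squares fit: AIC = n * ln(RSS / n) + 2k,
// where RSS is the residual sum of squares and k the parameter count.
fn aic(residuals: &[f64], k: usize) -> f64 {
    let n = residuals.len() as f64;
    let rss: f64 = residuals.iter().map(|r| r * r).sum();
    n * (rss / n).ln() + 2.0 * k as f64
}

fn main() {
    // Residuals from two hypothetical candidate ARIMA fits
    let resid_small = [1.0, -2.0, 1.5, -0.5, 2.0, -1.0]; // ARIMA(1,1,1): 3 params
    let resid_big   = [0.9, -1.9, 1.4, -0.6, 1.9, -1.1]; // ARIMA(2,1,2): 5 params
    // The larger model fits slightly better but pays a 2k penalty,
    // so the simpler model wins on AIC here.
    println!("AIC(1,1,1) = {:.2}", aic(&resid_small, 3));
    println!("AIC(2,1,2) = {:.2}", aic(&resid_big, 5));
}
```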
Alternative Algorithms
ARIMA is a powerful and widely used algorithm for time series forecasting. Several alternative algorithms can be considered depending on the structure and requirements of the data.
- SARIMA (Seasonal ARIMA): An extension of ARIMA that explicitly models seasonality. More flexible for seasonal data. Adds seasonal AR, I, and MA components (P, D, Q, S).
- Exponential Smoothing (ETS): A family of models that forecasts future values by weighting past observations exponentially. ETS is simpler, often better for shorter-term forecasts with strong trends or seasonality.
- LSTM (Long Short-Term Memory networks): A type of recurrent neural network (RNN) designed to model sequential data. Requires much more data and computation, captures complex temporal dependencies better.
- XGBoost / LightGBM for Time Series: Gradient boosting frameworks applied to time series with feature engineering (lags, rolling averages, etc.). Not built specifically for time series, but often performs better with rich data and proper preprocessing.
Advantages and Disadvantages
✅ Advantages:
- ARIMA works very well when forecasting a single variable over time, especially when the data is linear and exhibits clear trends.
- Based on solid statistical theory, ARIMA models offer transparent and interpretable results, making them suitable for critical applications like finance and policy forecasting.
- The AR and MA components are specifically designed to model temporal dependencies and trends in the data.
- Extensions like SARIMA can model seasonality, and ARIMAX can include exogenous variables, making ARIMA adaptable to more complex cases.
❌ Disadvantages:
- The algorithm assumes that the data is stationary (constant mean and variance), so additional preprocessing like differencing or transformation is often needed.
- ARIMA is a linear model, meaning it struggles to capture non-linear patterns that may exist in real-world time series.
- Standard ARIMA only handles univariate series. If you have multiple influencing factors, you must use extensions like ARIMAX or switch to machine learning models.
- Choosing the right parameters (p, d, q) requires expertise or model selection criteria (like AIC), and the process can be time-consuming without automation.
- ARIMA relies on historical data patterns. If there's a sudden shift in behavior (like a pandemic or market crash), its predictions may become unreliable.
Quick Recommendations
Criterion | Recommendation |
---|---|
Dataset Size | 🟡 Medium |
Training Complexity | 🟡 Medium |
Use Case Examples
Stock Price Forecasting (Finance Sector)
Predicting the closing prices of stocks or indices based on past price movements. Stock prices often follow short-term linear trends, which ARIMA can model effectively. It’s especially useful for short-term forecasting of price movements or volatility.
Electricity Demand Forecasting (Energy Sector)
Estimating electricity usage over the coming hours or days for load balancing and power grid management. Time series data like hourly electricity consumption often has strong autocorrelation, and ARIMA can handle short-term forecasting accurately when the data is stationary.
Retail Sales Prediction (Business & Commerce)
Forecasting monthly or weekly product sales to manage inventory and supply chains. Many retail datasets are univariate (e.g., sales of a single item) and show trends over time, making ARIMA ideal for planning future stock levels and reducing overstock.
Weather and Climate Forecasting (Environmental Science)
Predicting temperature or rainfall over time in a specific location. Useful for modeling medium-term forecasts where data shows trends but not complex interactions. Can serve as a baseline or complementary model to more complex systems.
Disease Incidence Tracking (Public Health)
Monitoring the number of disease cases (e.g., influenza, COVID-19) over time to forecast future outbreaks. Can be used to model the time-dependent patterns of case counts and help public health agencies plan for upcoming spikes or declines in disease incidence.
Conclusion
The ARIMA (AutoRegressive Integrated Moving Average) algorithm is a powerful and time-tested approach for forecasting univariate time series data. It is particularly well-suited to problems where past values and past errors can explain future behavior, and where the underlying process follows a linear trend over time.
Ultimately, ARIMA remains one of the most reliable algorithms in the time series forecasting toolbox. It offers a solid foundation, especially when used as a benchmark or combined with more advanced models.
External resources:
- Example code in Rust available on 👉 GitHub Repository
Feedback
Found this helpful? Let me know what you think or suggest improvements 👉 Contact me.