Differencing and Testing for Stationarity

The previous post established what stationarity means and how to recognise when a series lacks it. The airline passenger series fails on both counts: a clear upward trend and a repeating seasonal cycle, confirmed by an ACF that barely drops across 40 lags.

This post is about what to do about it. The main tool is differencing: subtracting lagged values to strip out structure the model can’t handle directly. Three variants are covered here: first-order differencing, second-order differencing, and seasonal differencing. After that, two statistical tests let you check the work objectively: the KPSS test to assess whether differencing was enough, and the Ljung-Box test to verify that what remains looks like white noise.

First-Order Differencing

The first difference of a series is simply the change from one observation to the next:

y′ₜ = yₜ − yₜ₋₁

A series of T observations produces T − 1 differences, since there is no prior value to subtract from the first one. The differenced series measures how much the original series moved at each step, rather than where it was.
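In pandas this is a one-liner; a minimal sketch with made-up values (the series below is illustrative, not the airline data):

```python
import pandas as pd

# Illustrative values, not the airline data
y = pd.Series([112, 118, 132, 129, 121], dtype=float)

# First difference: y'_t = y_t - y_{t-1}
y_diff = y.diff().dropna()   # the first value is NaN, so T - 1 remain

print(list(y_diff))          # [6.0, 14.0, -3.0, -8.0]
```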

Applying this to the airline series removes the trend. The drifting upward level is replaced by month-to-month changes that fluctuate around zero. However, as the figure below shows, first differencing is not the full story for this particular series.

Figure 1. The original airline series (top) trends upward throughout. First differencing (bottom) removes the trend — the changes now oscillate around zero — but the seasonal pattern is still clearly visible in the oscillations.

The seasonal cycle survives first differencing because it repeats at a fixed lag (12 months), not from one observation to the next. Subtracting yₜ₋₁ has no effect on a pattern operating at lag 12. That is why the differenced series in Figure 1 still shows a regular annual wave.

The Random Walk Connection

When the first-differenced series is white noise, the original series follows a specific model. If y′ₜ = εₜ with εₜ drawn from white noise, then:

yₜ = yₜ₋₁ + εₜ

This is the random walk. Each observation is the previous one, shifted by an unpredictable amount. The series has no tendency to return to any particular level. It drifts wherever the noise takes it.

Adding a constant c to the differences gives the random walk with drift:

yₜ = c + yₜ₋₁ + εₜ

Here, c is the average change per period. When c is positive, the series tends to climb; when negative, it tends to fall. This model underpins the drift forecast introduced early in the series. Crucially, the future direction is still uncertain; only the average trend is determined.
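A random walk with drift is easy to simulate, which makes the model concrete. A small sketch with numpy; the drift and noise scale are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
T, c = 500, 0.2                    # arbitrary length and drift per period
eps = rng.normal(0.0, 1.0, T)      # white noise

# y_t = c + y_{t-1} + eps_t, built by cumulatively summing the increments
y = np.cumsum(c + eps)

# The average step recovers the drift, up to sampling noise
avg_step = np.mean(np.diff(y))
```

With a positive c the simulated path tends upward, but any single run can move against the drift for long stretches, which is exactly the uncertainty the model describes.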

Second-Order Differencing

Occasionally, one round of differencing does not produce a stationary result. In that case the series can be differenced a second time, taking the difference of the differences:

y″ₜ = y′ₜ − y′ₜ₋₁ = yₜ − 2yₜ₋₁ + yₜ₋₂

This is the ‘change in the changes.’ A series of T observations now yields T − 2 values. In practice, second-order differencing is rarely needed. Most real-world series become stationary after one round, and over-differencing introduces problems of its own: artificial oscillations that weren’t there originally. The rule of thumb is to use as few differences as necessary.
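A quadratic trend makes the effect easy to see: one difference leaves a linear trend, two leave a constant. A minimal sketch with illustrative values:

```python
import pandas as pd

y = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])    # t^2: a quadratic trend

y_d1 = y.diff().dropna()          # 3, 5, 7, 9: still trending (linear)
y_d2 = y.diff().diff().dropna()   # 2, 2, 2: constant, trend removed

# T observations yield T - 2 second differences
print(len(y_d2))                  # 3
```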

Seasonal Differencing

A seasonal difference subtracts the observation from the same season in the previous cycle:

y′ₜ = yₜ − yₜ₋ₘ

where m is the seasonal period, the number of observations per cycle. For monthly data m = 12; for quarterly data m = 4. This operation compares each January to the previous January, each July to the previous July, and so on. The seasonal pattern, which repeats at lag m, is directly cancelled out.
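pandas handles this with the same method by passing the lag. A minimal sketch with quarterly data (m = 4) and illustrative values:

```python
import pandas as pd

# Three cycles of quarterly data: a fixed seasonal pattern plus steady growth
y = pd.Series([10, 20, 15, 25,
               12, 22, 17, 27,
               14, 24, 19, 29], dtype=float)
m = 4

y_seasonal = y.diff(m).dropna()      # y_t - y_{t-m}: constant 2.0, the growth per cycle
y_both = y.diff(m).diff().dropna()   # seasonal then first difference: all zero here

# The operators commute: first-then-seasonal gives the same values
same = list(y.diff().diff(m).dropna()) == list(y_both)
```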

If the seasonally differenced series is already white noise, the underlying model is:

yₜ = yₜ₋ₘ + εₜ

This is the seasonal naïve forecast: each observation is predicted using the same period from the previous year. In practice, this is rarely sufficient on its own. For a series like the airline data, where both trend and seasonality are present, seasonal differencing removes the cycle but the trend remains visible. A further first difference is then needed.

The order of the two operations does not change the final result, since the difference operators commute; it only affects how readable the intermediate series are. Applying seasonal differencing first is recommended: if that alone achieves stationarity, no further differencing is needed, and you avoid unnecessary work.

Figure 2. The airline series at each differencing stage. Seasonal differencing (middle panel) removes the annual cycle but leaves a residual upward trend. Adding a first difference (bottom panel) eliminates the trend and produces a series that fluctuates without obvious structure.

Reading the ACF at Each Stage

The ACF is the most informative guide to whether a given differencing step worked. Each stage of the airline differencing produces a distinct ACF pattern.

The original series (top-left panel in Figure 3) shows the slow decay characteristic of non-stationarity; almost every lag is significant. After first differencing (top-right), the trend-driven decay disappears, but spikes at lags 12 and 24 remain, confirming the seasonal structure is intact. Seasonal differencing alone (bottom-left) removes those spikes but leaves a slowly decaying pattern from the unresolved trend. The bottom-right panel, after both seasonal and first differencing, is the closest to white noise: most bars fall inside the confidence bands, with no clear periodicity.

Figure 3. ACF at each differencing stage. The original series (top-left) decays slowly. First differencing (top-right) removes the trend decay but leaves seasonal spikes at lags 12 and 24. Seasonal differencing alone (bottom-left) removes seasonality but not the trend. Only after both (bottom-right) does the ACF resemble white noise.

The KPSS Test

Visual inspection is useful but subjective. The KPSS test (Kwiatkowski-Phillips-Schmidt-Shin) gives a formal statistical answer. The null hypothesis is that the series is stationary. A small p-value, below 0.05, is evidence against that null, meaning differencing is needed.

This framing is the opposite of most hypothesis tests: here, you want to fail to reject, not to reject. A p-value above 0.05 means the data are consistent with stationarity.

The Practical Workflow

The KPSS test is used iteratively. Apply it to the original series; if the null is rejected, difference once and re-test; repeat until the null is no longer rejected. This process can be automated: libraries such as pmdarima build on these tests to determine the required number of differences in a single call.

For the airline series, the workflow runs as follows. The raw series returns a very large test statistic and a p-value well below 0.01: stationarity is rejected immediately. After seasonal and first differencing, the test statistic drops sharply and the p-value climbs above 0.05, indicating stationarity can no longer be rejected. That is the target state.

| Series | KPSS stat | p-value | Verdict |
|---|---|---|---|
| Original yₜ | ≫ 0.739 (1% critical value) | < 0.01 | Not stationary ✗ |
| First diff y′ₜ | Reduced | < 0.05 | Still not stationary ✗ |
| Seasonal diff | Reduced | < 0.05 | Still not stationary ✗ |
| Seasonal + first y″ₜ | ≈ 0.1–0.3 | > 0.05 | Stationary ✓ |

The Ljung-Box Test

Once the series has been differenced to stationarity, the Ljung-Box test checks whether the result is actually uncorrelated, that is, whether it behaves like white noise. The null hypothesis here is that the first h autocorrelations are all zero. A small p-value means there is still structure in the series; a large one means the residuals are consistent with white noise.

The test statistic Q* is computed from the first h sample autocorrelations:

Q* = n(n+2) · Σₖ₌₁ʰ  r²(k) / (n − k)

where n is the number of observations and r(k) is the sample autocorrelation at lag k. Under the null hypothesis, Q* follows a chi-squared distribution with h degrees of freedom. Larger values of Q* indicate stronger autocorrelation.

The choice of h matters somewhat. A common convention is h = 10 for non-seasonal data and h = 2m for seasonal data (so h = 24 for monthly series). Using too few lags may miss structure at higher lags; using too many dilutes the power of the test.
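The statistic is simple enough to compute directly, which makes the formula concrete. A sketch in numpy; `ljung_box_q` is an illustrative name (statsmodels provides the same computation as `acorr_ljungbox`):

```python
import numpy as np

def ljung_box_q(x, h):
    """Ljung-Box Q*: a direct transcription of the formula above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    # r(k): sample autocorrelation at lag k, for k = 1..h
    r = np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, h + 1)])
    return n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, h + 1)))

rng = np.random.default_rng(42)
white = rng.normal(size=500)

q_white = ljung_box_q(white, 10)            # modest: near the chi-squared mean of h
q_walk = ljung_box_q(np.cumsum(white), 10)  # enormous: heavy autocorrelation
```

Under the null, Q* at h = 10 has mean 10, so a value in that neighbourhood is consistent with white noise, while a random walk produces values in the thousands.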

Applied to the Airline Series

Figure 4 shows both Q* and the corresponding p-value for each differencing stage, computed at six values of h. The original series produces enormous Q* values across the board; the autocorrelation is overwhelming. First differencing and seasonal differencing each reduce Q*, but neither alone brings p-values consistently above 0.05. After seasonal plus first differencing, Q* drops sharply and p-values climb above 0.05 at h = 10, the most common reference point.

However, at larger lags (h = 15, 20, 25) the p-values dip back below 0.05, suggesting residual autocorrelation that differencing alone cannot eliminate. This is not a surprise: the airline series has a strong multiplicative structure, and the doubly differenced series still retains some of it. In practice, this is precisely why SARIMA adds AR and MA terms on top of the differencing: they capture whatever autocorrelation remains after the differencing step.

Figure 4. Ljung-Box Q* (left, log scale) and p-values (right) for each differencing stage at lags h = 5 to 30. The original series dominates. After seasonal + first differencing (purple), Q* drops substantially and p-values cross above 0.05 at h = 10, though some structure persists at larger lags.

Differencing vs Decomposition

It’s worth stepping back to see how differencing relates to the decomposition methods covered earlier in this series. Both decomposition and differencing aim to separate predictable structure from random noise. However, they go about it differently.

Decomposition explicitly estimates and names each component: trend, seasonal, remainder. The result is interpretable: you can say the trend in 1955 had this value, the seasonal peak in August had that amplitude. Differencing, meanwhile, removes structure without estimating it. The trend disappears into the differences without ever being quantified. That makes the differenced series harder to interpret directly, but the approach is more flexible: no functional form needs to be assumed for the trend, and no decision needs to be made about whether the seasonality is additive or multiplicative.

In practice, the two approaches often work together. SARIMA implicitly differences the series before fitting its AR and MA terms. Decomposition-based forecasting uses the estimated components directly. Which is more appropriate depends on whether interpretability or flexibility matters more for the problem at hand.

The Short Version

First differencing removes the trend by converting levels to changes. Seasonal differencing removes the seasonal cycle by subtracting the observation from the same season one period back. For a series with both problems, the standard practice is to apply seasonal differencing first, then first differencing if needed.

The KPSS test checks whether the result is stationary; the null is stationarity, so you want a high p-value. The Ljung-Box test checks whether the residuals are uncorrelated; the null is white noise, so again you want a high p-value. Together, these two tests form the confirmation step that differencing has done what it was supposed to do.

For the airline series, D = 1 and d = 1 (one seasonal plus one first difference) clears both tests at the standard reference lags. Some residual autocorrelation at longer lags remains, which is exactly what the AR and MA terms in SARIMA are there to handle.
