Moving Averages: How Smoothing Reveals the Trend

When we look at a time series, the raw data often hides the underlying pattern. Short-term fluctuations make it hard to see whether the series is actually rising, falling, or simply noisy.

A moving average is one of the simplest tools we have to reveal that structure. By averaging nearby observations, it smooths out random variation and makes the long-run movement of the series easier to see.

If you’ve seen moving averages introduced as a simple forecasting tool (trailing averages, weighted averages, window-size trade-offs), this earlier post covers that angle. Here the focus shifts: we’re using moving averages not to forecast directly, but to extract the trend-cycle component from a series before decomposition.


The Basic Idea

A moving average of order m is written:

$$\hat{T}_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}$$

where m = 2k + 1. At each time point t, you average the k observations before it, the observation itself, and the k observations after it. The result $\hat{T}_t$ is the estimated trend-cycle at that point.

For example, with a 5-MA (m = 5, k = 2), the smoothed value at time t is the average of $y_{t-2}, y_{t-1}, y_t, y_{t+1}, y_{t+2}$.

The intuition is simple: nearby observations in time tend to have similar values. Averaging them cancels out short-term fluctuations, leaving the slower-moving trend. The larger the window, the more aggressively noise gets suppressed, but the more slowly the curve responds to genuine changes in direction, and the more end-points are lost (you need k observations on each side to compute the average, so the first and last k points get no estimate).
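As a minimal sketch of the formula above, a centred m-MA can be written in plain Python. The function name `centred_ma` and the sample values are illustrative assumptions, not something from the original analysis; positions without k neighbours on both sides are returned as `None`.

```python
def centred_ma(y, m):
    """Centred moving average of odd order m; end-points come back as None."""
    if m < 1 or m % 2 == 0:
        raise ValueError("this sketch only handles odd orders m = 2k + 1")
    k = (m - 1) // 2
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        window = y[t - k : t + k + 1]  # y[t-k], ..., y[t], ..., y[t+k]
        out[t] = sum(window) / m
    return out


# Made-up numbers, purely for illustration.
y = [2.0, 4.0, 6.0, 5.0, 3.0, 7.0, 9.0]
ma5 = centred_ma(y, 5)
print(ma5)  # [None, None, 4.0, 5.0, 6.0, None, None]
```

With m = 5, the two values at each end have no estimate, which is the end-point loss described above.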

Figure 1. Australian exports as a percentage of GDP, with 3-MA, 5-MA, and 9-MA overlaid. Larger windows produce smoother curves but lose more observations at the ends and are slower to respond to genuine changes in direction.

Order Matters: Odd vs Even

Simple moving averages are almost always of odd order: 3, 5, 7, 9. The reason is symmetry. With an odd window of size m = 2k + 1, the central observation sits exactly in the middle, with k points on each side. The average is centred on t.

If you used an even window, say m = 4, there’s no natural centre. A 4-MA over $y_{t-1}, y_t, y_{t+1}, y_{t+2}$ is centred between t and t+1, not on either of them. The result is a smoothed value that doesn’t align cleanly with the original time axis. That asymmetry causes problems when you try to subtract the trend from the original data.
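One way to see the misalignment is to run this 4-MA over a straight line y_t = t, where a properly centred average at t would return t itself. The helper name `ma4_from_t_minus_1` is hypothetical and the series is synthetic.

```python
def ma4_from_t_minus_1(y, t):
    """4-MA over y[t-1], y[t], y[t+1], y[t+2], the window used in the text."""
    return sum(y[t - 1 : t + 3]) / 4


# On a straight line y_t = t, a centred average at t should give back t.
line = [float(t) for t in range(10)]
for t in (3, 4, 5):
    print(t, ma4_from_t_minus_1(line, t))
# 3 3.5
# 4 4.5
# 5 5.5
```

Each value lands at t + 0.5, half a step off the original time axis.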


Moving Averages of Moving Averages

What if your seasonal period is even, as with quarterly data (period 4) or monthly data (period 12)?

The solution is to apply two moving averages in sequence. Take a 4-MA first, then apply a 2-MA to the result. This is called a 2×4-MA.

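Why does this work? The 2-MA averages two neighbouring 4-MAs: one centred half a step before t and one centred half a step after it. The two offsets cancel:

$$\hat{T}_t = \frac{1}{2}\left[\frac{1}{4}\left(y_{t-2} + y_{t-1} + y_t + y_{t+1}\right) + \frac{1}{4}\left(y_{t-1} + y_t + y_{t+1} + y_{t+2}\right)\right] = \frac{1}{8}y_{t-2} + \frac{1}{4}y_{t-1} + \frac{1}{4}y_t + \frac{1}{4}y_{t+1} + \frac{1}{8}y_{t+2}$$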

The result is a weighted average, symmetric around t, where the central observations carry more weight than the edge ones. Symmetry is restored, and the estimate is properly centred.

The general rule: to make a centred moving average out of an even-order MA, follow it with a 2-MA. The combination is written $2 \times m$-MA.
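The equivalence between "a 4-MA followed by a 2-MA" and the weighted form can be checked numerically. The sketch below uses invented data and hypothetical function names (`two_by_four_ma`, `weighted_form`); it is one way to lay out the computation, not the only one.

```python
def two_by_four_ma(y):
    """2x4-MA: take a 4-MA, then average adjacent pairs of the result."""
    # ma4[i] averages y[i], ..., y[i+3], so it is centred at i + 1.5.
    ma4 = [sum(y[i : i + 4]) / 4 for i in range(len(y) - 3)]
    out = [None] * len(y)
    for t in range(2, len(y) - 2):
        # Windows starting at t-2 and t-1 are centred at t-0.5 and t+0.5.
        out[t] = (ma4[t - 2] + ma4[t - 1]) / 2
    return out


def weighted_form(y, t):
    """The same estimate written with weights 1/8, 1/4, 1/4, 1/4, 1/8."""
    w = [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8]
    return sum(wj * y[t + j] for wj, j in zip(w, range(-2, 3)))


y = [8.0, 4.0, 6.0, 10.0, 12.0, 2.0, 4.0, 8.0]  # illustrative values only
smoothed = two_by_four_ma(y)
print(smoothed[2], weighted_form(y, 2))  # 7.5 7.5
```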

Figure 2. Quarterly beer production data with a 4-MA and a 2×4-MA. The raw 4-MA (dashed) is offset by half a period: it falls between time points. The 2×4-MA (solid) is symmetric and properly aligned with the original series.

Why Seasonal Period Determines Window Size

When you compute a moving average to estimate the trend, you want to average out the seasonality completely. If your window covers exactly one full seasonal cycle, so that every season carries equal weight, the seasonal effects cancel and what remains is the trend-cycle.

For monthly data with annual seasonality (period 12), a 2×12-MA achieves this. Each month of the year gets equal weight: the first and last terms (which are the same month in adjacent years) each get weight $\frac{1}{24}$, so together they contribute $\frac{1}{12}$, and every other month gets $\frac{1}{12}$. Over a full year, all seasonal variation is averaged out.

For quarterly data (period 4), a 2×4-MA does the same thing: each quarter gets equal weight across the window.
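The weights quoted above can be derived mechanically by averaging two m-MAs whose windows are shifted by one step. The following is a small sketch using exact fractions; the function name `two_by_m_weights` is an assumption made for illustration.

```python
from fractions import Fraction


def two_by_m_weights(m):
    """Weights of a 2xm-MA for even m, at offsets -m/2, ..., +m/2."""
    if m % 2 != 0:
        raise ValueError("a 2xm-MA is only needed for even m")
    w = [Fraction(0)] * (m + 1)
    # Average two m-MAs whose windows are shifted by one step.
    for start in (0, 1):
        for i in range(start, start + m):
            w[i] += Fraction(1, 2 * m)
    return w


w12 = two_by_m_weights(12)
print(w12[0], w12[1], w12[-1], sum(w12))  # 1/24 1/12 1/24 1
```

The two end weights fall on the same calendar month one year apart, so every month of the year ends up with a total weight of 1/12.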

If the window does not match the seasonal cycle, the average will mix seasons unevenly. For example, an 11-month moving average on monthly data always leaves out one calendar month, and which month is missing changes as the window slides: sometimes the window includes December and sometimes it does not. The seasonal effects will therefore not cancel, and the smoothed series will still contain seasonal patterns.

The rule: use a $2 \times m$-MA where $m$ is the seasonal period (for even periods). For odd periods, a simple $m$-MA works directly.
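A synthetic example makes the cancellation concrete. The series below is invented: a linear trend plus a quarterly pattern that sums to zero over each year. Under those assumptions the 2×4-MA returns the trend exactly, while a 3-MA (a window that does not match the period) still carries seasonal residue. The function names are hypothetical.

```python
def two_by_four_ma_at(y, t):
    """2x4-MA at position t (needs two observations on each side)."""
    w = [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8]
    return sum(wj * y[t + j] for wj, j in zip(w, range(-2, 3)))


def ma3_at(y, t):
    """Simple 3-MA at position t, for comparison."""
    return (y[t - 1] + y[t] + y[t + 1]) / 3


# Synthetic quarterly series: linear trend plus a seasonal pattern that
# sums to zero over each year. All numbers are invented.
seasonal = [3.0, -1.0, -4.0, 2.0]
trend = [10.0 + 0.5 * t for t in range(16)]
y = [trend[t] + seasonal[t % 4] for t in range(16)]

for t in range(4, 8):
    print(t, trend[t], two_by_four_ma_at(y, t), round(ma3_at(y, t), 3))
```

In each printed row the second and third columns agree, while the 3-MA in the last column drifts above and below the trend depending on which quarters fall inside its window.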

Figure 3. Monthly retail employment with a 2×12-MA (orange). The smoothed line contains no seasonal variation; it captures only the long-run trend-cycle. Compare the raw series (grey), which oscillates every December.


Weighted Moving Averages

The $2 \times 4$-MA above is a special case of a weighted moving average:

$$\hat{T}_t = \sum_{j=-k}^{k} a_j \, y_{t+j}$$

where the weights $a_j$ need not all be equal. Two conditions must hold: the weights must sum to 1 (so the average is on the right scale), and they must be symmetric ($a_j = a_{-j}$, so the estimate is centred).

The simple m-MA is the special case where all weights are $\frac{1}{m}$. The $2 \times 4$-MA uses weights $\left[\frac{1}{8}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{8}\right]$.

The advantage of unequal weights is smoothness. In a simple moving average, observations enter and leave the window abruptly. A weighted moving average softens this transition by gradually reducing the importance of observations as they move away from the centre.
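The two conditions on the weights are easy to enforce in code. Here is a hedged sketch of a general weighted moving average; `weighted_ma` is an illustrative name, the data are made up, and the tolerance-based checks via `math.isclose` are a design choice of this example rather than anything prescribed by the method.

```python
import math


def weighted_ma(y, weights):
    """Centred weighted MA; weights are given for offsets -k, ..., +k."""
    if len(weights) % 2 == 0:
        raise ValueError("need an odd number of weights (2k + 1)")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    if any(not math.isclose(a, b) for a, b in zip(weights, reversed(weights))):
        raise ValueError("weights must be symmetric")
    k = len(weights) // 2
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        out[t] = sum(a * y[t - k + i] for i, a in enumerate(weights))
    return out


y = [8.0, 4.0, 6.0, 10.0, 12.0, 2.0, 4.0, 8.0]  # illustrative values only
simple_5 = weighted_ma(y, [1 / 5] * 5)  # plain 5-MA as a special case
two_by_4 = weighted_ma(y, [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8])  # 2x4-MA
print(simple_5[2], two_by_4[2])
```

Passing equal weights reproduces the simple m-MA, and passing the 1/8, 1/4, 1/4, 1/4, 1/8 pattern reproduces the 2×4-MA, so both earlier cases fit the same function.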


What Moving Averages Can’t Do

A moving average is a powerful smoother, but it has real limitations:

End-point problem: You lose k observations at each end. For a 2×12-MA, that’s six months at the start and six at the end, potentially a year of trend estimates gone. This is why more sophisticated methods (like STL) are preferred for real applications.

Lag: A moving average reacts slowly to genuine changes in direction. If the series takes a sharp turn, the MA will still be averaging across the old trend for several periods. It tells you where the series was, not necessarily where it is.

Fixed window: A simple MA treats all observations in the window equally regardless of how far they are from the centre. More sophisticated methods give more weight to closer observations.
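The first two limitations can be seen on a made-up series that jumps from 0 to 10. This is a small sketch only: it defines its own `centred_ma` helper (an illustrative name) so that the block runs by itself.

```python
def centred_ma(y, m):
    """Centred moving average of odd order m; end-points come back as None."""
    k = (m - 1) // 2
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        out[t] = sum(y[t - k : t + k + 1]) / m
    return out


# A made-up series that jumps from 0 to 10 at t = 10.
step = [0.0] * 10 + [10.0] * 10
ma5 = centred_ma(step, 5)

print("missing end-points:", ma5.count(None))  # 4 (2 at each end)
print("around the jump:", ma5[7:13])  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

The instantaneous jump is spread across several periods in the smoothed series, and a wider window would spread it further.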

Moving averages are the foundation of classical decomposition, which we cover in the next post. Understanding them here means the mechanics of decomposition will make immediate sense, because classical decomposition is essentially: extract trend with a moving average, then work out seasonality from what’s left.


Hyndman, R.J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, 3rd ed., Chapter 3.1. OTexts. https://otexts.com/fpp3/
