September 13, 2021

Autoregressive Moving Average models

The generalised form of an \(AR\) model of order \(p\), as given by Box and Jenkins, is:

\[x_{t}=c+\rho_{1} x_{t-1}+\rho_{2} x_{t-2}+\cdots+\rho_{p} x_{t-p}+\epsilon_{t} \quad (1)\]

where \(c\) is a constant, \(\rho_{1}, \ldots, \rho_{p}\) are the parameters (AR coefficients) of the model, \(x_{t-1}, \ldots, x_{t-p}\) are the time-lagged values of the series \(x_{t}\), and \(\epsilon_{t}\) is the error term at time \(t\) with mean zero and constant variance \(\sigma^{2}_{\epsilon}\). The notation \(p\) in \(AR_p\) indicates the order of the autoregressive process.
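As a concrete illustration, the recursion in \((1)\) can be simulated directly. The following is a minimal numpy sketch; the coefficient values and sample size are illustrative assumptions, not taken from the text:

```python
import numpy as np

def simulate_ar(rho, c=0.0, n=1000, sigma=1.0, seed=0):
    """Simulate n observations of an AR(p) process
    x_t = c + rho_1 x_{t-1} + ... + rho_p x_{t-p} + eps_t."""
    rng = np.random.default_rng(seed)
    p = len(rho)
    x = np.zeros(n + p)  # p leading zeros serve as initial values
    eps = rng.normal(0.0, sigma, size=n + p)
    for t in range(p, n + p):
        x[t] = c + sum(rho[k] * x[t - 1 - k] for k in range(p)) + eps[t]
    return x[p:]  # discard the artificial start-up values

# Illustrative AR(2): x_t = 0.6 x_{t-1} - 0.2 x_{t-2} + eps_t
x = simulate_ar([0.6, -0.2])
```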

A Moving Average (\(MA\)) model regresses against the past errors of the series; it is a type of stochastic process. Its generalised form of order \(q\) is given by:

\[\begin{array}{l} x_{t}=c+\epsilon_{t}+\theta_{1} \epsilon_{t-1}+\theta_{2} \epsilon_{t-2}+\ldots+\theta_{q} \epsilon_{t-q} \quad (2) \end{array}\]

where \(\theta_{1}, \ldots, \theta_{q}\) are the parameters (MA coefficients) of the model, and \(\epsilon_{t-1}, \ldots, \epsilon_{t-q}\) are the time-lagged values of the error. Similar to the AR model, the term order \(q\) refers to the highest order power in the polynomial.
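Equation \((2)\) says each observation is a finite weighted sum of the last \(q+1\) shocks, so an \(MA(q)\) series can be generated with a single convolution. A sketch with assumed coefficient values:

```python
import numpy as np

def simulate_ma(theta, c=0.0, n=1000, sigma=1.0, seed=0):
    """Simulate n observations of an MA(q) process
    x_t = c + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}."""
    rng = np.random.default_rng(seed)
    q = len(theta)
    eps = rng.normal(0.0, sigma, size=n + q)  # q extra shocks to start
    coeffs = np.r_[1.0, theta]                # (1, theta_1, ..., theta_q)
    # "valid" mode keeps exactly the n positions where all q+1 shocks exist
    return c + np.convolve(eps, coeffs, mode="valid")

x = simulate_ma([0.5, 0.3])
```

The theoretical variance of this process is \(\sigma^{2}(1+\theta_{1}^{2}+\theta_{2}^{2}) = 1.34\), and autocorrelations vanish beyond lag \(q\), which the sample reproduces approximately.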

Let \(L\) denote the Lag operator, acting on \(x_{t}\) and \(\epsilon_{t}\) as:

\[L^{k} x_{t}=x_{t-k}, \quad \forall k \in \mathbb{Z}\]

The Differencing Operator \(\nabla\) is defined as:

\[\begin{aligned} \nabla x_{t} &=x_{t}-x_{t-1} \\ &=(1-L) x_{t} \end{aligned}\]
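In code, \(\nabla\) corresponds to a first difference; a quick numpy illustration:

```python
import numpy as np

x = np.array([2.0, 5.0, 4.0, 7.0])

# First difference: (1 - L) x_t = x_t - x_{t-1}
dx = np.diff(x)        # array([ 3., -1.,  3.])

# Applying the operator twice, (1 - L)^2 x_t, is a second difference
d2x = np.diff(x, n=2)  # array([-4.,  4.])
```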

With the lagging notation, expressions \((1)\) and \((2)\) become:

\[\begin{array}{l} x_{t}=c+(\rho_{1} L+\rho_{2} L^{2}+\ldots+\rho_{p} L^{p}) x_{t}+\epsilon_{t} \quad (3)\\[0.2cm] x_{t}=c+(1+\theta_{1} L+\theta_{2} L^{2}+\ldots+\theta_{q} L^{q}) \epsilon_{t} \quad (4) \end{array}\]

Setting the \(AR\) polynomial of \(L\) of order \(p\) as:

\[P_{p}(L)=1-\rho_{1} L-\rho_{2} L^{2}-\ldots-\rho_{p} L^{p} \quad (5)\]

and the \(MA\) polynomial of \(L\) of order \(q\) as:

\[\Theta_{q}(L)=1+\theta_{1} L+\theta_{2} L^{2}+\ldots+\theta_{q} L^{q} \quad (6)\]

then, from expressions \((5)\) and \((6)\), expressions \((3)\) and \((4)\) become:

\[\begin{array}{l} P_{p}(L) x_{t}=c+\epsilon_{t}\\[0.2cm] x_{t}=c+\Theta_{q}(L) \epsilon_{t} \end{array}\]

When these two models are combined, they produce an \(ARMA\) model of order \((p, q)\), written as:

\[\begin{array}{l} x_{t}=c+\rho_{1} x_{t-1}+\rho_{2} x_{t-2}+\ldots+\rho_{p} x_{t-p}+\theta_{1} \epsilon_{t-1}+\theta_{2} \epsilon_{t-2}+\ldots+\theta_{q} \epsilon_{t-q}+\epsilon_{t} \quad (7) \end{array}\]

With the lagging and polynomial notations, expression \((7)\) becomes:

\[P_{p}(L) x_{t}=c+\Theta_{q}(L) \epsilon_{t}\]

For example, an \(ARMA(1,1)\) process is:

\[x_{t}=c+\rho_{1} x_{t-1}+\theta_{1} \epsilon_{t-1}+\epsilon_{t}\]


Two special cases: when \(p = 0\) the model reduces to \(MA(q)\), and when \(q = 0\) it reduces to \(AR(p)\).
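The \(ARMA(1,1)\) recursion \(x_{t}=\rho_{1} x_{t-1}+\theta_{1} \epsilon_{t-1}+\epsilon_{t}\) can be simulated in a few lines; the parameter values below are illustrative assumptions:

```python
import numpy as np

def simulate_arma11(rho1, theta1, n=2000, sigma=1.0, seed=0):
    """Simulate x_t = rho1 * x_{t-1} + theta1 * eps_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho1 * x[t - 1] + theta1 * eps[t - 1] + eps[t]
    return x

x = simulate_arma11(0.7, 0.4)

# Sample lag-1 autocorrelation; the theoretical value is
# (1 + rho1*theta1)(rho1 + theta1) / (1 + 2*rho1*theta1 + theta1**2)
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
```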

Any causal, invertible linear process has both an \(AR(\infty)\) and an \(MA(\infty)\) representation.
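A standard check for these properties: the model is causal (stationary) when all roots of the AR polynomial \(P_{p}(z)\) lie outside the unit circle, and invertible when the same holds for \(\Theta_{q}(z)\). A sketch with assumed AR(2) coefficients:

```python
import numpy as np

rho = [0.6, -0.2]  # illustrative AR(2) coefficients

# P_p(z) = 1 - rho_1 z - rho_2 z^2, coefficients in increasing degree
ar_poly = np.polynomial.Polynomial([1.0, -rho[0], -rho[1]])
roots = ar_poly.roots()

# Causal/stationary iff every root has modulus strictly greater than 1
causal = bool(np.all(np.abs(roots) > 1.0))
```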

Real data cannot be modelled exactly using a finite number of parameters; we choose \(p\) and \(q\) to give a model that is simple yet accurate.

Parameter Estimation

Assume that:

  1. The model order (\(p\) and \(q\)) is known
  2. The data has zero mean

Assume that \(\{X_t\}\) is Gaussian, that is, \(\Phi_{p}(L) X_{t}=\Theta_{q}(L) \epsilon_{t}\) (here \(\Phi_{p}\) denotes the AR polynomial written \(P_{p}\) above, with coefficients \(\phi_{i}\) in place of \(\rho_{i}\)), where \(\epsilon_t\) is i.i.d. Gaussian.

Choose \(\phi_{i}, \theta_{j}\) to maximize the likelihood: \(L\left(\phi, \theta, \sigma^{2}\right)=f_{\phi, \theta, \sigma^{2}}\left(X_{1}, \ldots, X_{n}\right)\)

where \(f_{\phi, \theta, \sigma^{2}}\) is the joint (Gaussian) density for the given ARMA model.

Maximum likelihood estimation

Suppose that \(X_{1}, X_{2}, \ldots, X_{n}\) is drawn from a zero-mean Gaussian ARMA \((p, q)\) process. The likelihood of parameters \(\phi \in \mathbb{R}^{p}, \theta \in \mathbb{R}^{q}, \sigma_{w}^{2} \in \mathbb{R}_{+}\) is defined as the density of \(X=\left(X_{1}, X_{2}, \ldots, X_{n}\right)^{\prime}\) under the Gaussian model with those parameters:

\[L\left(\phi, \theta, \sigma_{w}^{2}\right)=\frac{1}{(2 \pi)^{n / 2}\left|\Gamma_{n}\right|^{1 / 2}} \exp \left(-\frac{1}{2} X^{\prime} \Gamma_{n}^{-1} X\right)\]

where \(\vert A \vert\) denotes the determinant of a matrix \(A\), and \(\Gamma_{n}\) is the variance/covariance matrix of \(X\) with the given parameter values.

The maximum likelihood estimator (MLE) of \(\phi, \theta, \sigma_{w}^{2}\) maximizes this quantity.

The exact Gaussian log-likelihood (now allowing a nonzero mean \(\mu\)) then satisfies:

\[-2 \ell\left(\mu, \phi, \theta, \sigma^{2}\right)=n \log 2 \pi+\log \left|\Gamma_{n}\right|+(\boldsymbol{X}-\mu)^{\prime} \Gamma_{n}^{-1}(\boldsymbol{X}-\mu)\]
ARMA - September 13, 2021 - Meenal Jhajharia