In this lab, you will fit data with the auto.arima() function in the forecast package.

Set-up

We will use the anchovy and sardine log metric ton catch before 1988.

data(greeklandings, package="FishForecast")
landings <- subset(greeklandings, Year <= 1987)
anchovy <- ts(subset(landings, Species=="Anchovy")$log.metric.tons, start=1964)
sardine <- ts(subset(landings, Species=="Sardine")$log.metric.tons, start=1964)

Load the necessary packages.

library(forecast)
library(tseries)

Fitting to simulated data

To see info on arima.sim(), type ?arima.sim and ?arima.

Fit to AR(2) data:

\[x_t = 0.8 x_{t-1} + 0.1 x_{t-2} + e_t\]

a1 = arima.sim(n=100, model=list(ar=c(.8,.1)))
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1 
## ARIMA(1,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ma1
##       0.9227  -0.2977
## s.e.  0.0401   0.1009
## 
## sigma^2 estimated as 1.042:  log likelihood=-143.63
## AIC=293.26   AICc=293.51   BIC=301.08

Fit to AR(1) data:

\[x_t = 0.8 x_{t-1} + e_t\]

a1 = arima.sim(n=100, model=list(ar=c(.8)))
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.8820
## s.e.  0.0488
## 
## sigma^2 estimated as 0.87:  log likelihood=-135.18
## AIC=274.36   AICc=274.49   BIC=279.57

\[x_t = 0.8 x_{t-1} + e_t + 0.8 e_{t-1} + 0.2 e_{t-2}\]

Fit to ARMA(1,2) data:

We will up the number of data points to 1000 because models with a MA component take a lot of data to estimate. Models with MA(>1) are not very practical for fisheries data for that reason.

a1 = arima.sim(n=1000, model=list(ar=c(0.8), ma=c(0.8, 0.2)))
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1 
## ARIMA(2,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ma1
##       1.0949  -0.2441  0.4842
## s.e.  0.0471   0.0460  0.0424
## 
## sigma^2 estimated as 1.002:  log likelihood=-1419.91
## AIC=2847.83   AICc=2847.87   BIC=2867.46

Fitting to 100 simulated data sets

save.fits = rep(NA,100)
for(i in 1:100){
  a1 = arima.sim(n=100, model=list(ar=c(.8,.1)))
  fit = auto.arima(a1, seasonal=FALSE, max.d=0, max.q=0)
  save.fits[i] = paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)
## save.fits
## 1-0 2-0 3-0 4-0 
##  78  20   1   1

These functions work for data with missing values

Create some AR(2) data and then add missing values (NA).

a1 = arima.sim(n=100, model=list(ar=c(.8,.1)))
a1[sample(100,50)]=NA
plot(a1, type="l")
title("many missing values")

As always to understand what the code is doing, run pieces of it and look at the output. Use ?sample (say) to read about functions you don’t know.

Fit to the data with missing values. You need to give auto.arima() a little help by telling it not to include models with differencing. So we set max.d=0. You can remove that and see what happens.

auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.8915
## s.e.  0.0497
## 
## sigma^2 estimated as 1.255:  log likelihood=-86.69
## AIC=177.37   AICc=177.5   BIC=182.52

auto.arima() is nice because it selects the model order for you. But if you know what model you want to fit, you can use Arima():

Arima(a1, order = c(2,0,0))
## Series: a1 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2    mean
##       0.8875  -0.0449  1.2340
## s.e.  0.1721   0.1740  0.6792
## 
## sigma^2 estimated as 1.278:  log likelihood=-85.53
## AIC=179.06   AICc=179.5   BIC=189.36

Fit to the Anchovy landing data

fit <- auto.arima(anchovy)
fit
## Series: anchovy 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -0.5731  0.0641
## s.e.   0.1610  0.0173
## 
## sigma^2 estimated as 0.03583:  log likelihood=6.5
## AIC=-6.99   AICc=-5.73   BIC=-3.58

Compare to Table 8 in Stergiou and Christou. Note arima() writes a MA model like:

\[x_t = e_t + b_1 e_{t-1} + b_2 e_{t-2}\]

while Stergiou and Christou use the following (also common) notation:

\[x_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}\]

so the MA parameters reported by auto.arima() will be NEGATIVE of that reported in Stergiou and Christou. Note, in Stergiou and Christou, the model is written in backshift notation on page 112. To see the model as the equation above, I translated from backshift to non-backshift notation.

Here are the values for anchovy in Table 8.

ARIMA Model \(\theta_1\) drift (c) R\(^2\) BIC LB
(0,1,1) 0.563 0.064 0.83 1775 5.4

The model selected by auto.arima() is the same. The model selection criteria is similar in spirit but not in detail. Stergiou and Christou used a combination of model fit and BIC (a model selection criteria), which auto.arima() uses AICc by default (you can specifiy BIC).

The parameter estimates are similar.

The BIC value is different, that is expected because there is a constant that is often left off.

The Ljung-Box test is not significant. They don’t give the p-value, but state in the table legend that it is not significant. That is good. That means we don’t reject the null hypothesis of independence.

Here is how to do the test that they did:

res <- resid(fit)
Box.test(res, type="Ljung-Box", lag=12, fitdf=2)
## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 5.3725, df = 10, p-value = 0.8649

Here is how to do the default test with checkresiduals():

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 1.4883, df = 3, p-value = 0.685
## 
## Model df: 2.   Total lags used: 5

Problems

  1. Fit to the sardine data.
  • Use auto.arima()
  • Compare to Table 8 in Stergiou and Christou. Do you get the same result?

I am not sure how Stergiou and Christou selected the amount of differencing to use. auto.arima() uses the KPSS test by default. Let’s try the Augmented Dickey-Fuller test instead. Specify this by passing in test="adf".

auto.arima(sardine, test="adf")
  • Is the selected model the same?
  1. Now fit to the anchovy data 1989-2007
data(greeklandings, package="FishForecast")
anchovy <- subset(greeklandings, Species=="Anchovy" & Year>=1989)$log.metric.tons
anchovy <- ts(anchovy, start=1989)
  • First plot the data.
  • Fit with auto.arima(). Do you get the same model? You don’t. ARIMA(0,1,1) is very different than ARIMA(0,0,1).
  • Force a fit of ARIMA(0,1,1) with drift using Arima(). This is the best fit model with auto.arima().
Arima(anchovy, order=c(0,1,1), include.drift=TRUE)
## Series: anchovy 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1    drift
##       -1.0000  -0.0069
## s.e.   0.1959   0.0070
## 
## sigma^2 estimated as 0.03183:  log likelihood=5.08
## AIC=-4.15   AICc=-2.44   BIC=-1.48
  • Compare the estimates of ma1.