In this lab, you will fit data with the auto.arima()
function in the forecast package.
We will use the anchovy and sardine catch (in log metric tons) from before 1988.
data(greeklandings, package="FishForecast")
landings <- subset(greeklandings, Year <= 1987)
anchovy <- ts(subset(landings, Species=="Anchovy")$log.metric.tons, start=1964)
sardine <- ts(subset(landings, Species=="Sardine")$log.metric.tons, start=1964)
Load the necessary packages.
library(forecast)
library(tseries)
To see info on arima.sim() and arima(), type ?arima.sim and ?arima.
Fit to AR(2) data:
\[x_t = 0.8 x_{t-1} + 0.1 x_{t-2} + e_t\]
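Before simulating, it is worth checking that these AR coefficients define a stationary process (arima.sim() will throw an error otherwise). A quick base-R sketch using polyroot():

```r
# AR(2) coefficients from the equation above
ar_coefs <- c(0.8, 0.1)

# The process is stationary if all roots of the AR polynomial
# 1 - 0.8*z - 0.1*z^2 lie outside the unit circle
roots <- polyroot(c(1, -ar_coefs))
all(Mod(roots) > 1)  # TRUE here, so arima.sim() will accept the model
```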
a1 = arima.sim(n=100, model=list(ar=c(.8,.1)))
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.9227 -0.2977
## s.e. 0.0401 0.1009
##
## sigma^2 estimated as 1.042: log likelihood=-143.63
## AIC=293.26 AICc=293.51 BIC=301.08
Fit to AR(1) data:
\[x_t = 0.8 x_{t-1} + e_t\]
a1 = arima.sim(n=100, model=list(ar=c(.8)))
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.8820
## s.e. 0.0488
##
## sigma^2 estimated as 0.87: log likelihood=-135.18
## AIC=274.36 AICc=274.49 BIC=279.57
Fit to ARMA(1,2) data:
\[x_t = 0.8 x_{t-1} + e_t + 0.8 e_{t-1} + 0.2 e_{t-2}\]
We will increase the number of data points to 1000 because models with an MA component require a lot of data to estimate. Models with MA order greater than 1 are not very practical for fisheries data for that reason.
a1 = arima.sim(n=1000, model=list(ar=c(0.8), ma=c(0.8, 0.2)))
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1
## ARIMA(2,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ma1
## 1.0949 -0.2441 0.4842
## s.e. 0.0471 0.0460 0.0424
##
## sigma^2 estimated as 1.002: log likelihood=-1419.91
## AIC=2847.83 AICc=2847.87 BIC=2867.46
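To see how sample size affects the MA estimates, you can refit the same ARMA(1,2) model at a few different values of n. This is a sketch, not part of the lab; your coefficient estimates will vary run to run since the data are simulated, but the estimates should get closer to the true values (0.8, 0.8, 0.2) as n grows:

```r
library(forecast)
set.seed(123)
for (n in c(100, 500, 1000)) {
  a1 <- arima.sim(n = n, model = list(ar = 0.8, ma = c(0.8, 0.2)))
  # Fix the order with Arima() so we compare estimates, not model selection
  fit <- Arima(a1, order = c(1, 0, 2), include.mean = FALSE)
  cat("n =", n, " coef:", round(coef(fit), 2), "\n")
}
```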
Let's run auto.arima() on 100 simulated AR(2) datasets and tabulate how often it selects the true order. We constrain q to 0 (max.q=0) so only the AR order is being selected.
save.fits = rep(NA,100)
for(i in 1:100){
  a1 = arima.sim(n=100, model=list(ar=c(.8,.1)))
  fit = auto.arima(a1, seasonal=FALSE, max.d=0, max.q=0)
  # fit$arma[1] is the AR order p; fit$arma[2] is the MA order q
  save.fits[i] = paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)
## save.fits
## 1-0 2-0 3-0 4-0
## 78 20 1 1
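You can repeat the experiment without the max.q=0 constraint to see how often auto.arima() selects spurious MA terms. This sketch reuses the loop above; the tabulated counts will vary from run to run:

```r
library(forecast)
set.seed(42)
save.fits <- rep(NA, 100)
for (i in 1:100) {
  a1 <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
  # No max.q constraint now, so MA terms can be selected too
  fit <- auto.arima(a1, seasonal = FALSE, max.d = 0)
  save.fits[i] <- paste0(fit$arma[1], "-", fit$arma[2])  # "p-q"
}
table(save.fits)
```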
Create some AR(2) data and then add missing values (NA).
a1 = arima.sim(n=100, model=list(ar=c(.8,.1)))
a1[sample(100,50)]=NA
plot(a1, type="l")
title("many missing values")
As always, to understand what the code is doing, run pieces of it and look at the output. Use ?sample (say) to read about functions you don't know.
Fit to the data with missing values. You need to give auto.arima() a little help by telling it not to include models with differencing, so we set max.d=0. You can remove that and see what happens.
auto.arima(a1, seasonal=FALSE, max.d=0)
## Series: a1
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.8915
## s.e. 0.0497
##
## sigma^2 estimated as 1.255: log likelihood=-86.69
## AIC=177.37 AICc=177.5 BIC=182.52
auto.arima() is nice because it selects the model order for you. But if you know what model you want to fit, you can use Arima():
Arima(a1, order = c(2,0,0))
## Series: a1
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.8875 -0.0449 1.2340
## s.e. 0.1721 0.1740 0.6792
##
## sigma^2 estimated as 1.278: log likelihood=-85.53
## AIC=179.06 AICc=179.5 BIC=189.36
fit <- auto.arima(anchovy)
fit
## Series: anchovy
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -0.5731 0.0641
## s.e. 0.1610 0.0173
##
## sigma^2 estimated as 0.03583: log likelihood=6.5
## AIC=-6.99 AICc=-5.73 BIC=-3.58
Compare to Table 8 in Stergiou and Christou. Note arima() writes an MA model like:
\[x_t = e_t + b_1 e_{t-1} + b_2 e_{t-2}\]
while Stergiou and Christou use the following (also common) notation:
\[x_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}\]
so the MA parameters reported by auto.arima() will be the negative of those reported in Stergiou and Christou. Note, in Stergiou and Christou the model is written in backshift notation on page 112. To see the model as the equation above, I translated from backshift to non-backshift notation.
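You can verify the sign convention yourself: simulate an MA(1) with a positive coefficient in R's convention and fit it. R reports a positive ma1, which would appear with a negative sign in Stergiou and Christou's notation. A quick sketch:

```r
library(forecast)
set.seed(99)
# x_t = e_t + 0.8 e_{t-1} in R's (arima) convention
a1 <- arima.sim(n = 1000, model = list(ma = 0.8))
fit <- Arima(a1, order = c(0, 0, 1), include.mean = FALSE)
coef(fit)["ma1"]  # should be close to +0.8; written as -0.8 in their notation
```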
Here are the values for anchovy in Table 8.
| ARIMA Model | \(\theta_1\) | drift (c) | R\(^2\) | BIC | LB |
|---|---|---|---|---|---|
| (0,1,1) | 0.563 | 0.064 | 0.83 | 1775 | 5.4 |
The model selected by auto.arima() is the same. The model selection criterion is similar in spirit but not in detail: Stergiou and Christou used a combination of model fit and BIC, while auto.arima() uses AICc by default (you can specify BIC instead).
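To make the selection closer to what Stergiou and Christou did, you can tell auto.arima() to select with BIC via its ic argument. A sketch, assuming the anchovy series created above is in your workspace:

```r
library(forecast)
# Select the model order by BIC instead of the default AICc
fit.bic <- auto.arima(anchovy, ic = "bic")
```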
The parameter estimates are similar.
The BIC values are different; that is expected because BIC includes a constant that is often left off.
The Ljung-Box test is not significant. They don’t give the p-value, but state in the table legend that it is not significant. That is good. That means we don’t reject the null hypothesis of independence.
Here is how to do the test that they did:
res <- resid(fit)
Box.test(res, type="Ljung-Box", lag=12, fitdf=2)
##
## Box-Ljung test
##
## data: res
## X-squared = 5.3725, df = 10, p-value = 0.8649
Here is how to do the default test with checkresiduals()
:
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 1.4883, df = 3, p-value = 0.685
##
## Model df: 2. Total lags used: 5
I am not sure how Stergiou and Christou selected the amount of differencing to use. auto.arima() uses the KPSS test by default. Let's try the Augmented Dickey-Fuller test instead; specify this by passing in test="adf".
auto.arima(sardine, test="adf")
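You can also run these stationarity tests directly with the tseries package (loaded earlier) to see what auto.arima() is reacting to. Note that the two tests have opposite null hypotheses, so their p-values are read in opposite directions. A sketch on the sardine series:

```r
library(tseries)
# ADF: null = non-stationary (unit root); a small p-value suggests stationarity
adf.test(sardine)
# KPSS: null = stationary; a small p-value suggests non-stationarity
kpss.test(sardine, null = "Level")
```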
Now create an anchovy time series using only the data from 1989 onward.
data(greeklandings, package="FishForecast")
anchovy <- subset(greeklandings, Species=="Anchovy" & Year>=1989)$log.metric.tons
anchovy <- ts(anchovy, start=1989)
Fit to these data with auto.arima(). Do you get the same model? You don't. ARIMA(0,1,1) is very different than ARIMA(0,0,1).
Now fit ARIMA(0,1,1) with drift to these data using Arima(). This is the best-fit model that auto.arima() found for the pre-1988 data.
Arima(anchovy, order=c(0,1,1), include.drift=TRUE)
## Series: anchovy
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -1.0000 -0.0069
## s.e. 0.1959 0.0070
##
## sigma^2 estimated as 0.03183: log likelihood=5.08
## AIC=-4.15 AICc=-2.44 BIC=-1.48
Compare the ma1 estimate to the -0.5731 estimated from the pre-1988 data; it is quite different.