layout: true .hheader[<a href="index.html"><svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 576 512"><path d="M488 312.7V456c0 13.3-10.7 24-24 24H348c-6.6 0-12-5.4-12-12V356c0-6.6-5.4-12-12-12h-72c-6.6 0-12 5.4-12 12v112c0 6.6-5.4 12-12 12H112c-13.3 0-24-10.7-24-24V312.7c0-3.6 1.6-7 4.4-9.3l188-154.8c4.4-3.6 10.8-3.6 15.3 0l188 154.8c2.7 2.3 4.3 5.7 4.3 9.3zm83.6-60.9L488 182.9V44.4c0-6.6-5.4-12-12-12h-56c-6.6 0-12 5.4-12 12V117l-89.5-73.7c-17.7-14.6-43.3-14.6-61 0L4.4 251.8c-5.1 4.2-5.8 11.8-1.6 16.9l25.5 31c4.2 5.1 11.8 5.8 16.9 1.6l235.2-193.7c4.4-3.6 10.8-3.6 15.3 0l235.2 193.7c5.1 4.2 12.7 3.5 16.9-1.6l25.5-31c4.2-5.2 3.4-12.7-1.7-16.9z"/></svg></a>] --- class: center, middle, inverse # Seasonal ARIMA Models .futnote[Eli Holmes, UW SAFS] .citation[eeholmes@uw.edu] --- ## Seasonality Load the chinook salmon data set ```r load("chinook.RData") head(chinook) ```
Year
Month
Species
log.metric.tons
metric.tons
1990
Jan
Chinook
3.4
29.9
1990
Feb
Chinook
3.81
45.1
1990
Mar
Chinook
3.51
33.5
1990
Apr
Chinook
4.25
70
1990
May
Chinook
5.2
181
1990
Jun
Chinook
4.37
79.2
--- The data are monthly and start in January 1990. To make this into a ts object do ```r chinookts <- ts(chinook$log.metric.tons, start=c(1990,1), frequency=12) ``` `start` is the year and month and frequency is the number of months in the year. Use `?ts` to see more examples of how to set up ts objects. --- ## Plot seasonal data ```r plot(chinookts) ``` <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Seasonal ARIMA model Seasonally differenced data: `$$z_t = x_t - x_{t+s} - m$$` Basic structure of a seasonal AR model `\(z_t\)` = AR(p) + AR(season) + AR(p+season) Example AR(1) non-seasonal part + AR(1) seasonal part `$$z_t = \phi_1 z_{t-1} + \Phi_1 z_{t-12} - \phi_1\Phi_1 z_{t-13}$$` --- ## Notation ARIMA (p,d,q)(ps,ds,qs)S ARIMA (1,0,0)(1,1,0)[12] Notice we are modeling `\(x\)` this year in Jan (say) as a function of `\(x\)` in Jan last year. --- ## Seasonal models Let's imagine that we can describe our data as a combination of the mean trend, a seasonal term, and error. `$$x_t = \mu t+ s_t + w_t$$` Let's imagine that the seasonal term is just a constant based on month and doesn't change with time. `$$s_t = f(month)$$` --- We want to remove the `\(s_t\)` with differencing so that we can model `\(e_t\)`. We can solve for `\(x_{t+1}\)` by using `\(x_{t-s}\)` where `\(s\)` is the seasonal length (e.g. 12 if season is yearly). When we take the first seasonal difference, we get `$$\Delta_s x_t = \mu(t-(t-s)) + s_t - s_{t-s} + w_t - w_{t-s} = \mu s + w_t - w_{t-s}$$` The `\(s_t-s_{t-s}\)` disappears because `\(s_t = s_{t-s}\)` when the seasonal effect is just a function of the month. Depending on what `\(m_t\)` is, we might be done or we might have to do a first difference. Notice that the error term is a moving average in the seasonal part. --- <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ```r plot(diff(xt,lag=12)) ``` <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- We can recover the model parameters with `Arima()`. Note the drift term is returned at `\(\mu\)` not `\(\mu s\)`. ```r forecast::Arima(xt, seasonal=c(0,1,1), include.drift=TRUE) ``` ``` ## Series: xt ## ARIMA(0,0,0)(0,1,1)[12] with drift ## ## Coefficients: ## sma1 drift ## -1.0000 0.0496 ## s.e. 0.1845 0.0008 ## ## sigma^2 estimated as 0.08162: log likelihood=-30.74 ## AIC=67.49 AICc=67.72 BIC=75.53 ``` --- `auto.arima()` identifies a ARIMA(0,0,0)(0,1,2)[12]. ```r forecast::auto.arima(xt, stepwise=FALSE) ``` ``` ## Series: xt ## ARIMA(0,0,0)(0,1,2)[12] with drift ## ## Coefficients: ## sma1 sma2 drift ## -1.0488 0.1765 0.0496 ## s.e. 0.1606 0.1148 0.0007 ## ## sigma^2 estimated as 0.08705: log likelihood=-29.56 ## AIC=67.12 AICc=67.51 BIC=77.85 ``` --- ## Seasonal model with changing season Let's imagine that our seasonality is increasing over time. `$$s_t = \beta \times year \times f(month)\times$$` When we take the first seasonal difference, we get `$$\Delta_s x_t = \mu(t-(t-s)) + \beta f(month)\times (year - (year-1)) + w_t - w_{t-s} \\ = \mu s + \beta f(month) + w_t - w_{t-s}$$` --- We need to take another seasonal difference to get rid of the `\(f(month)\)` which is not a constant; it is different for different months as it is our seasonality. `$$\Delta^2_{s} x_t = w_t - w_{t-s} - w_{t-s} + w_{t-2s}=w_t - 2w_{t-s} + w_{t-2s}$$` So our ARIMA model should be ARIMA(0,0,0)(0,2,2) --- <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> --- But we can recover the model with `Arima()`. Note the drift term is returned at `\(\mu\)` not `\(\mu s\)`. ```r forecast::Arima(xt, seasonal=c(0,2,2)) ``` ``` ## Series: xt ## ARIMA(0,0,0)(0,2,2)[12] ## ## Coefficients: ## sma1 sma2 ## -1.7745 0.9994 ## s.e. 0.4178 0.4624 ## ## sigma^2 estimated as 0.1045: log likelihood=-55.1 ## AIC=116.2 AICc=116.46 BIC=123.9 ``` --- `auto.arima()` again has problems and returns many Infs; turn on `trace=TRUE` to see the problem. ```r forecast::auto.arima(xt, stepwise=FALSE) ``` ``` ## Series: xt ## ARIMA(4,0,0)(1,1,0)[12] with drift ## ## Coefficients: ## ar1 ar2 ar3 ar4 sar1 drift ## 0.2548 -0.0245 0.2166 -0.2512 -0.5393 0.0492 ## s.e. 0.0945 0.0947 0.0943 0.0966 0.0868 0.0025 ## ## sigma^2 estimated as 0.1452: log likelihood=-48.2 ## AIC=110.41 AICc=111.53 BIC=129.18 ``` --- ## Seasonal model with changing season #2 Let's imagine that our seasonality increases and then decreases. `$$s_t = (a y^2-b y+h) f(month)$$` <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> --- Then we need to take 3 seasonal differences to get rid of the seasonality. The first will get rid of the `\(h f(month)\)`, the next will get rid of `\(by\)` (year) terms and `\(y^2\)` terms, the third will get rid of extra `\(y\)` terms introduced by the 2nd difference. The seasonal differences will get rid of the linear trend also. `$$\Delta^3_{s} x_t = w_t - 2w_{t-s} + w_{t-2s}-w_{t-s}+2w_{t-2s}-w_{t-3s}=w_t - 3w_{t-s} + 3w_{t-2s}-w_{t-3s}$$` So our ARIMA model should be ARIMA(0,0,0)(0,3,3). --- ## `auto.arima()` for seasonal ts `auto.arima()` will recognize that our data has season and fit a seasonal ARIMA model to our data by default. We will define the training data up to 1998 and use 1999 as the test data. ```r traindat <- window(chinookts, c(1990,10), c(1998,12)) testdat <- window(chinookts, c(1999,1), c(1999,12)) fit <- forecast::auto.arima(traindat) fit ``` ``` ## Series: traindat ## ARIMA(1,0,0)(0,1,0)[12] with drift ## ## Coefficients: ## ar1 drift ## 0.3676 -0.0320 ## s.e. 0.1335 0.0127 ## ## sigma^2 estimated as 0.758: log likelihood=-107.37 ## AIC=220.73 AICc=221.02 BIC=228.13 ``` --- ## Forecast using seasonal model ```r fr <- forecast::forecast(fit, h=12) plot(fr) points(testdat) ``` <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- ## Missing values Missing values are ok when fitting a seasonal ARIMA model <img src="Forecasting_3-7_-_ARMA_Seasonal_Models_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Summary Basic steps for identifying a seasonal model. **forecast** automates most of this. * Check that you have specified your season correctly in your ts object. * Plot your data. Look for trend, seasonality and random walks. --- ## Summary * Use differencing to remove season and trend. * Season and no trend. Take a difference of lag = season * No seasonality but a trend. Try a first difference * Both. Do both types of differences * Neither. No differencing. * Random walk. First difference. * Parametric looking curve. Tranform. --- ## Summary * Examine the ACF and PACF of the differenced data. * Look for patterns (spikes) at seasonal lags * Estimate likely models and compare with model selection criteria (or cross-validation). Use `TRACE=TRUE` * Do residual checks.