5.1 Metrics

How do we quantify the difference between the forecasts and the actual values in the test data set?

Let’s continue with our training set/test set example, where fr is the forecast and testdat is the test data.

The forecast errors are the difference between the test data and the forecasts.
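
In symbols, if $y_t$ is the value in the test data at time $t$ and $\hat{y}_t$ is the forecast for that time, the forecast error is (the notation $e_t$ is introduced here for later reference)

\[ e_t = y_t - \hat{y}_t \]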

fr.err <- testdat - fr$mean
fr.err
## Time Series:
## Start = 1988 
## End = 1989 
## Frequency = 1 
## [1] -0.1704302 -0.4944778

5.1.1 accuracy() function

The accuracy() function in the forecast package provides many different metrics: mean error, root mean square error, mean absolute error, mean percentage error, and mean absolute percentage error, among others. It requires a forecast object and a test data set that is the same length as the forecast. The "Training set" row is computed from the residuals of the fitted model, while the "Test set" row is computed from the forecast errors against the test data.

accuracy(fr, testdat)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.00473511 0.1770653 0.1438523 -0.1102259 1.588409 0.7698386
## Test set     -0.33245398 0.3698342 0.3324540 -3.4390277 3.439028 1.7791577
##                     ACF1 Theil's U
## Training set -0.04312022        NA
## Test set     -0.50000000   1.90214

The metrics are:

ME Mean error

me <- mean(fr.err)
me
## [1] -0.332454

RMSE Root mean squared error

rmse <- sqrt(mean(fr.err^2))
rmse
## [1] 0.3698342

MAE Mean absolute error

mae <- mean(abs(fr.err))
mae
## [1] 0.332454

MPE Mean percentage error

fr.pe <- 100*fr.err/testdat
mpe <- mean(fr.pe)
mpe
## [1] -3.439028

MAPE Mean absolute percentage error

mape <- mean(abs(fr.pe))
mape
## [1] 3.439028
accuracy(fr, testdat)[,1:5]
##                       ME      RMSE       MAE        MPE     MAPE
## Training set -0.00473511 0.1770653 0.1438523 -0.1102259 1.588409
## Test set     -0.33245398 0.3698342 0.3324540 -3.4390277 3.439028
c(me, rmse, mae, mpe, mape)
## [1] -0.3324540  0.3698342  0.3324540 -3.4390277  3.4390277
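
For reference, here are the same metrics written as formulas. This is just a restatement of the R code above, using the forecast errors $e_t$ defined earlier, the test data $y_t$, and the sums taken over the $n$ test-set points:

\[
\begin{aligned}
\text{ME} &= \frac{1}{n}\sum_t e_t \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_t e_t^2} \\
\text{MAE} &= \frac{1}{n}\sum_t \lvert e_t \rvert \\
\text{MPE} &= \frac{100}{n}\sum_t \frac{e_t}{y_t} \\
\text{MAPE} &= \frac{100}{n}\sum_t \left\lvert \frac{e_t}{y_t} \right\rvert
\end{aligned}
\]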

5.1.2 Test multiple models

Now that you have some metrics for forecast accuracy, you can compute these for all the models in your candidate set.

# The model picked by auto.arima
fit1 <- forecast::Arima(traindat, order=c(0,1,1))
fr1 <- forecast::forecast(fit1, h=2)
test1 <- forecast::accuracy(fr1, testdat)[2,1:5]

# AR-1
fit2 <- forecast::Arima(traindat, order=c(1,1,0))
fr2 <- forecast::forecast(fit2, h=2)
test2 <- forecast::accuracy(fr2, testdat)[2,1:5]

# Naive model with drift
fit3 <- forecast::rwf(traindat, drift=TRUE)
fr3 <- forecast::forecast(fit3, h=2)
test3 <- forecast::accuracy(fr3, testdat)[2,1:5]

Show a summary table of the test-set metrics for the three models.
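
One way to assemble such a table is to bind the three test-set metric vectors together and format the result with knitr::kable(). This is a sketch; restab is a name introduced here and the rounding is chosen to match the table below.

# Combine the test-set metrics from the three models into one matrix
restab <- rbind(`(0,1,1)`=test1, `(1,1,0)`=test2, Naive=test3)
knitr::kable(restab, digits=3)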

             ME  RMSE   MAE    MPE  MAPE
(0,1,1)  -0.293 0.320 0.293 -3.024 3.024
(1,1,0)  -0.309 0.341 0.309 -3.200 3.200
Naive    -0.483 0.510 0.483 -4.985 4.985

5.1.3 Cross-validation

Computing forecast errors and performance metrics with time series cross-validation is similar to the training set/test set approach.

The first step to using the tsCV() function is to define a function that returns a forecast for your model. Your function needs to take x, a time series, and h, the length of the forecast. You can also include other arguments if needed. Here is an example function for a forecast from an ARIMA model.

# Forecast function for tsCV(): fit an ARIMA model with the given order and forecast h steps ahead
fun <- function(x, h, order){
  forecast::forecast(forecast::Arima(x, order=order), h=h)
}

We pass this into the tsCV() function. tsCV() requires our data set and our forecast function; any extra arguments (here order) are passed on to the forecast function, so they must match the arguments in our fun definition. By default tsCV() uses an expanding window, forecasting from all of the data available up to each time step, and it returns a time series of forecast errors.

e <- forecast::tsCV(traindat, fun, h=1, order=c(0,1,1))

We can then compute performance metrics from these errors. The na.rm=TRUE is needed because tsCV() returns NA for time steps where a forecast could not be computed (for example, the last h steps of the series).

tscv1 <- c(ME=mean(e, na.rm=TRUE), RMSE=sqrt(mean(e^2, na.rm=TRUE)), MAE=mean(abs(e), na.rm=TRUE))
tscv1
##        ME      RMSE       MAE 
## 0.1128788 0.2261706 0.1880392
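
The same recipe works for any model whose forecast function returns a forecast object. For example, here is a sketch of cross-validating the naive model with drift from above; fun.rwf and e.rwf are names introduced here.

# Forecast function for the naive (random walk with drift) model
fun.rwf <- function(x, h){
  forecast::rwf(x, h=h, drift=TRUE)
}
e.rwf <- forecast::tsCV(traindat, fun.rwf, h=1)
sqrt(mean(e.rwf^2, na.rm=TRUE)) # RMSE of the cross-validation errors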

Cross-validation farther into the future

Compare the accuracy of forecasts 1 year out versus 4 years out. If h is greater than 1, the errors are returned as a matrix with one column for each forecast horizon. Column 4 holds the errors for the forecasts 4 years out.

e <- forecast::tsCV(traindat, fun, h=4, order=c(0,1,1))[,4]
# ME, RMSE, and MAE for the 4-year-out forecast errors
tscv4 <- c(ME=mean(e, na.rm=TRUE), RMSE=sqrt(mean(e^2, na.rm=TRUE)), MAE=mean(abs(e), na.rm=TRUE))
rbind(tscv1, tscv4)
##              ME      RMSE       MAE
## tscv1 0.1128788 0.2261706 0.1880392
## tscv4 0.2839064 0.3812815 0.3359689

As we would expect, forecast errors are higher when we make forecasts farther into the future.
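
If you want to look at all the horizons at once, keep the full error matrix from tsCV() and compute a metric column by column. A sketch; e.all is a name introduced here.

e.all <- forecast::tsCV(traindat, fun, h=4, order=c(0,1,1))
# RMSE at each forecast horizon; columns correspond to h = 1, ..., 4
apply(e.all, 2, function(x) sqrt(mean(x^2, na.rm=TRUE)))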

Cross-validation with a fixed window

Compare the accuracy of 1-year-out forecasts made with a fixed 10-year window, i.e. only the most recent 10 years of data are used to fit the model for each forecast.

e <- forecast::tsCV(traindat, fun, h=1, order=c(0,1,1), window=10)
# ME, RMSE, and MAE with the fixed window
tscvf1 <- c(ME=mean(e, na.rm=TRUE), RMSE=sqrt(mean(e^2, na.rm=TRUE)), MAE=mean(abs(e), na.rm=TRUE))
tscvf1
##        ME      RMSE       MAE 
## 0.1387670 0.2286572 0.1942840

All the forecast tests together

Here are all four types of forecast tests together. There is no single right approach. Time series cross-validation has the advantage that you test many more forecasts and use all of your data.

comp.tab <- rbind(train.test=test1[c("ME","RMSE","MAE")],
      tsCV.variable1=tscv1,
      tsCV.variable4=tscv4,
      tsCV.fixed1=tscvf1)
knitr::kable(comp.tab, format="html")
                       ME      RMSE       MAE
train.test     -0.2925326 0.3201093 0.2925326
tsCV.variable1  0.1128788 0.2261706 0.1880392
tsCV.variable4  0.2839064 0.3812815 0.3359689
tsCV.fixed1     0.1387670 0.2286572 0.1942840