5.3 Metrics

How do we quantify the difference between the forecast and the actual values in the test data set?

Let’s take the training set/test set example, where traindat is the training data, testdat is the test data, and fr is the forecast from the model fit to traindat.
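If you are starting from this section, here is a minimal sketch of how these objects could be set up; the object name dat is a placeholder, and the split assumes an annual time series in which the last two years (1988–1989) are held out as the test set.

library(forecast)

# dat is a placeholder for an annual ts object ending in 1989
traindat <- window(dat, end=1987)    # training data
testdat <- window(dat, start=1988)   # test data: 1988-1989

# fit a model to the training data and forecast over the test period
fit <- forecast::auto.arima(traindat)
fr <- forecast::forecast(fit, h=2)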

The forecast errors are the differences between the test data and the forecasts.

fr.err <- testdat - fr$mean
fr.err
## Time Series:
## Start = 1988 
## End = 1989 
## Frequency = 1 
## [1] -0.1704302 -0.4944778

5.3.1 accuracy() function

The accuracy() function in the forecast package provides many different metrics: mean error, root mean square error, mean absolute error, mean percentage error, and mean absolute percentage error, among others. It takes a forecast object and a test data set that is the same length as the forecast.

accuracy(fr, testdat)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.00473511 0.1770653 0.1438523 -0.1102259 1.588409 0.7698386
## Test set     -0.33245398 0.3698342 0.3324540 -3.4390277 3.439028 1.7791577
##                     ACF1 Theil's U
## Training set -0.04312022        NA
## Test set     -0.50000000   1.90214

The metrics are:

ME Mean error

me <- mean(fr.err)
me
## [1] -0.332454

RMSE Root mean squared error

rmse <- sqrt(mean(fr.err^2))
rmse
## [1] 0.3698342

MAE Mean absolute error

mae <- mean(abs(fr.err))
mae
## [1] 0.332454

MPE Mean percentage error

fr.pe <- 100*fr.err/testdat
mpe <- mean(fr.pe)
mpe
## [1] -3.439028

MAPE Mean absolute percentage error

mape <- mean(abs(fr.pe))
mape
## [1] 3.439028
accuracy(fr, testdat)[,1:5]
##                       ME      RMSE       MAE        MPE     MAPE
## Training set -0.00473511 0.1770653 0.1438523 -0.1102259 1.588409
## Test set     -0.33245398 0.3698342 0.3324540 -3.4390277 3.439028
c(me, rmse, mae, mpe, mape)
## [1] -0.3324540  0.3698342  0.3324540 -3.4390277  3.4390277
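
The accuracy() output also reports MASE, the mean absolute scaled error. A sketch of its computation, assuming the usual non-seasonal definition in which the MAE is scaled by the in-sample MAE of a naive one-step forecast on the training data (see ?forecast::accuracy for the exact definition used):

# scaling factor: training-set MAE of a naive (random walk) one-step forecast
naive.mae <- mean(abs(diff(traindat)))
mase <- mean(abs(fr.err))/naive.mae
mase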

5.3.2 Test multiple models

Now that you have some metrics for forecast accuracy, you can compute these for all the models in your candidate set.

# The model picked by auto.arima
fit1 <- forecast::Arima(traindat, order=c(0,1,1))
fr1 <- forecast::forecast(fit1, h=2)
test1 <- forecast::accuracy(fr1, testdat)[2,1:5]

# AR-1
fit2 <- forecast::Arima(traindat, order=c(1,1,0))
fr2 <- forecast::forecast(fit2, h=2)
test2 <- forecast::accuracy(fr2, testdat)[2,1:5]

# Naive model with drift
fr3 <- forecast::rwf(traindat, h=2, drift=TRUE)  # rwf() returns a forecast object directly
test3 <- forecast::accuracy(fr3, testdat)[2,1:5]

Show a summary of the test set accuracy for the three models.

             ME   RMSE    MAE     MPE   MAPE
(0,1,1)  -0.293  0.320  0.293  -3.024  3.024
(1,1,0)  -0.309  0.341  0.309  -3.200  3.200
Naive    -0.483  0.510  0.483  -4.985  4.985
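
Here is one way such a summary could be assembled from test1, test2, and test3. The row labels and the knitr::kable() call are illustrative choices, not necessarily the code used to produce the table above.

# bind the test-set metrics for the three models into one table
res.tab <- rbind(`(0,1,1)`=test1, `(1,1,0)`=test2, Naive=test3)
knitr::kable(res.tab, digits=3)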

5.3.3 Cross-validation

Computing forecast errors and performance metrics with time series cross-validation is similar to the training set/test set approach.

The first step to using the tsCV() function is to define a function that returns a forecast for your model. Your function needs to take x, a time series, and h, the length of the forecast. You can also include other arguments if needed. Here is an example function that returns a forecast from an ARIMA model.

fun <- function(x, h, order){
  # fit an ARIMA model of the given order and return an h-step-ahead forecast
  forecast::forecast(forecast::Arima(x, order=order), h=h)
}

We pass this into the tsCV() function. tsCV() requires our data set and our forecast function. Any additional arguments after the forecast function (here order) are passed on to it, so they must match the arguments in our fun definition. tsCV() returns a time series of errors.

e <- forecast::tsCV(traindat, fun, h=1, order=c(0,1,1))

We then can compute performance metrics from these errors.

tscv1 <- c(ME=mean(e, na.rm=TRUE), RMSE=sqrt(mean(e^2, na.rm=TRUE)), MAE=mean(abs(e), na.rm=TRUE))
tscv1
##        ME      RMSE       MAE 
## 0.1128788 0.2261706 0.1880392

Cross-validation farther in the future

Compare the accuracy of forecasts 1 year out versus 4 years out. If h is greater than 1, the errors are returned as a matrix with a column for each forecast horizon. Column 4 contains the errors for the forecasts 4 years out.

e <- forecast::tsCV(traindat, fun, h=4, order=c(0,1,1))[,4]
#ME, RMSE, and MAE for forecasts 4 years out
tscv4 <- c(ME=mean(e, na.rm=TRUE), RMSE=sqrt(mean(e^2, na.rm=TRUE)), MAE=mean(abs(e), na.rm=TRUE))
rbind(tscv1, tscv4)
##              ME      RMSE       MAE
## tscv1 0.1128788 0.2261706 0.1880392
## tscv4 0.2839064 0.3812815 0.3359689

As we would expect, forecast errors are higher when we make forecasts farther into the future.
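
One way to see this pattern across all horizons is to compute the RMSE for each column of the error matrix. This is a small sketch of my own, not code from the original; the object name e.mat is just a placeholder.

e.mat <- forecast::tsCV(traindat, fun, h=4, order=c(0,1,1))
# RMSE for each forecast horizon (columns h=1 to h=4)
apply(e.mat, 2, function(x) sqrt(mean(x^2, na.rm=TRUE)))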

Cross-validation with a fixed window

Compare accuracy of forecasts with a fixed 10-year window and 1-year out forecasts.

e <- forecast::tsCV(traindat, fun, h=1, order=c(0,1,1), window=10)
#ME, RMSE, and MAE with a fixed 10-year window
tscvf1 <- c(ME=mean(e, na.rm=TRUE), RMSE=sqrt(mean(e^2, na.rm=TRUE)), MAE=mean(abs(e), na.rm=TRUE))
tscvf1
##        ME      RMSE       MAE 
## 0.1387670 0.2286572 0.1942840

All the forecast tests together

Here are all 4 types of forecast tests together. There is no single right approach. Time series cross-validation has the advantage that you test many more forecasts and use all your data.

comp.tab <- rbind(train.test=test1[c("ME","RMSE","MAE")],
      tsCV.variable1=tscv1,
      tsCV.variable4=tscv4,
      tsCV.fixed1=tscvf1)
knitr::kable(comp.tab, format="html")
                        ME       RMSE        MAE
train.test      -0.2925326  0.3201093  0.2925326
tsCV.variable1   0.1128788  0.2261706  0.1880392
tsCV.variable4   0.2839064  0.3812815  0.3359689
tsCV.fixed1      0.1387670  0.2286572  0.1942840