Load the data

load("landings.RData")
landings$log.metric.tons = log(landings$metric.tons)
landings = subset(landings, Year <= 1987)

Add t to the data. This is Year minus first Year.

landings$t = landings$Year-landings$Year[1]

We will look at sardines and a 4th order polynomial. h is the forecast length. h=5 is 5 years.

val <- "Sardine"
poly.order <- 4
h <- 5

Fit linear regression

Use the poly function

\[log(Sardine catch) = \alpha + \beta t + \beta_2 t^2 + \beta_3 t^3 + \beta_4 t^4 + e_t\]

model <- lm(log.metric.tons ~ poly(t, poly.order), data=landings, subset=Species==val)

Create a forecast

Use the predict() function. First you must set up the newdata data.frame. This is specific to the predict() function. Let’s predict for the next 5 t after the end of the data.

dat <- subset(landings, Species==val)
dat$t
##  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

The newdata data.frame must have the name the same as your predictor t and have the values at the t you want to forecast.

last.t <- max(dat$t)
nd <- data.frame(t = last.t + 1:h)

Now you can predict:

predict(model, newdata = nd)
##        1        2        3        4        5 
## 9.241230 9.293302 9.387034 9.532185 9.739329

That is for years

max(dat$Year) + 1:h
## [1] 1988 1989 1990 1991 1992

Show the prediction intervals on your forecast

You can use predict for this too. lwr is the lower 95% prediction and upr is the upper 95% prediction.

predict(model, newdata = nd, interval="prediction")
##        fit      lwr       upr
## 1 9.241230 8.946917  9.535543
## 2 9.293302 8.881264  9.705340
## 3 9.387034 8.802235  9.971834
## 4 9.532185 8.714969 10.349401
## 5 9.739329 8.623837 10.854821

Make a nicer table of predictions

Show in metric tons instead of log metric tons.

pr <- predict(model, newdata = nd, interval="prediction")
pr <- as.data.frame(exp(pr))
pr$Year <- max(dat$Year) + 1:h

Make table in R markdown

library(knitr)
kable(pr)
fit lwr upr Year
10313.72 7684.167 13843.11 1988
10865.00 7195.879 16404.98 1989
11932.66 6649.086 21414.72 1990
13796.70 6093.444 31238.33 1991
16972.15 5562.689 51783.21 1992

Create plots of forecasts

pr <- predict(model, newdata = nd, interval="prediction")
pr <- as.data.frame(exp(pr))
ylims <- c(min(dat$metric.tons, pr$fit), max(dat$metric.tons, pr$fit))
pr$Year <- max(dat$Year) + 1:h
plot(dat$Year, dat$metric.tons, xlim=c(min(dat$Year), max(dat$Year)+h),
     ylim=ylims)
lines(dat$Year, exp(fitted(model)), col="red")
lines(pr$Year, pr$fit)

Here is a function I created to make forecast plots using ggplot().

You can plot with

plotforecasttv(model)

The problem with high order polynomials

You model fits the data nicely, but your forecasts are very uncertain as you move farther out. This is a property of very flexible models. They fit the data, but tend to overfit so forecasts are more not less uncertain.

Problems

  1. Fit a 1st order polynomial to the sardine data. Create a forecast for 10 years into the future.

  2. Create forecasts for one of the other species.

unique(landings$Species)
## [1] Anchovy        Sardine        Chub.mackerel  Horse.mackerel Mackerel      
## [6] Jack.Mackerel 
## 6 Levels: Anchovy Sardine Chub.mackerel Horse.mackerel ... Jack.Mackerel