# Load the data Load data and create a subset dataset `landings` with the years 1964 to 1987. ```{r load_data} data(greeklandings, package="FishForecast") landings = subset(greeklandings, Year <= 1987) ``` Add `t` to the data. This is Year minus first Year. ```{r} landings$t = landings$Year-landings$Year[1] ``` We will look at sardines and a 4th order polynomial. h is the forecast length. h=5 is 5 years. ```{r} val <- "Sardine" poly.order <- 4 h <- 5 ``` # Fit linear regression Use the poly function $$log(Sardine catch) = \alpha + \beta t + \beta_2 t^2 + \beta_3 t^3 + \beta_4 t^4 + e_t$$ ```{r tvreg.sardine3} model <- lm(log.metric.tons ~ poly(t, poly.order), data=landings, subset=Species==val) ``` # Create a forecast Use the `predict()` function. First you must set up the newdata data.frame. This is specific to the `predict()` function. Let's predict for the next 5 t after the end of the data. ```{r} dat <- subset(landings, Species==val) dat$t ``` The newdata data.frame must have the name the same as your predictor `t` and have the values at the `t` you want to forecast. ```{r} last.t <- max(dat$t) nd <- data.frame(t = last.t + 1:h) ``` Now you can predict: ```{r} predict(model, newdata = nd) ``` That is for years ```{r} max(dat$Year) + 1:h ``` # Show the prediction intervals on your forecast You can use predict for this too. `lwr` is the lower 95% prediction and upr is the upper 95% prediction. ```{r} predict(model, newdata = nd, interval="prediction") ``` # Make a nicer table of predictions Show in metric tons instead of log metric tons. ```{r} pr <- predict(model, newdata = nd, interval="prediction") pr <- as.data.frame(exp(pr)) pr$Year <- max(dat$Year) + 1:h ``` Make table in R markdown ```{r} library(knitr) kable(pr) ``` # Create plots of forecasts ```{r} pr <- predict(model, newdata = nd, interval="prediction") pr <- as.data.frame(exp(pr)) ylims <- c(min(dat$metric.tons, pr$fit), max(dat$metric.tons, pr$fit)) pr$Year <- max(dat$Year) + 1:h plot(dat$Year, dat$metric.tons, xlim=c(min(dat$Year), max(dat$Year)+h), ylim=ylims) lines(dat$Year, exp(fitted(model)), col="red") lines(pr$Year, pr$fit) ``` Here is a function I created to make forecast plots using ggplot(). ```{r plot.TVreg.func, echo=FALSE} plotforecasttv <- function(object, h=5){ require(ggplot2) dat <- object$model dat <- as.matrix(dat) dat <- data.frame(resp=dat[,1], t=0:(dim(dat)[1]-1), fit=fitted(model)) tlim <- max(dat$t)+1:h pr95 <- predict(model, newdata = data.frame(t=max(dat$t)+1:h), level=0.95, interval="prediction") pr80 <- predict(model, newdata = data.frame(t=max(dat$t)+1:h), level=0.80, interval="prediction") pr95 <- as.data.frame(pr95); pr95$t <- tlim pr80 <- as.data.frame(pr80); pr80$t <- tlim ylims <- c(min(dat[,1],pr95$lwr, pr80$lwr), max(dat[,1],pr95$upr, pr80$upr)) p1 <- ggplot(dat, aes(x = t, y = resp))+ theme_bw() + geom_point(color = "blue") + xlim(0,max(tlim)) + ylim(ylims) + geom_line(color = "red", aes(x=t, y=fit)) p1 + geom_ribbon(mapping=aes(x=t, ymin=lwr, ymax=upr), data=pr95, inherit.aes=FALSE, fill = "grey50") + geom_ribbon(mapping=aes(x=t, ymin=lwr, ymax=upr), data=pr80, inherit.aes=FALSE, fill = "grey75") + geom_line(aes(x=t, y=fit), pr95) } ``` You can plot with ```{r} plotforecasttv(model) ``` # The problem with high order polynomials You model fits the data nicely, but your forecasts are very uncertain as you move farther out. This is a property of very flexible models. They fit the data, but tend to overfit so forecasts are more not less uncertain. # Problems 1. Fit a 1st order polynomial to the sardine data. Create a forecast for 10 years into the future. 1. Create forecasts for one of the other species. ```{r} unique(landings$Species) ```