6.1 Covariates used in Stergiou and Christou
Stergiou and Christou used five environmental covariates: air temperature (air), sea-level pressure (slp), sea surface temperature (sst), vertical wind speed (vwnd), and wind speed cubed (wspd3). I downloaded monthly values for these covariates from the three 1 degree boxes used by Stergiou and Christou from the ICOADS database. I then computed a yearly average over all months in the three boxes. These yearly average environmental covariates are in covsmean.year
, which is part of landings
in the FishForecast package.
## [1] "Year" "air.degC" "slp.millibars" "sst.degC"
## [5] "vwnd.m/s" "wspd3.m3/s3"
The covariates are those in Stergiou and Christou with the following differences. I used the ICOADS data not the COADS data. The boxes are 1 degree but on 1 degree centers not 0.5 centers. Thus the box is 39.5-40.5 not 39-40. ICOADS does not include ‘vertical wind’. I used NS winds which may be different. The code to download the ICOADS data is in the appendix.
In addition to the environmental covariates, Stergiou and Christou used many covariates of fishing effort for trawlers, purse seiners, beach seiners, other coastal boats and demersal (sum of trawlers, beach seiners and other coastal boats). For each fishery type, they used data on number of fishers (FI), number of boats (BO), total engine horse power (HP), total boat tonnage (TO). They also used an economic variable: value (VA) of catch for trawlers, purse seiners, beach seiners, other coastal boats. These fishery covariates were extracted from the Greek Statistical Reports (1.1.1).
## [1] "Year" "Boats.BO" "Trawlers.BOT"
## [4] "Purse.seiners.BOP" "Beach.seiners.BOB" "Other.BOC"
## [7] "Demersal.BOD" "Fishers.FI" "Trawlers.FIT"
## [10] "Purse.seiners.FIP" "Beach.seiners.FIB" "Other.FIC"
## [13] "Demersal.FID" "Horsepower.HP" "Trawler.HPT"
## [16] "Purse.seiners.HPP" "Beach.seiners.HPB" "Other.HPC"
## [19] "Demersal.HPD" "Trawler.VAT" "Purse.seiners.VAP"
## [22] "Beach.seiners.VAB" "Other.VAC" "Tonnage.TO"
## [25] "Trawlers.TOT" "Purse.seiners.TOP"
For anchovy, the fishery effort metrics from the purse seine fishery were used. Lastly, biological covariates were included which were the landings of other species. Stergiou and Christou state (page 118) that the other species modeled by VAR (page 114) was included. This would imply that sardine was used as an explanatory variable. However in Table 3 (page 119), it appears that Trachurus (Horse mackerel) was included. It is not clear if sardine was also included but not chosen as an important variable. I included Trachurus and not sardine as the biological explanatory variable.
Preparing the data frame
We will model anchovy landings as the response variable. The covariates are lagged by one year, following Stergiou and Christou. This means that the catch in year \(t\) is regressed against the covariates in year \(t-1\). We set up our data frame as follows. We use the 1965 to 1987 catch data as the response. We use 1964 to 1986, so year prior, for all the explanatory variables and we log transform the explanatory variables (following Stergiou and Christou). We use \(t\) 1 to 23 as a “year” covariate. Our data frame will have the following columns:
## [1] "anchovy" "Year" "Trachurus" "air" "slp" "sst"
## [7] "vwnd" "wspd3" "BOP" "FIP" "HPP" "TOP"
In total, there are 11 covariates and 23 years of data—which is not much data per explanatory variable. Section @ref(cov.df) shows the R code to create the df
data frame with the response variable and all the explanatory variables.
For most of the analyses, we will use the untransformed variables, however for some analyses, we will want the effect sizes (the estimated \(\beta\)’s) to be on the same scale. For these analyses, we will use the z-scored variables, which will be stored in data frame dfz
. z-scoring removes the mean and normalizes the variance to 1. Here is a loop to demean and rescale our data frame.
dfz <- df
n <- nrow(df)
for(i in colnames(df)){
pop_sd <- sd(df[,i])*sqrt((n-1)/n)
pop_mean <- mean(df[,i])
dfz[,i] <- (df[,i]-pop_mean)/pop_sd
}
The function scale()
will also do a scaling to the unbiased variance instead of the sample variance (divide by \(n-1\) instead of \(n\)) and will return a matrix. We will use dfz
which is scaled to the sample variance as we will need this for the chapter on Principal Components Regression.
6.1.1 Creating the data frame for model fitting
Code to make the df
data frame used in the model fitting functions.
# response
df <- data.frame(anchovy=anchovy$log.metric.tons,
Year=anchovy$Year)
Year1 <- df$Year[1]
Year2 <- df$Year[length(df$Year)]
df <- subset(df, Year>=Year1+1 & Year<=Year2)
# biological covariates
df.bio <- subset(greeklandings, Species=="Horse.mackerel")[,c("Year","log.metric.tons")]
df.bio <- subset(df.bio, Year>=Year1 & Year<=Year2-1)[,-1,drop=FALSE] # [,-1] to remove year
colnames(df.bio) <- "Trachurus"
# environmental covariates
ecovsmean.year[,"vwnd.m/s"]<- abs(ecovsmean.year[,"vwnd.m/s"])
df.env <- log(subset(ecovsmean.year, Year>=Year1 & Year<=Year2-1)[,-1])
# fishing effort
df.fish <- log(subset(greekfish.cov, Year>=Year1 & Year<=Year2-1)[,-1])
purse.cols <- stringr::str_detect(colnames(df.fish),"Purse.seiners")
df.fish <- df.fish[,purse.cols]
df.fish <- df.fish[!(colnames(df.fish)=="Purse.seiners.VAP")]
# assemble
df <- data.frame(
df, df.bio, df.env, df.fish
)
df$Year <- df$Year-df$Year[1]+1
colnames(df) <- sapply(colnames(df), function(x){rev(stringr::str_split(x,"Purse.seiners.")[[1]])[1]})
colnames(df) <- sapply(colnames(df), function(x){stringr::str_split(x,"[.]")[[1]][1]})
df <- df[,colnames(df)!="VAP"]
# all the data to 2007
df.full <- df
# only training data
df <- subset(df, Year>=1965-1964 & Year<=1987-1964)
save(df, df.full, file="MREG_Data.RData")