Introduction

Many problems in environmental science can be addressed with state-space models and we’ve already seen a few examples of some simple models. This lab will introduce you to fitting univariate state-space models with the {MARSS} package. Notably, {MARSS} follows the same notation for defining state-space models as we would write on a whiteboard. Also, because {MARSS} was designed to fit mutlivariate state-space models, each of the model elements must be a matrix class. This also applies to scalar values such that they’re not numeric or integer.


Gompertz model

We’ll begin by simulating a true state of nature and then corrupting it with some observation error to produce the “data”. Specifically, we’ll use a discrete-time Gompertz model of density-dependent growth. Recall that the size of a population at time \((N_t)\) is given by

\[ N_t = N_{t-1} ~ \exp(u ~ + ~ (b - 1) \log(N_{t-1}))\exp(w _t) \]

Taking the log of both sides and substituting \(x_t\) for \(\log(N_t)\)

\[ \begin{align} \log(N_t) & = \log(N_{t-1}) + u + (b - 1) \log(N_{t-1}) + w_t \\ & \Downarrow \\ x_t &= x_{t-1} + u + (b - 1) x_{t-1} + w_t \\ &= x_{t-1} + u + b x_{t-1} - x_{t-1} + w_t \\ &= u + b x_{t-1} + w_t \end{align} \]

We generally assume the process errors are Gaussian, such that our full state model is our familiar AR(1) process

\[ x_t = u + b x_{t-1} + w_t \\ ~ \\ w_t \sim \text{N}(0, q) \]

Note: It’s very difficult to estimate both \(u\) and \(b\), so we’ll assume \(u\) = 0.

Initial state

So far we have largely ignored an important part of our model definition. Because our state model is an autoregressive process, we must define the initial state when \(t\) = 0 \((x_0)\). In theory, \(x_0\) is assumed to be a random effect with unknown mean and variance, such that

\[ x_0 \sim \text{N}(\pi,\lambda) \]

In practice, however, estimating both \(\pi\) and \(\lambda\) is nearly impossible, so the default is to treat \(x_0\) as a fixed (but unknown) effect, such that

\[ x_0 \sim \text{N}(\pi,0). \]


Simulate some data

Task: Use a for() loop to simulate from an AR(1) state model.

## number of time steps
TT <- 40

## strength of density-dependence (0 < b < 1)
bb <- 0.5

## time series of process errors with SD = 1
ww <- rnorm(TT, 0, 1)

## initialize state & set x0 = w0
xx <- ww

## loop over time steps
for(t in 2:TT) {
  xx[t] <- bb * xx[t-1] + ww[t]
}

Task: Add some observation error to the true state.

## obs errors with SD = 0.5
vv <- rnorm(TT, 0, 0.5)

## obs data
yy <- xx + vv

Fitting models with {MARSS}

The workhorse function in the {MARSS} package that allows us to fit state-state models is MARSS(), which has four important arguments (two are required):

MARSS(y, model = NULL, inits = NULL, control = NULL, ...)
  • y is an \(n \times T\) matrix of data (observations)

  • model is a list that defines the state-space model in terms of parameters to be estimated

  • inits [optional] is a list of initial values for parameters to be estimated

  • control [optional] is a list of options for controlling fitting algorithms, setting tolerance & convergence parameters, etc

Tip: Type ?MARSS at the command prompt to view the help file. You can also see the detailed user guide here.

Defining the model for MARSS()

MARSS() uses an expanded form of state-space model, such that our Gompertz model

\[ \begin{align} x_t &= b x_{t-1} + w_t \\ y_t &= x_t + v_t \end{align} \] instead becomes

\[ \begin{align} x_t &= b x_{t-1} + u + w_t \\ y_t &= Z x_t + a + v_t \end{align} \]

with \(u = 0\), \(Z = 1\), and \(a = 0\).

Tip: MARSS() works with the matrix class in R, so we have to define any scalar as a \(1 \times 1\) matrix.

Task: Define the model list with matrices for the known and estimated model parameters, including the variance(s) of the process and/or observation errors. You can skip the definition of the initial state x0 because MARSS() will do this for us.

mod_list <- list(
  ## state model
  B = matrix("b"), # AR(1) parameter 
  U = matrix(0),   # bias or intrinsic growth
  Q = matrix("q"), # variance of the process errors
  ## obs model
  Z = matrix(1),   # maps the states onto the observation
  A = matrix(0),   # sampling bias/offset
  R = matrix("r")  # variance of the obs errors
  )

Note: MARSS() uses uppercase letters in its model definition. You will get an error if you try lowercase letters.

Task: Define the data as a \(1\) (rows) \(\times\) \(T\) (cols) matrix.

## define the data as an N (rows) x T (cols) matrix
YY <- matrix(yy, nrow = 1, ncol = TT)

Task: Ignore the inits and control arguments for now and fit the model.

## load MARSS package
library(MARSS)

## fit the model
mod_fit <- MARSS(y = YY, model = mod_list)
## Success! algorithm run for 15 iterations. abstol and log-log tests passed.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
## 
## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Algorithm ran 15 (=minit) iterations and convergence was reached. 
## Log-likelihood: -56.66821 
## AIC: 121.3364   AICc: 122.4793   
##  
##       Estimate
## R.r      0.731
## B.b      0.443
## Q.q      0.229
## x0.x0    1.646
## Initial states (x0) defined at t=0
## 
## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.

Interpreting MARSS() output

MARSS() returns a block of information when it’s finished that contains some useful information, but also lacks some other stuff that we’ll extract below.

Tip: The first block of information tells us if all of the parameters converged within the default number of iterations (500) (Success!). It also suggests we try more conservative convergence limits, but this can largely be ignored.

## Success! abstol and log-log tests passed at 40 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.

Tip: The second block tells us

  • the method used for fitting (kem here)

  • the convergence test parameters

  • number of iterations before convergence

  • the log-likelihood and associated information scores

## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 40 iterations. 
## Log-likelihood: -57.95738 
## AIC: 123.9148   AICc: 125.0576   

Tip: The third block lists the point estimates of the model parameters corresponding to the model list (mod_list above), including the initial state x0. By default MARSS() appends the model list item (eg, Q) and the specific parameter as defined by the user (eg, q). When we fit multivariate models, there will be multiple entries for those list items containing multiple parameters (eg, Q.var, Q.cov).

##       Estimate
## R.r      0.287
## B.b      0.683
## Q.q      0.682
## x0.x0    1.233
## Initial states (x0) defined at t=0

Tip: The last block tells you that standard errors of the estimated parameters have not been estimated and that you can use the MARSSparamCIs() function to estimate them.

## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.

Extracting info from {MARSS}

Although MARSS() returns some useful information after execution, the MARSS object also contains a lot of other important information that we can extract and act upon. Specifically, when assigned to an object, MARSS() returns a long list containing details about the model structure, control parameters, estimated parameters, hidden states and their standard errors.

Tip: Type str(MARSS_object) where MARSS_object is the object you assigned to the MARSS() call (ie, mod_fit above) to see a detailed outline of the list structure.

Estimated states

Fitted values are stored in an \(N \times T\) matrix (here \(N = 1\)) accessed with $states. The standard errors of the estimated states are stored in an \(N \times T\) matrix accessed with $states.se.

Task: Extract the estimated state and its standard error.

Tip: It’s often easier to transpose the state(s) after extracting them for use in plotting or other comparisons.

## T x 1 (transposed) vector of states
mod_fits <- mod_fit$states |> t()

## T x 1 (transposed) vector of SE's
mod_fits_SE <- mod_fit$states.se |> t()

Model parameters

We can extract the parameters directly from a fitted MARSS model object rather than having to transcribe or copy/paste them from the output.

Tip: We can use the coef() function to extract \(b\) from the MARSS() fitted object.

Task: Use coef() to extract \(b\).

## get the estimated parameters
params_est <- coef(mod_fit, type = "vector")

## get `b` (strength of density dependence)
params_est["B.b"] |> round(2)
##  B.b 
## 0.44

Approximate confidence intervals

We can estimate a [(1 - \(\alpha\)) \(\times\) 100]% confidence interval around the estimated states as

\[ \hat{x} ~ \pm ~ t_{1 - \alpha/2} \times \text{SE}(\hat{x}) \]

Tip: You can get values from the t-distribution with the function qt(). Type ?qt for more information.

## upper 95% CI
mod_fits_CI_hi <- mod_fits + qt(p = 0.975, df = TT - 1) * mod_fits_SE

## lower 95% CI
mod_fits_CI_lo <- mod_fits - qt(p = 0.975, df = TT - 1) * mod_fits_SE

Plotting the states & SE

It’s common to plot the estimated state and some uncertainty about it (i.e., standard error, confidence interval), along with the data. Here are some examples using the base functions plot() and lines(). Similar plots could easily be made with ggplot() as well.

Model fit and true states

par(mai = c(1.2, 1, 0.3, 0), omi = c(0, 0, 0.5, 1))
plot.ts(xx, lwd = 2, type = "o", pch = 16, col = "#488fdf",
        las = 1, ylim = c(min(xx,yy), max(xx,yy)),
        ylab = expression(italic(x[t])~~or~~italic(y[t])))
# lines(yy, lwd = 2, type = "o", pch = 16, cex = 1.5, col = "#844870")
lines(mod_fits, lwd = 2, type = "l", cex = 1.5, col = "black")
lines(mod_fits_CI_hi, lwd = 2, type = "l", cex = 1.5, col = "gray")
lines(mod_fits_CI_lo, lwd = 2, type = "l", cex = 1.5, col = "gray")

Model fit with observations only

par(mai = c(1.2, 1, 0.3, 0), omi = c(0, 0, 0.5, 1))
plot.ts(yy, lwd = 2, type = "o", pch = 16, cex = 1.5, col = "#844870",
        las = 1, ylim = c(min(xx,yy), max(xx,yy)),
        ylab = expression(italic(x[t])~~or~~italic(y[t])))
lines(mod_fits, lwd = 2, type = "l", cex = 1.5, col = "black")
lines(mod_fits_CI_hi, lwd = 2, type = "l", cex = 1.5, col = "gray")
lines(mod_fits_CI_lo, lwd = 2, type = "l", cex = 1.5, col = "gray")


Model summaries

In addition to accessing the various aspects contained in the list of a fitted object of class marssMLE returned by MARSS(), you can use some summary functions to obtain parameter estimates and model fits. You can also get a set of standard plots of the model fits and diagnostics. Here are a few of the options:

  • coef(mod_fit) to get the estimated parameters

  • tidy(mod_fit) to get estimated parameters with CIs

  • fitted(mod_) to get the model estimates of mean y

  • plot(mod_fit) to view a series of base plots of the states, data, and diagnostics

  • ggplot2::autoplot(mod_fit) to view a series of ggplots of the states, data, and diagnostics


Model diagnostics

Just like other statistical models, state-space models come with their own set of assumptions. As such, it’s a good idea to examine some model diagnostics to see if the assumptions have been met. For the univariate state-space models we’re using here, the assumptions include the following:

  • The errors (\(w_t\) and \(v_t\)) are normally distributed

  • The errors (\(w_t\) and \(v_t\)) have constant variance

  • The errors (\(w_t\) and \(v_t\)) are independent

We can assess how well something fits a normal distribution via a q-q plot or qqplot(). We can use plots of the errors versus time to see if the variance changes. We can use the familiar autocorrelaction function acf() to assess whether the errors are independent over time.

Tip: The plot() and ggplot2::autoplot() functions produce useful diagnostic plots.

Task: Use either plot() or ggplot2::autoplot() to see how well our fitted model meets the assumptions.

## get summary & diagnostic plots
plot(mod_fit)

## plot type =  fitted.ytT  Observations with fitted values
## Hit <Return> to see next plot (q to exit):

## plot type = xtT Estimated states
## Hit <Return> to see next plot (q to exit):

## plot type = model.resids.ytt1
## Hit <Return> to see next plot (q to exit):

## plot type = std.model.resids.ytT
## Hit <Return> to see next plot (q to exit):

## plot type = std.state.resids.xtT
## Hit <Return> to see next plot (q to exit):

## plot type = qqplot.std.model.resids.ytt1
## Hit <Return> to see next plot (q to exit):

## plot type = qqplot.std.state.resids.xtT
## Hit <Return> to see next plot (q to exit):

## plot type = acf.std.model.resids.ytt1

Forecasting

Autoregressive models like the one we fit above can make good forecast models, especially if meaningful covariates can be included as well. The {MARSS} package uses a Kalman filtering algorithm for estimating model parameters, which generates three different forms of estimated states and observations for a discrete time series with \(t = 1:T\):

  1. prediction, conditioned on the data from \(1:(t - 1)\);

  2. filtering, conditioned on the data from \(1:t\);

  3. smoothing, conditioned on the data from \(1:T\).

By default, the MARSS() function produces the smoothed estimates (or “smoothations”), but in a forecasting context we want estimates conditioned on the data from \(1:(t - 1)\), not \(1:T\).

Tip: We can use the predict() function to generate forecasts and associated uncertainty intervals.

Task: Get the one-step ahead forecasts for the observed data based on our state-space model, along with the 90% prediction intervals.

## get forecasts and associated prediction interval
fY <- predict(mod_fit,                  # use the fitted model object
              level = 0.90,             # choose 90% for PI's
              type = "ytt1",            # shorthand for 1:(t-1)
              interval = "prediction")  # type of interval

Tip: We can use the plot() or ggplot2::autoplot() functions to plot forecasts and their associated uncertainty intervals along with the data.

## load ggplot
library(ggplot2)

## create plot
autoplot(fY, plot.par = list(theme = theme_minimal(), point.size = 2)) +
  ggtitle("Forecast (black line), prediction interval (gray area) & data (blue points)") +
  theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "inches"))