Many problems in environmental science can be addressed with state-space models, and we’ve already seen a few examples of some simple models. This lab will introduce you to fitting univariate state-space models with the {MARSS} package. Notably, {MARSS} follows the same notation for defining state-space models as we would write on a whiteboard. Also, because {MARSS} was designed to fit multivariate state-space models, each of the model elements must be of class matrix; this applies even to scalar values, which cannot be passed as numeric or integer.
We’ll begin by simulating a true state of nature and then corrupting it with some observation error to produce the “data”. Specifically, we’ll use a discrete-time Gompertz model of density-dependent growth. Recall that the size of a population at time \(t\) \((N_t)\) is given by
\[ N_t = N_{t-1} \exp(u + (b - 1) \log(N_{t-1})) \exp(w_t) \]
Taking the log of both sides and substituting \(x_t\) for \(\log(N_t)\) gives
\[ \begin{align} \log(N_t) & = \log(N_{t-1}) + u + (b - 1) \log(N_{t-1}) + w_t \\ & \Downarrow \\ x_t &= x_{t-1} + u + (b - 1) x_{t-1} + w_t \\ &= x_{t-1} + u + b x_{t-1} - x_{t-1} + w_t \\ &= u + b x_{t-1} + w_t \end{align} \]
We generally assume the process errors are Gaussian, such that our full state model is our familiar AR(1) process
\[ x_t = u + b x_{t-1} + w_t \\ ~ \\ w_t \sim \text{N}(0, q) \]
Note: It’s very difficult to estimate both \(u\) and \(b\), so we’ll assume \(u\) = 0.
So far we have largely ignored an important part of our model definition. Because our state model is an autoregressive process, we must define the initial state when \(t\) = 0 \((x_0)\). In theory, \(x_0\) is assumed to be a random effect with unknown mean and variance, such that
\[ x_0 \sim \text{N}(\pi,\lambda) \]
In practice, however, estimating both \(\pi\) and \(\lambda\) is nearly impossible, so the default is to treat \(x_0\) as a fixed (but unknown) effect, such that
\[ x_0 \sim \text{N}(\pi,0). \]
Task: Use a for() loop to simulate from
an AR(1) state model.
## number of time steps
TT <- 40
## strength of density-dependence (0 < b < 1)
bb <- 0.5
## time series of process errors with SD = 1
ww <- rnorm(TT, 0, 1)
## initialize state & set x0 = w0
xx <- ww
## loop over time steps
for(t in 2:TT) {
xx[t] <- bb * xx[t-1] + ww[t]
}
Task: Add some observation error to the true state.
## obs errors with SD = 0.5
vv <- rnorm(TT, 0, 0.5)
## obs data
yy <- xx + vv
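Tip: Because we didn’t set a seed (e.g., with set.seed()), your simulated values, and hence all of the estimates below, will differ somewhat from those shown here. A quick sketch in base graphics to compare the true state with the noisy observations (the colors and symbols are arbitrary choices):
## plot the true state (blue) and the observations (red)
plot.ts(xx, type = "o", pch = 16, col = "blue",
        ylim = range(c(xx, yy)),
        ylab = expression(italic(x[t])~~or~~italic(y[t])))
lines(yy, type = "o", pch = 16, col = "red")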
The workhorse function in the {MARSS} package that allows us to fit state-space models is MARSS(), which has four important arguments (two of which are required):
MARSS(y, model = NULL, inits = NULL, control = NULL, ...)
- y is an \(n \times T\) matrix of data (observations)
- model is a list that defines the state-space model in terms of the parameters to be estimated
- inits [optional] is a list of initial values for the parameters to be estimated
- control [optional] is a list of options for controlling the fitting algorithm, setting tolerance & convergence parameters, etc.
Tip: Type ?MARSS at the command prompt to view the help file. You can also see the detailed user guide included with the package.
MARSS() uses an expanded form of state-space model, such that our Gompertz model
\[ \begin{align} x_t &= b x_{t-1} + w_t \\ y_t &= x_t + v_t \end{align} \] instead becomes
\[ \begin{align} x_t &= b x_{t-1} + u + w_t \\ y_t &= Z x_t + a + v_t \end{align} \]
with \(u = 0\), \(Z = 1\), and \(a = 0\).
Tip: MARSS() works with the
matrix class in R, so we have to define any scalar as a
\(1 \times 1\) matrix.
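For example, wrapping a scalar in matrix() gives it the class and dimensions that MARSS() expects:
## a bare scalar is numeric, but matrix() makes it a 1 x 1 matrix
class(1)          # "numeric"
class(matrix(1))  # "matrix" "array"
dim(matrix(1))    # 1 1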
Task: Define the model list with matrices for the
known and estimated model parameters, including the variance(s) of the
process and/or observation errors. You can skip the definition of the
initial state x0 because MARSS() will do this
for us.
mod_list <- list(
## state model
B = matrix("b"), # AR(1) parameter
U = matrix(0), # bias or intrinsic growth
Q = matrix("q"), # variance of the process errors
## obs model
Z = matrix(1), # maps the states onto the observation
A = matrix(0), # sampling bias/offset
R = matrix("r") # variance of the obs errors
)
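Although we rely on the default handling of the initial state here, the model list will also accept an explicit definition. A minimal sketch, based on the x0 and tinitx elements described in ?MARSS (these lines reproduce the default behavior):
## optional: define the initial state explicitly
mod_list$x0 <- matrix("x0")  # estimate x0 as a fixed (but unknown) effect
mod_list$tinitx <- 0         # define x0 at t = 0 (use 1 for t = 1)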
Note: MARSS() uses uppercase letters in
its model definition. You will get an error if you try lowercase
letters.
Task: Define the data as a \(1\) (rows) \(\times\) \(T\) (cols) matrix.
## define the data as an N (rows) x T (cols) matrix
YY <- matrix(yy, nrow = 1, ncol = TT)
Task: Ignore the inits and
control arguments for now and fit the model.
## load MARSS package
library(MARSS)
## fit the model
mod_fit <- MARSS(y = YY, model = mod_list)
## Success! algorithm run for 15 iterations. abstol and log-log tests passed.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Algorithm ran 15 (=minit) iterations and convergence was reached.
## Log-likelihood: -56.66821
## AIC: 121.3364 AICc: 122.4793
##
## Estimate
## R.r 0.731
## B.b 0.443
## Q.q 0.229
## x0.x0 1.646
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
MARSS() returns a block of information when it’s finished that contains some useful details, but it also lacks some other items that we’ll extract below.
Tip: The first block of information tells us whether all of the parameters converged within the default number of iterations (500), as indicated by Success!. It also suggests we try more conservative convergence limits, but this can largely be ignored.
## Success! abstol and log-log tests passed at 40 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
Tip: The second block tells us
- the method used for fitting (kem here)
- the convergence test parameters
- the number of iterations before convergence
- the log-likelihood and associated information criteria
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 40 iterations.
## Log-likelihood: -57.95738
## AIC: 123.9148 AICc: 125.0576
Tip: The third block lists the point estimates of the model parameters corresponding to the model list (mod_list above), including the initial state x0. By default, MARSS() appends the model list item (e.g., Q) and the specific parameter as defined by the user (e.g., q). When we fit multivariate models, there will be multiple entries for those list items containing multiple parameters (e.g., Q.var, Q.cov).
## Estimate
## R.r 0.287
## B.b 0.683
## Q.q 0.682
## x0.x0 1.233
## Initial states (x0) defined at t=0
Tip: The last block tells you that standard errors
of the estimated parameters have not been estimated and that you can use
the MARSSparamCIs() function to estimate them.
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
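For example, here is how we could get those standard errors and confidence intervals after fitting (by default, MARSSparamCIs() computes them from the Hessian; see ?MARSSparamCIs for other methods):
## compute SE's and 95% CIs for the estimated parameters
mod_fit_CIs <- MARSSparamCIs(mod_fit)
## print the parameter table, now with SE's and CIs
mod_fit_CIs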
Although MARSS() returns some useful information after
execution, the MARSS object also contains a lot of other
important information that we can extract and act upon. Specifically,
when assigned to an object, MARSS() returns a long
list containing details about the model structure, control
parameters, estimated parameters, hidden states and their standard
errors.
Tip: Type str(MARSS_object) where MARSS_object is the object you assigned to the MARSS() call (i.e., mod_fit above) to see a detailed outline of the list structure.
The estimated states are stored in an \(N \times T\) matrix (here \(N = 1\)) accessed with $states. The standard errors of the estimated states are stored in an \(N \times T\) matrix accessed with $states.se.
Task: Extract the estimated state and its standard error.
Tip: It’s often easier to transpose the state(s) after extracting them for use in plotting or other comparisons.
## T x 1 (transposed) vector of states
mod_fits <- mod_fit$states |> t()
## T x 1 (transposed) vector of SE's
mod_fits_SE <- mod_fit$states.se |> t()
We can extract the parameters directly from a fitted MARSS model object rather than having to transcribe or copy/paste them from the output.
Tip: We can use the coef() function to
extract \(b\) from the
MARSS() fitted object.
Task: Use coef() to extract \(b\).
## get the estimated parameters
params_est <- coef(mod_fit, type = "vector")
## get `b` (strength of density dependence)
params_est["B.b"] |> round(2)
## B.b
## 0.44
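Setting type = "matrix" instead returns the parameters in the same matrix forms we used in the model list, which can be handy when fitting multivariate models:
## get the estimated parameters as matrices
params_mat <- coef(mod_fit, type = "matrix")
## the 1 x 1 matrix B holds our estimate of b
params_mat$B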
We can estimate a [(1 - \(\alpha\)) \(\times\) 100]% confidence interval around the estimated states as
\[ \hat{x}_t ~ \pm ~ t_{1 - \alpha/2, ~ T - 1} \times \text{SE}(\hat{x}_t) \]
Tip: You can get values from the t-distribution with
the function qt(). Type ?qt for more
information.
## upper 95% CI
mod_fits_CI_hi <- mod_fits + qt(p = 0.975, df = TT - 1) * mod_fits_SE
## lower 95% CI
mod_fits_CI_lo <- mod_fits - qt(p = 0.975, df = TT - 1) * mod_fits_SE
It’s common to plot the estimated state and some uncertainty about it
(i.e., standard error, confidence interval), along with the data. Here
are some examples using the base functions plot() and
lines(). Similar plots could easily be made with
ggplot() as well.
## plot the true state with the fitted states and their 95% CI
par(mai = c(1.2, 1, 0.3, 0), omi = c(0, 0, 0.5, 1))
plot.ts(xx, lwd = 2, type = "o", pch = 16, col = "#488fdf",
        las = 1, ylim = c(min(xx, yy), max(xx, yy)),
        ylab = expression(italic(x[t])~~or~~italic(y[t])))
## estimated states
lines(mod_fits, lwd = 2, type = "l", col = "black")
## upper & lower 95% CIs
lines(mod_fits_CI_hi, lwd = 2, type = "l", col = "gray")
lines(mod_fits_CI_lo, lwd = 2, type = "l", col = "gray")
## plot the data with the fitted states and their 95% CI
par(mai = c(1.2, 1, 0.3, 0), omi = c(0, 0, 0.5, 1))
plot.ts(yy, lwd = 2, type = "o", pch = 16, cex = 1.5, col = "#844870",
        las = 1, ylim = c(min(xx, yy), max(xx, yy)),
        ylab = expression(italic(x[t])~~or~~italic(y[t])))
## estimated states
lines(mod_fits, lwd = 2, type = "l", col = "black")
## upper & lower 95% CIs
lines(mod_fits_CI_hi, lwd = 2, type = "l", col = "gray")
lines(mod_fits_CI_lo, lwd = 2, type = "l", col = "gray")
In addition to accessing the various aspects contained in the list of
a fitted object of class marssMLE returned by
MARSS(), you can use some summary functions to obtain
parameter estimates and model fits. You can also get a set of standard
plots of the model fits and diagnostics. Here are a few of the
options:
- coef(mod_fit) to get the estimated parameters
- tidy(mod_fit) to get the estimated parameters with CIs
- fitted(mod_fit) to get the model estimates of the mean of y
- plot(mod_fit) to view a series of base plots of the states, data, and diagnostics
- ggplot2::autoplot(mod_fit) to view a series of ggplots of the states, data, and diagnostics
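For example, a minimal sketch of the second and third options (tidy() returns a data frame of parameter estimates with SE's and CIs; fitted() returns a data frame with one row per time step):
## estimated parameters with SE's and 95% CIs
tidy(mod_fit)
## model estimates of the mean of y
fitted(mod_fit) |> head()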
Just like other statistical models, state-space models come with their own set of assumptions. As such, it’s a good idea to examine some model diagnostics to see if the assumptions have been met. For the univariate state-space models we’re using here, the assumptions include the following:
- The errors (\(w_t\) and \(v_t\)) are normally distributed
- The errors (\(w_t\) and \(v_t\)) have constant variance
- The errors (\(w_t\) and \(v_t\)) are independent
We can assess how well the errors fit a normal distribution via a q-q plot (e.g., qqnorm()). We can use plots of the errors versus time to see if the variance changes, and we can use the familiar autocorrelation function acf() to assess whether the errors are independent over time.
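You can also build these checks by hand from the model residuals. Here is a minimal sketch, assuming residuals() returns a data frame with the residuals in a .resids column (see ?residuals.marssMLE for the exact structure):
## extract the innovations (one-step-ahead) residuals
resids <- residuals(mod_fit, type = "tt1")$.resids
## q-q plot to check normality
qqnorm(resids)
qqline(resids)
## residuals vs time to check for constant variance
plot.ts(resids)
## ACF to check for independence over time
acf(resids, na.action = na.pass)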
Tip: The plot() and
ggplot2::autoplot() functions produce useful diagnostic
plots.
Task: Use either plot() or
ggplot2::autoplot() to see how well our fitted model meets
the assumptions.
## get summary & diagnostic plots
plot(mod_fit)
## plot type = fitted.ytT Observations with fitted values
## Hit <Return> to see next plot (q to exit):
## plot type = xtT Estimated states
## Hit <Return> to see next plot (q to exit):
## plot type = model.resids.ytt1
## Hit <Return> to see next plot (q to exit):
## plot type = std.model.resids.ytT
## Hit <Return> to see next plot (q to exit):
## plot type = std.state.resids.xtT
## Hit <Return> to see next plot (q to exit):
## plot type = qqplot.std.model.resids.ytt1
## Hit <Return> to see next plot (q to exit):
## plot type = qqplot.std.state.resids.xtT
## Hit <Return> to see next plot (q to exit):
## plot type = acf.std.model.resids.ytt1
Autoregressive models like the one we fit above can make good forecast models, especially if meaningful covariates can be included as well. The {MARSS} package uses a Kalman filtering algorithm for estimating model parameters, which generates three different forms of estimated states and observations for a discrete time series with \(t = 1:T\):
- prediction, conditioned on the data from \(1:(t - 1)\);
- filtering, conditioned on the data from \(1:t\);
- smoothing, conditioned on the data from \(1:T\).
By default, the MARSS() function produces the smoothed
estimates (or “smoothations”), but in a forecasting context we want
estimates conditioned on the data from \(1:(t
- 1)\), not \(1:T\).
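If you want any of the three forms of the estimated states directly, the tsSmooth() function accepts a type argument. A minimal sketch, based on the xtT/xtt/xtt1 naming in ?tsSmooth:
## smoothed states, conditioned on the data from 1:T (the default)
x_smooth <- tsSmooth(mod_fit, type = "xtT")
## filtered states, conditioned on the data from 1:t
x_filter <- tsSmooth(mod_fit, type = "xtt")
## predicted states, conditioned on the data from 1:(t-1)
x_pred <- tsSmooth(mod_fit, type = "xtt1")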
Tip: We can use the predict() function
to generate forecasts and associated uncertainty intervals.
Task: Get the one-step ahead forecasts for the observed data based on our state-space model, along with the 90% prediction intervals.
## get forecasts and associated prediction interval
fY <- predict(mod_fit, # use the fitted model object
level = 0.90, # choose 90% for PI's
type = "ytt1", # shorthand for 1:(t-1)
interval = "prediction") # type of interval
Tip: We can use the plot() or
ggplot2::autoplot() functions to plot forecasts and their
associated uncertainty intervals along with the data.
## load ggplot
library(ggplot2)
## create plot
autoplot(fY, plot.par = list(theme = theme_minimal(), point.size = 2)) +
ggtitle("Forecast (black line), prediction interval (gray area) & data (blue points)") +
theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "inches"))