From e26790842c07fd918aead2e7cd5d2d0252fb547f Mon Sep 17 00:00:00 2001 From: DylanMG Date: Thu, 6 Jun 2024 08:39:16 -0700 Subject: [PATCH] update model supplement and gitignore so it shows up in repo --- .gitignore | 4 +- Supplement-model-check.Rmd | 69 +- Supplement-model-check.html | 1758 +++++++++++++++++++++++++++++++++++ 3 files changed, 1801 insertions(+), 30 deletions(-) create mode 100644 Supplement-model-check.html diff --git a/.gitignore b/.gitignore index 3bf49d6..2fff9af 100644 --- a/.gitignore +++ b/.gitignore @@ -18,11 +18,11 @@ resdoc.tex *.aux *.jpg -*.html +#*.html #to keep model supplement in GitHub - commented out are changed from CSAS defaults *.knit.md *.log *.pdf -#*.png +#*.png #to keep my figures in GitHub *.rda *.rds *.Rproj diff --git a/Supplement-model-check.Rmd b/Supplement-model-check.Rmd index 7182d70..1f22606 100644 --- a/Supplement-model-check.Rmd +++ b/Supplement-model-check.Rmd @@ -11,7 +11,7 @@ output: theme: cerulean toc: yes toc_float: yes - +bibliography: bib/refs.bib --- ```{r setup, echo = FALSE, message = FALSE, warning = FALSE} @@ -36,7 +36,7 @@ TV.model.pars <- rstan::extract(TV.stan.fit) ``` -This document describes fits from both state-space Ricker spawner-recruit models used fit in the Research Document *Estimating Precautionary Approach reference points and assessing consequences of harvest control rules for Fraser River Pink Salmon (Oncorhynchus gorbuscha)*. Two Ricker models were fit in this paper that serve different purposes. First, a Ricker model with autoregressive (AR1) recruitment residuals was fit to estimate biological benchmarks ($S_{gen}$, $S_{MSY}$, and $U_{MSY}$), then a model with time-varying productivity (Ricker $\alpha$ parameter) was fit in order to condition out biological submodel in out closed-loop forward simulation on recent population dynamics. Data and code to reproduce the analysis and report is available in this [GitHub repository](https://github.com/Pacific-salmon-assess/FR-PK-ResDoc/tree/main), where you can see the [R code](https://github.com/Pacific-salmon-assess/FR-PK-ResDoc/blob/main/analysis/R/fit-sr-stan.R) to run the models, and the [models](https://github.com/Pacific-salmon-assess/FR-PK-ResDoc/tree/main/analysis/Stan) themselves. +This document describes fits from both state-space Ricker spawner-recruit models used fit in the Research Document *Estimating Precautionary Approach reference points and assessing consequences of harvest control rules for Fraser River Pink Salmon (Oncorhynchus gorbuscha)*. Two Ricker models were fit in this paper that serve different purposes. First, a Ricker model with autoregressive (AR1) recruitment residuals was fit to estimate biological benchmarks ($S_{gen}$, $S_{MSY}$, and $U_{MSY}$), then a model with time-varying productivity (Ricker $\alpha$ parameter) was fit in order to condition the biological submodel on recent population dynamics in the closed-loop forward simulation. Data and code to reproduce the analysis and report is available in this [GitHub repository](https://github.com/Pacific-salmon-assess/FR-PK-ResDoc/tree/main), where you can see the [R code](https://github.com/Pacific-salmon-assess/FR-PK-ResDoc/blob/main/analysis/R/fit-sr-stan.R) to run the models, and the [models](https://github.com/Pacific-salmon-assess/FR-PK-ResDoc/tree/main/analysis/Stan) themselves. # Diagnostics @@ -47,17 +47,20 @@ We assessed chain convergence visually via trace plots and by ensuring that $\ha ## AR1 ### Trace plots -These should be clearly mixed, with no single distribution deviating from others (left column), and no chains exploring a strange space for a few iteraitons (right column). -```{r AR1-trace} +These should be clearly mixed, with no single distribution deviating from others (left column), and no chains exploring a strange space for a few iterations (right column). +```{r AR1-trace, fig.cap = "Density overlays (left) and trace plots (right) for leading parameters in the Ricker AR1 model."} mcmc_combo(AR1.stan.fit, pars = c("beta", "ln_alpha", "sigma_R_corr", "phi"), combo = c("dens_overlay", "trace"), gg_theme = legend_none()) ``` + These appear well mixed. ### $\hat{R}$ -These should all be less than 1.01. -```{r AR1-Rhat} + +We want our $\hat{R}$ values to be very close to 1 so we know our model converged on all parameters, so close that none should be less than 1.01. + +```{r AR1-Rhat, fig.cap = "R-hat values from the Ricker AR1 model."} ggplot(AR1.model.summary, aes(Rhat)) + geom_histogram() + labs(y = "frequency", @@ -68,9 +71,10 @@ ggplot(AR1.model.summary, aes(Rhat)) + ``` ### Effective sample size -We want these to be greater than 0.1, or 10% of the total iterations. -```{r AR1-ESS} +To know we are getting viable estimates, we want the effective sample size to be at least 10% of the total iterations. + +```{r AR1-ESS, fig.cap = "Effective sample size from the Ricker AR1 model."} ggplot(AR1.model.summary, aes(n_eff/2000)) + geom_histogram() + labs(y = "frequency", @@ -81,8 +85,10 @@ ggplot(AR1.model.summary, aes(n_eff/2000)) + ``` ### Posterior predictive check -Here we see if the parameters return data similar to ours by using random number generation in the `generated quantities` block of our Stan models. We want the random simulations (light blue lines), to fall near the distribution of our observed data (dark blue line). -```{r AR1-PPC} + +Here we see if the parameters return data similar to ours by using random number generation in the `generated quantities` block of our Stan models. We want the random simulations (light blue lines), to fall near the distribution of our observed data (dark blue line). + +```{r AR1-PPC, fig.cap = "Posterior predictive check from the Ricker AR1 model."} R <- (data$harvest+data$spawn)/1000000 #recruit data R_rep <- AR1.model.pars$H_rep[1:500,] + AR1.model.pars$S_rep[1:500,] #just too 500 random draws... @@ -95,14 +101,15 @@ ppc_dens_overlay(R, R_rep) + ## Time-varying productivity ### Trace plots -```{r TV-trace} + +```{r TV-trace, fig.cap = "Density overlays (left) and trace plots (right) for leading parameters in the time-varying productivity Ricker model."} mcmc_combo(TV.stan.fit, pars = c("beta", "ln_alpha0", "sigma_R", "sigma_alpha"), combo = c("dens_overlay", "trace"), gg_theme = legend_none()) ``` ### $\hat{R}$ -```{r TV-Rhat} +```{r TV-Rhat, fig.cap = "R-hat values from the time-varying productivity Ricker model."} ggplot(TV.model.summary, aes(Rhat)) + geom_histogram() + labs(y = "frequency", @@ -113,7 +120,7 @@ ggplot(TV.model.summary, aes(Rhat)) + ``` ### Effective sample size -```{r TV-ESS} +```{r TV-ESS, fig.cap = "Effective sample size from the time-varying productivity Ricker model."} ggplot(TV.model.summary, aes(n_eff/2000)) + geom_histogram() + labs(y = "frequency", @@ -124,7 +131,7 @@ ggplot(TV.model.summary, aes(n_eff/2000)) + ``` ### Posterior predictive check -```{r TV-PPC} +```{r TV-PPC, fig.cap = "Posterior predictive check from the time-varying productivity Ricker model."} R <- (data$harvest+data$spawn)/1000000 #recruit data R_rep <- TV.model.pars$H_rep[1:500,] + TV.model.pars$S_rep[1:500,] #just too 500 random draws... @@ -140,28 +147,34 @@ We already got an understanding of the posterior distributions by looking at how It can also be useful to look at the data, and predictions that were made in the observation model. -```{r data-obs} -ggplot() + - geom_ribbon(data = C_df, aes(x = year, ymin=lwr, ymax=upr), fill = "grey80") + - geom_ribbon(data = C_df, aes(x = year, ymin=mid_lwr, ymax=mid_upr), fill = "darkgrey") + - geom_line(data= C_df, aes(year, mid)) + - geom_point(data=data, aes(year, harvest/1000000), color = "red") + - theme_minimal() + - labs(title = "Catch", y = "observations", x = "year") - - +```{r data-Sobs, fig.cap = "Spawner data (red dots) and latent (i.e. hidden or estimated) states for spawners in the Ricker AR1 model, where vertical lines mark changes in sampling regimes."} ggplot() + geom_ribbon(data = S_df, aes(x = year, ymin=lwr, ymax=upr), fill = "grey80") + geom_ribbon(data = S_df, aes(x = year, ymin=mid_lwr, ymax=mid_upr), fill = "darkgrey") + geom_line(data= S_df, aes(year, mid)) + geom_point(data=data, aes(year, spawn/1000000), color = "red") + theme_minimal() + - geom_vline(xintercept=c(1961, 1991, 2001, 2007, 2009)) + - labs(title = "Spawners", y = "observations", x = "year") + geom_vline(xintercept=c(1962, 1992, 2002, 2008)) + + labs(title = "Spawner observations and hidden states", y = "observations", x = "year") +``` + + +```{r data-Cobs, fig.cap = "Catch data (red dots) and latent (i.e. hidden or estimated) states for spawners in the Ricker AR1 model."} +ggplot() + + geom_ribbon(data = C_df, aes(x = year, ymin=lwr, ymax=upr), fill = "grey80") + + geom_ribbon(data = C_df, aes(x = year, ymin=mid_lwr, ymax=mid_upr), fill = "darkgrey") + + geom_line(data= C_df, aes(year, mid)) + + geom_point(data=data, aes(year, harvest/1000000), color = "red") + + theme_minimal() + + labs(title = "Catch observations and hidden states", y = "observations", x = "year") ``` +Notice how the bands around catch are much tighter than the spawners, this is because we assumed a relatively narrow CV of 5% for this data, rather than spawner estimates that varied between 10 to 50% CVs. + ## Annual productivity -This figure shows deterministic fits of Ricker $\alpha$ from the time-varying model. It compliments Figure **REF** by showing what the relationships would look like each year. -```{r TV-a} + +```{r TV-a, fig.cap = "Deterministic fits from annual estimates of Ricker alpha parameter in the time-varying model."} knitr::include_graphics(here::here("figure/tv-SRR.png")) ``` + +# References diff --git a/Supplement-model-check.html b/Supplement-model-check.html new file mode 100644 index 0000000..2730030 --- /dev/null +++ b/Supplement-model-check.html @@ -0,0 +1,1758 @@ + + + + + + + + + + + + + + +Supplement: model fits and diagnostics + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+
+ +
+ + + + + + + +

This document describes fits from both state-space Ricker +spawner-recruit models used fit in the Research Document Estimating +Precautionary Approach reference points and assessing consequences of +harvest control rules for Fraser River Pink Salmon (Oncorhynchus +gorbuscha). Two Ricker models were fit in this paper that serve +different purposes. First, a Ricker model with autoregressive (AR1) +recruitment residuals was fit to estimate biological benchmarks (\(S_{gen}\), \(S_{MSY}\), and \(U_{MSY}\)), then a model with time-varying +productivity (Ricker \(\alpha\) +parameter) was fit in order to condition the biological submodel on +recent population dynamics in the closed-loop forward simulation. Data +and code to reproduce the analysis and report is available in this GitHub +repository, where you can see the R +code to run the models, and the models +themselves.

+
+

1 Diagnostics

+

We fit the spawner-recruitment model in a Bayesian estimation +framework with Stan (Carpenter et al. 2017; Stan +Development Team 2023), which implements the No-U-Turn +Hamiltonian Markov chain Monte Carlo algorithm (Hoffman and Gelman 2014) for Bayesian +statistical inference to generate a joint posterior probability +distribution of all unknowns in the model. We sampled from 4 chains with +2,000 iterations each and discarded the first half as warm-up. We +assessed chain convergence visually via trace plots and by ensuring that +\(\hat{R}\) (potential scale reduction +factor; Vehtari et al. (2021)) was less +than 1.01 and that the effective sample size was greater than 400. +Posterior predictive checks were used to make sure the model returned +known values, by simulating new datasets and checking how similar they +were to our observed data.

+
+

1.1 AR1

+
+

1.1.1 Trace plots

+These should be clearly mixed, with no single distribution deviating +from others (left column), and no chains exploring a strange space for a +few iterations (right column).
+ +
+Density overlays (left) and trace plots (right) for leading parameters in the Ricker AR1 model. +

+Density overlays (left) and trace plots (right) for leading parameters +in the Ricker AR1 model. +

+
+

These appear well mixed.

+
+
+

1.1.2 \(\hat{R}\)

+

We want our \(\hat{R}\) values to be +very close to 1 so we know our model converged on all parameters, so +close that none should be less than 1.01.

+
+R-hat values from the Ricker AR1 model. +

+R-hat values from the Ricker AR1 model. +

+
+
+
+

1.1.3 Effective sample +size

+

To know we are getting viable estimates, we want the effective sample +size to be at least 10% of the total iterations.

+
+Effective sample size from the Ricker AR1 model. +

+Effective sample size from the Ricker AR1 model. +

+
+
+
+

1.1.4 Posterior +predictive check

+

Here we see if the parameters return data similar to ours by using +random number generation in the generated quantities block +of our Stan models. We want the random simulations (light blue lines), +to fall near the distribution of our observed data (dark blue line).

+
+Posterior predictive check from the Ricker AR1 model. +

+Posterior predictive check from the Ricker AR1 model. +

+
+
+
+
+

1.2 Time-varying +productivity

+
+

1.2.1 Trace plots

+
+Density overlays (left) and trace plots (right) for leading parameters in the time-varying productivity Ricker model. +

+Density overlays (left) and trace plots (right) for leading parameters +in the time-varying productivity Ricker model. +

+
+
+
+

1.2.2 \(\hat{R}\)

+
+R-hat values from the time-varying productivity Ricker model. +

+R-hat values from the time-varying productivity Ricker model. +

+
+
+
+

1.2.3 Effective sample +size

+
+Effective sample size from the time-varying productivity Ricker model. +

+Effective sample size from the time-varying productivity Ricker model. +

+
+
+
+

1.2.4 Posterior +predictive check

+
+Posterior predictive check from the time-varying productivity Ricker model. +

+Posterior predictive check from the time-varying productivity Ricker +model. +

+
+
+
+
+
+

2 Fits

+

We already got an understanding of the posterior distributions by +looking at how chains mixed above, and if you want to look at ALL the +posteriors from each model you can run the lines with the +launch_shinystan() function in the R +code.

+

It can also be useful to look at the data, and predictions that were +made in the observation model.

+
+Spawner data (red dots) and latent (i.e. hidden or estimated) states for spawners in the Ricker AR1 model, where vertical lines mark changes in sampling regimes. +

+Spawner data (red dots) and latent (i.e. hidden or estimated) states for +spawners in the Ricker AR1 model, where vertical lines mark changes in +sampling regimes. +

+
+
+Catch data (red dots) and latent (i.e. hidden or estimated) states for spawners in the Ricker AR1 model. +

+Catch data (red dots) and latent (i.e. hidden or estimated) states for +spawners in the Ricker AR1 model. +

+
+

Notice how the bands around catch are much tighter than the spawners, +this is because we assumed a relatively narrow CV of 5% for this data, +rather than spawner estimates that varied between 10 to 50% CVs.

+
+

2.1 Annual +productivity

+
+Deterministic fits from annual estimates of Ricker alpha parameter in the time-varying model. +

+Deterministic fits from annual estimates of Ricker alpha parameter in +the time-varying model. +

+
+
+
+
+

References

+
+
+Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben +Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, +and Allen Riddell. 2017. “Stan: A Probabilistic +Programming Language.” Journal of Statistical Software +76 (1). https://doi.org/10.18637/jss.v076.i01. +
+
+Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn +Sampler: Adaptively Setting Path Lengths in Hamiltonian +Monte Carlo.” Journal of Machine Learning +Research 15: 1593–623. https://jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf. +
+
+Stan Development Team. 2023. “Rstan: The R Interface +to Stan.” https://mc-stan.org/. +
+
+Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and +Paul-Christian Burkner. 2021. “Rank-Normalization, Folding, and +Localization: An Improved R-Hat for Assessing +Convergence of MCMC (with Discussion).” +Bayesian Analysis 16 (2): 667–718. https://projecteuclid.org/journals/bayesian-analysis/volume-16/issue-2/Rank-Normalization-Folding-and-Localization--An-Improved-R%cb%86-for/10.1214/20-BA1221.full. +
+
+
+ + + +
+
+ +
+ + + + + + + + + + + + + + + +