glmmboot provides a simple interface for creating non-parametric
bootstrap confidence intervals using a wide set of models. The primary
function is bootstrap_model
, which has three primary arguments:
base_model
: the model run on the full data as you normally would, prior to bootstrappingbase_data
: the dataset usedresamples
: how many bootstrap resamples you wish to perform
Another function, bootstrap_ci
, converts output from bootstrap model
runs into confidence intervals and p-values. By default,
bootstrap_model
calls bootstrap_ci
.
For models with random effects:
- the default (and recommended) behaviour will be to block sample over the effect with the largest entropy (generally the one with the most levels)
- it’s also possible to specify multiple random effects to block sample over
With no random effects, performs case resampling: resamples each row with replacement.
All of these are considered non-parametric.
- the model should work with the function
update
, to change the data - the coefficients are extractable using
coef(summary(model))
- either directly, i.e. this gives a matrix
- or it’s a list of matrices; this includes e.g. zero-inflated models, which produce two matrices of coefficients
It may be desired to run this package in parallel. The best way is to
use the future
backend, which uses future.apply::future_lapply
. You
do that by specifying the backend through the future::plan
setup, and
then setting parallelism = "future"
. It’s quite possible you’ll want
to pass the package used to build the model to the argument
future_packages
. See the Quick Use vignette for more.
It’s also easy to use parallel::mclapply
; again, see the Quick Use
vignette.
glmmboot is on CRAN, so you can install it normally:
install.packages("glmmboot")
Or the development version:
## install.packages("devtools")
devtools::install_github("ColmanHumphrey/glmmboot")
We’ll provide a quick example using glm. First we’ll set up some data:
set.seed(15278086)
x1 <- rnorm(50)
x2 <- runif(50)
expit <- function(x){exp(x) / (1 + exp(x))}
y_mean <- expit(0.2 - 0.3 * x1 + 0.4 * x2)
y <- rbinom(50, 1, prob = y_mean)
sample_frame <- data.frame(x1 = x1, x2 = x2, y = y)
Typically this model is fit with logistic regression:
base_run <- glm(y ~ x1 + x2,
family = binomial(link = 'logit'),
data = sample_frame)
summary(base_run)
#
# Call:
# glm(formula = y ~ x1 + x2, family = binomial(link = "logit"),
# data = sample_frame)
#
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -1.6819 -1.2340 0.7048 0.9389 1.3213
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.1161 0.5890 -0.197 0.844
# x1 -0.5147 0.3387 -1.519 0.129
# x2 1.0933 1.0065 1.086 0.277
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 65.342 on 49 degrees of freedom
# Residual deviance: 61.944 on 47 degrees of freedom
# AIC: 67.944
#
# Number of Fisher Scoring iterations: 4
Let’s run a bootstrap.
library(glmmboot)
boot_results <- bootstrap_model(base_model = base_run,
base_data = sample_frame,
resamples = 999)
And the results:
print(boot_results)
# estimate boot 2.5% boot 97.5% boot p_value base p_value base 2.5%
# (Intercept) -0.1160896 -1.2295 0.9809 0.830 0.8446 -1.3010
# x1 -0.5146778 -1.1245 0.0455 0.076 0.1353 -1.1961
# x2 1.0932707 -0.7517 3.1328 0.284 0.2829 -0.9315
# base 97.5% boot/base width
# (Intercept) 1.0688 0.9327523
# x1 0.1667 0.8584962
# x2 3.1181 0.9592352
The estimates are the same, since we just pull from the base model. The
intervals are similar to the base model, although slightly narrower:
typical logistic regression is fractionally conservative at N = 50
.
An example with a zero-inflated model (from the glmmTMB
docs):
## we'll skip this if glmmTMB not available
library(glmmTMB)
owls <- transform(Owls,
nest = reorder(Nest, NegPerChick),
ncalls = SiblingNegotiation,
ft = FoodTreatment)
fit_zipoisson <- glmmTMB(
ncalls ~ (ft + ArrivalTime) * SexParent +
offset(log(BroodSize)) + (1 | nest),
data = owls,
ziformula = ~1,
family = poisson)
summary(fit_zipoisson)
# Family: poisson ( log )
# Formula:
# ncalls ~ (ft + ArrivalTime) * SexParent + offset(log(BroodSize)) +
# (1 | nest)
# Zero inflation: ~1
# Data: owls
#
# AIC BIC logLik deviance df.resid
# 4015.6 4050.8 -1999.8 3999.6 591
#
# Random effects:
#
# Conditional model:
# Groups Name Variance Std.Dev.
# nest (Intercept) 0.1294 0.3597
# Number of obs: 599, groups: nest, 27
#
# Conditional model:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 2.53995 0.35656 7.123 1.05e-12 ***
# ftSatiated -0.29111 0.05961 -4.884 1.04e-06 ***
# ArrivalTime -0.06808 0.01427 -4.771 1.84e-06 ***
# SexParentMale 0.44885 0.45002 0.997 0.319
# ftSatiated:SexParentMale 0.10473 0.07286 1.437 0.151
# ArrivalTime:SexParentMale -0.02140 0.01835 -1.166 0.244
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Zero-inflation model:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -1.05753 0.09412 -11.24 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s run the bootstrap (ignore the actual results, 3 resamples is basically meaningless - just for illustration):
zi_results <- bootstrap_model(base_model = fit_zipoisson,
base_data = owls,
resamples = 3)
print(zi_results)
# $cond
# estimate boot 2.5% boot 97.5% boot p_value
# (Intercept) 2.53994692 1.9197 2.9229 0.5
# ftSatiated -0.29110639 -0.3058 -0.1889 0.5
# ArrivalTime -0.06807809 -0.0866 -0.0392 0.5
# SexParentMale 0.44884508 0.1134 1.2690 0.5
# ftSatiated:SexParentMale 0.10472505 -0.1153 0.2804 1.0
# ArrivalTime:SexParentMale -0.02139750 -0.0527 -0.0087 0.5
# base p_value base 2.5% base 97.5% boot/base width
# (Intercept) 0.0000 1.8411 3.2388 0.7177368
# ftSatiated 0.0000 -0.4079 -0.1743 0.5002454
# ArrivalTime 0.0000 -0.0960 -0.0401 0.8479791
# SexParentMale 0.3186 -0.4332 1.3309 0.6550388
# ftSatiated:SexParentMale 0.1506 -0.0381 0.2475 1.3852712
# ArrivalTime:SexParentMale 0.2436 -0.0574 0.0146 0.6116518
#
# $zi
# estimate boot 2.5% boot 97.5% boot p_value base p_value base 2.5%
# (Intercept) -1.057534 -1.0575 -0.84 0.5 0 -1.242
# base 97.5% boot/base width
# (Intercept) -0.8731 0.5895082
We could also have run this with the future.apply
backend:
library(future.apply)
plan("multiprocess")
zi_results <- bootstrap_model(base_model = fit_zipoisson,
base_data = owls,
resamples = 1000,
parallelism = "future",
future_packages = "glmmTMB")