-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
172 lines (129 loc) · 5.09 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#",
fig.path = "README-"
)
options(warnPartialMatchArgs = FALSE,
warnPartialMatchDollar = FALSE,
warnPartialMatchAttr = FALSE)
```
# glmmboot
<!-- badges: start -->
[![Travis build status](https://travis-ci.org/colmanhumphrey/glmmboot.svg?branch=master)](https://travis-ci.org/colmanhumphrey/glmmboot)
[![Codecov test coverage](https://codecov.io/gh/colmanhumphrey/glmmboot/branch/master/graph/badge.svg)](https://codecov.io/gh/colmanhumphrey/glmmboot?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/glmmboot)](https://cran.r-project.org/package=glmmboot)
<!-- badges: end -->
## Overview
glmmboot provides a simple interface for creating non-parametric bootstrap
confidence intervals using a wide set of models. The primary function
is `bootstrap_model`, which has three primary arguments:
* `base_model`: the model run on the full data as you normally would, prior to bootstrapping
* `base_data`: the dataset used
* `resamples`: how many bootstrap resamples you wish to perform
Another function, `bootstrap_ci`, converts output from
bootstrap model runs into confidence intervals and p-values.
By default, `bootstrap_model` calls `bootstrap_ci`.
## Types of bootstrapping
For models with random effects:
* the default (and recommended) behaviour will be to block sample over the effect with the largest entropy
(generally the one with the most levels)
* it's also possible to specify multiple random effects to block sample over
With no random effects, performs case resampling: resamples each row with replacement.
All of these are considered non-parametric.
## Requirements:
1. the model should work with the
function `update`, to change the data
2. the coefficients are extractable using `coef(summary(model))`
+ either directly, i.e. this gives a matrix
+ or it's a list of matrices; this includes e.g. zero-inflated models, which
produce two matrices of coefficients
## Parallel
It may be desired to run this package in parallel. The best way is
to use the `future` backend, which uses `future.apply::future_lapply`.
You do that by specifying the backend through the
`future::plan` setup, and then setting `parallelism = "future"`. It's quite
possible you'll want to pass the package used to build the model to the argument
`future_packages`. See the Quick Use
vignette for more.
It's also easy to use `parallel::mclapply`; again, see the Quick Use
vignette.
## Installation
glmmboot is on CRAN, so you can install it normally:
```{r, eval = FALSE}
install.packages("glmmboot")
```
Or the development version:
```{r gh-installation, eval = FALSE}
## install.packages("devtools")
devtools::install_github("ColmanHumphrey/glmmboot")
```
## Example: glm (no random effect)
We'll provide a quick example using glm. First we'll set up some data:
```{r}
set.seed(15278086)
x1 <- rnorm(50)
x2 <- runif(50)
expit <- function(x){exp(x) / (1 + exp(x))}
y_mean <- expit(0.2 - 0.3 * x1 + 0.4 * x2)
y <- rbinom(50, 1, prob = y_mean)
sample_frame <- data.frame(x1 = x1, x2 = x2, y = y)
```
Typically this model is fit with logistic regression:
```{r}
base_run <- glm(y ~ x1 + x2,
family = binomial(link = 'logit'),
data = sample_frame)
summary(base_run)
```
Let's run a bootstrap.
```{r}
library(glmmboot)
boot_results <- bootstrap_model(base_model = base_run,
base_data = sample_frame,
resamples = 999)
```
And the results:
```{r}
print(boot_results)
```
The estimates are the same, since we just pull from the base model. The intervals are
similar to the base model, although slightly narrower: typical logistic regression is fractionally
conservative at `N = 50`.
An example with a zero-inflated model (from the `glmmTMB` docs):
```{r, eval = requireNamespace("glmmTMB", quietly = TRUE)}
## we'll skip this if glmmTMB not available
library(glmmTMB)
owls <- transform(Owls,
nest = reorder(Nest, NegPerChick),
ncalls = SiblingNegotiation,
ft = FoodTreatment)
fit_zipoisson <- glmmTMB(
ncalls ~ (ft + ArrivalTime) * SexParent +
offset(log(BroodSize)) + (1 | nest),
data = owls,
ziformula = ~1,
family = poisson)
summary(fit_zipoisson)
```
Let's run the bootstrap (ignore the actual results, 3 resamples is basically meaningless - just for illustration):
```{r, eval = requireNamespace("glmmTMB", quietly = TRUE)}
zi_results <- bootstrap_model(base_model = fit_zipoisson,
base_data = owls,
resamples = 3)
print(zi_results)
```
We could also have run this with the `future.apply` backend:
```{r, eval = FALSE}
library(future.apply)
plan("multiprocess")
zi_results <- bootstrap_model(base_model = fit_zipoisson,
base_data = owls,
resamples = 1000,
parallelism = "future",
future_packages = "glmmTMB")
```