README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.path = "README-"
)
options(warnPartialMatchArgs = FALSE,
        warnPartialMatchDollar = FALSE,
        warnPartialMatchAttr = FALSE)
```

# glmmboot

<!-- badges: start -->
[![Travis build status](https://travis-ci.org/colmanhumphrey/glmmboot.svg?branch=master)](https://travis-ci.org/colmanhumphrey/glmmboot)
[![Codecov test coverage](https://codecov.io/gh/colmanhumphrey/glmmboot/branch/master/graph/badge.svg)](https://codecov.io/gh/colmanhumphrey/glmmboot?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/glmmboot)](https://cran.r-project.org/package=glmmboot)
<!-- badges: end -->

## Overview

glmmboot provides a simple interface for creating non-parametric bootstrap
confidence intervals using a wide set of models. The primary function
is `bootstrap_model`, which has three primary arguments:

* `base_model`: the model run on the full data as you normally would, prior to bootstrapping
* `base_data`: the dataset used
* `resamples`: how many bootstrap resamples you wish to perform

Another function, `bootstrap_ci`, converts output from
bootstrap model runs into confidence intervals and p-values.
By default, `bootstrap_model` calls `bootstrap_ci`.

## Types of bootstrapping

For models with random effects:

* the default (and recommended) behaviour will be to block sample over the effect with the largest entropy
(generally the one with the most levels)
* it's also possible to specify multiple random effects to block sample over

With no random effects, performs case resampling: resamples each row with replacement.

All of these are considered non-parametric.

## Requirements:

1. the model should work with the
function `update`, to change the data
2. the coefficients are extractable using `coef(summary(model))`
  + either directly, i.e. this gives a matrix
  + or it's a list of matrices; this includes e.g. zero-inflated models, which
produce two matrices of coefficients

## Parallel

It may be desired to run this package in parallel. The best way is
to use the `future` backend, which uses `future.apply::future_lapply`.
You do that by specifying the backend through the
`future::plan` setup, and then setting `parallelism = "future"`. It's quite
possible you'll want to pass the package used to build the model to the argument
`future_packages`. See the Quick Use
vignette for more.

It's also easy to use `parallel::mclapply`; again, see the Quick Use
vignette.

## Installation

glmmboot is on CRAN, so you can install it normally:

```{r, eval = FALSE}
install.packages("glmmboot")
```

Or the development version:
```{r gh-installation, eval = FALSE}
## install.packages("devtools")
devtools::install_github("ColmanHumphrey/glmmboot")
```

## Example: glm (no random effect)

We'll provide a quick example using glm. First we'll set up some data:

```{r}
set.seed(15278086)
x1 <- rnorm(50)
x2 <- runif(50)

expit <- function(x){exp(x) / (1 + exp(x))}

y_mean <- expit(0.2 - 0.3 * x1 + 0.4 * x2)

y <- rbinom(50, 1, prob = y_mean)

sample_frame <- data.frame(x1 = x1, x2 = x2, y = y)
```

Typically this model is fit with logistic regression:

```{r}
base_run <- glm(y ~ x1 + x2,
                family = binomial(link = 'logit'),
                data = sample_frame)
summary(base_run)
```

Let's run a bootstrap.

```{r}
library(glmmboot)
boot_results <- bootstrap_model(base_model = base_run,
                                base_data = sample_frame,
                                resamples = 999)
```

And the results:
```{r}
print(boot_results)
```

The estimates are the same, since we just pull from the base model. The intervals are
similar to the base model, although slightly narrower: typical logistic regression is fractionally
conservative at `N = 50`.

An example with a zero-inflated model (from the `glmmTMB` docs):

```{r, eval = requireNamespace("glmmTMB", quietly = TRUE)}
## we'll skip this if glmmTMB not available
library(glmmTMB)

owls <- transform(Owls,
                  nest = reorder(Nest, NegPerChick),
                  ncalls = SiblingNegotiation,
                  ft = FoodTreatment)

fit_zipoisson <- glmmTMB(
    ncalls ~ (ft + ArrivalTime) * SexParent +
        offset(log(BroodSize)) + (1 | nest),
    data = owls,
    ziformula = ~1,
    family = poisson)

summary(fit_zipoisson)
```
Let's run the bootstrap (ignore the actual results, 3 resamples is basically meaningless - just for illustration):
```{r, eval = requireNamespace("glmmTMB", quietly = TRUE)}
zi_results <- bootstrap_model(base_model = fit_zipoisson,
                              base_data = owls,
                              resamples = 3)

print(zi_results)
```

We could also have run this with the `future.apply` backend:
```{r, eval = FALSE}
library(future.apply)
plan("multiprocess")

zi_results <- bootstrap_model(base_model = fit_zipoisson,
                              base_data = owls,
                              resamples = 1000,
                              parallelism = "future",
                              future_packages = "glmmTMB")
```