The mixpoissonreg package deals with regression models with response variables being count data.
It is aimed towards overdispersed data. It currently provides implementation of two regression models: Negative Binomial and Poisson Inverse Gaussian regression models. For both of these models there are two estimation procedures implemented: the EM-algorithm and direct maximization of the
log-likelihood function. The definition of both these regression models along with the EM-algorithm approach
for them can be found at Barreto-Souza and Simas (2016). The direct maximization
of the log-likelihood is only recommended when the EM-algorithm takes too long to converge.
Several global and local influence measures are implemented and easy to interpret through plot visualization. There are plot implementations
using base-R and ggplot2. For further details, and also more tutorials, we refer the reader to the vignettes whose links can be found below
or by using vignette(mixpoissonreg)
.
To install the CRAN version, run:
install.packages("mixpoissonreg")
The latest stable development version can be installed from github:
#install.packages("devtools")
devtools::install_github("vpnsctl/mixpoissonreg")
The mixpoissonreg package imports the Rfast
package. The Rfast
package needs the GSL
library to be installed. If you are using linux,
you most likely will need to install the GSL
library.
On Ubuntu (version 18.04 or newer), run:
sudo apt install libgsl-dev
On RedHat or Fedora, run:
yum install gsl gsl-devel
The mixpoissonreg package also imports the following packages:
pbapply
, Formula
, gamlss
, gamlss.dist
, rlang
, statmod
, lmtest
, generics
, magrittr
, tibble
, dplyr
, ggplot2
,
ggrepel
, gridExtra
To cite mixpoissonreg in publications use:
Simas, A.B. and Barreto-Souza, W. (2021). mixpoissonreg: Mixed Poisson Regression for Overdispersed Count Data. https://doi.org/10.5281/zenodo.4602320 R package version 1.0.0, https://vpnsctl.github.io/mixpoissonreg/
-
The mixpoissonreg home page contains the documentation (with examples), vignettes, etc.
-
The mixpoissonreg Documentation can be accessed directly through this link
-
Currently, the mixpoissonreg package has the following vignettes:
- Getting started: Introduction to the mixpoissonreg package
- Analyzing overdispersed count data with the mixpoissonreg package
- Maximum-likelihood estimation with the mixpoissonreg package
- Building and customizing base-R diagnostic plots with the mixpoissonreg package
- Building and customizing ggplot2-based diagnostic plots with the mixpoissonreg package
- Global and local influence analysis with the mixpoissonreg package
- Confidence and prediction intervals with the mixpoissonreg package
- mixpoissonreg in the tidyverse
-
Theoretical details can be obtained in Barreto-Souza and Simas (2016) and also in its Supplementary Material
The usage of the mixpoissonreg package is analogous to the usage of standard regression functions and packages in R:
library(mixpoissonreg)
fit <- mixpoissonreg(daysabs ~ prog + math + gender | prog, data = Attendance)
fit
#>
#> Negative Binomial Regression - Expectation-Maximization Algorithm
#>
#> Call:
#> mixpoissonreg(formula = daysabs ~ prog + math + gender | prog,
#> data = Attendance)
#>
#> Coefficients modeling the mean (with log link):
#> (Intercept) progAcademic progVocational math gendermale
#> 2.752418105 -0.424045918 -1.238680841 -0.006791582 -0.257120287
#> Coefficients modeling the precision (with log link):
#> (Intercept) progAcademic progVocational
#> 1.109533 -1.083443 -1.517825
summary(fit)
#>
#> Negative Binomial Regression - Expectation-Maximization Algorithm
#>
#> Call:
#> mixpoissonreg(formula = daysabs ~ prog + math + gender | prog,
#> data = Attendance)
#>
#>
#> Pearson residuals:
#> RSS Min 1Q Median 3Q Max
#> 323.6397 -1.0648 -0.7204 -0.3654 0.3064 4.8776
#>
#> Coefficients modeling the mean (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 2.752418 0.152613 18.035 < 2e-16 ***
#> progAcademic -0.424046 0.132369 -3.204 0.00136 **
#> progVocational -1.238681 0.173155 -7.154 8.45e-13 ***
#> math -0.006792 0.002274 -2.986 0.00283 **
#> gendermale -0.257120 0.116934 -2.199 0.02789 *
#>
#> Coefficients modeling the precision (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 1.1095 0.2783 3.987 6.68e-05 ***
#> progAcademic -1.0834 0.3074 -3.524 0.000424 ***
#> progVocational -1.5178 0.3395 -4.471 7.79e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Efron's pseudo R-squared: 0.1861217
#> Number of iterations of the EM algorithm = 1930
There are also several methods implemented:
- Visualization:
plot
(base-R visualization) ;autoplot
(ggplot2 visualization) ;local_influence_plot
(base-R visualization) ;local_influence_autoplot
(ggplot2 visualization) - Inference:
coeftest
;coefci
;lmtest::lrtest
(works with the default method) ;lmtest::waldtest
(works with the default method) ;predict
- Residual analysis:
residuals
- Global influence analysis:
influence
;cooks.distance
;hatvalues
- Local influence analysis:
local_influence
- Tidyverse compatibility:
augment
;glance
;tidy
;tidy_local_influence
;local_influence_benchmarks