replacing {broom} and {broom.mixed} tidiers with {parameters} package to reduce no. of dependencies #152

Open
IndrajeetPatil opened this issue Feb 18, 2021 · 12 comments

Comments

@IndrajeetPatil
Contributor

IndrajeetPatil commented Feb 18, 2021

Before making a PR related to this, I was wondering if you would be open to this. If you agree, I will open a PR.

rationale

parameters (https://easystats.github.io/parameters/) has far fewer dependencies (7 recursive dependencies, versus 48 for broom and 50 for broom.mixed, in the output below) and can handle pretty much every model that broom and broom.mixed combined support. It also offers features not in broom, such as robust SEs and standardization.

dependency calculations

tools::package_dependencies(c("broom", "broom.mixed", "parameters"), recursive = TRUE)
#> $broom
#>  [1] "backports"    "dplyr"        "ellipsis"     "generics"     "glue"        
#>  [6] "methods"      "purrr"        "rlang"        "stringr"      "tibble"      
#> [11] "tidyr"        "ggplot2"      "lifecycle"    "magrittr"     "R6"          
#> [16] "tidyselect"   "utils"        "vctrs"        "pillar"       "digest"      
#> [21] "grDevices"    "grid"         "gtable"       "isoband"      "MASS"        
#> [26] "mgcv"         "scales"       "stats"        "withr"        "stringi"     
#> [31] "fansi"        "pkgconfig"    "cpp11"        "graphics"     "nlme"        
#> [36] "Matrix"       "splines"      "cli"          "crayon"       "utf8"        
#> [41] "farver"       "labeling"     "munsell"      "RColorBrewer" "viridisLite" 
#> [46] "tools"        "lattice"      "colorspace"  
#> 
#> $broom.mixed
#>  [1] "broom"        "coda"         "dplyr"        "methods"      "nlme"        
#>  [6] "purrr"        "stringr"      "tibble"       "tidyr"        "backports"   
#> [11] "ellipsis"     "generics"     "glue"         "rlang"        "ggplot2"     
#> [16] "lattice"      "lifecycle"    "magrittr"     "R6"           "tidyselect"  
#> [21] "utils"        "vctrs"        "pillar"       "graphics"     "stats"       
#> [26] "stringi"      "fansi"        "pkgconfig"    "cpp11"        "grDevices"   
#> [31] "digest"       "grid"         "gtable"       "isoband"      "MASS"        
#> [36] "mgcv"         "scales"       "withr"        "cli"          "crayon"      
#> [41] "utf8"         "tools"        "Matrix"       "splines"      "farver"      
#> [46] "labeling"     "munsell"      "RColorBrewer" "viridisLite"  "colorspace"  
#> 
#> $parameters
#> [1] "bayestestR" "datawizard" "insight"    "graphics"   "methods"   
#> [6] "stats"      "utils"

Created on 2021-11-03 by the reprex package (v2.0.1)
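
To compare the three lists above at a glance, the recursive dependency counts can be tallied directly. This is a minimal sketch: `count_deps()` is a hypothetical helper name, the default `db` needs access to a CRAN mirror, and the counts will drift from the ones shown above as the packages evolve.

```r
# Hypothetical helper: count recursive strong dependencies
# (Depends, Imports, LinkingTo) for each package in `pkgs`.
# `db` defaults to the current CRAN index, so a mirror must be reachable.
count_deps <- function(pkgs, db = utils::available.packages()) {
  deps <- tools::package_dependencies(pkgs, db = db, recursive = TRUE)
  lengths(deps)
}

# Usage (needs network access; numbers change over time):
# count_deps(c("broom", "broom.mixed", "parameters"))
```

Passing a custom `db` makes the helper usable offline, for example with a cached copy of `available.packages()`.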

example with merMod

library(lme4)
#> Loading required package: Matrix
library(magrittr)
library(parameters)

lmer_mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

broom.mixed::tidy(lmer_mod, effects = "fixed")
#> # A tibble: 2 x 5
#>   effect term        estimate std.error statistic
#>   <chr>  <chr>          <dbl>     <dbl>     <dbl>
#> 1 fixed  (Intercept)    251.       6.82     36.8 
#> 2 fixed  Days            10.5      1.55      6.77

parameters::standardize_names(parameters::model_parameters(lmer_mod), style = "broom") %>%
  tibble::as_tibble()
#> # A tibble: 2 x 9
#>   term  estimate std.error conf.level conf.low conf.high statistic df.error
#>   <chr>    <dbl>     <dbl>      <dbl>    <dbl>     <dbl>     <dbl>    <int>
#> 1 (Int…    251.       6.82       0.95   238.       265.      36.8       174
#> 2 Days      10.5      1.55       0.95     7.44      13.5      6.77      174
#> # … with 1 more variable: p.value <dbl>

example with lm

lm_mod <- lm(Reaction ~ Days, sleepstudy)

broom::tidy(lm_mod)
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)    251.       6.61     38.0  2.16e-87
#> 2 Days            10.5      1.24      8.45 9.89e-15

parameters::standardize_names(parameters::model_parameters(lm_mod), style = "broom") %>%
  tibble::as_tibble()
#> # A tibble: 2 x 9
#>   term  estimate std.error conf.level conf.low conf.high statistic df.error
#>   <chr>    <dbl>     <dbl>      <dbl>    <dbl>     <dbl>     <dbl>    <int>
#> 1 (Int…    251.       6.61       0.95   238.       264.      38.0       178
#> 2 Days      10.5      1.24       0.95     8.02      12.9      8.45      178
#> # … with 1 more variable: p.value <dbl>

Created on 2021-02-18 by the reprex package (v1.0.0)
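
The conversion used in both examples above can be wrapped in one function. This is only a sketch: `tidy_via_parameters()` is a hypothetical name, not part of either package, and it assumes parameters, tibble, and lme4 are installed and that `parameters::standardize_names()` is available as in the reprex above.

```r
# Hypothetical wrapper: return model_parameters() output with
# broom-style column names, as a tibble.
tidy_via_parameters <- function(model, ...) {
  out <- parameters::model_parameters(model, ...)
  out <- parameters::standardize_names(out, style = "broom")
  tibble::as_tibble(out)
}

lm_mod <- lm(Reaction ~ Days, data = lme4::sleepstudy)
tidy_via_parameters(lm_mod)
```

Extra arguments such as `effects = "all"` are passed through to `model_parameters()`.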

@datalorax
Owner

I like the general idea but this would be a massive change and I'm not sure it's worth it. A lot of the current codebase depends on the output from broom looking exactly as it does now, so it would require considerable refactoring. For example, the lme4::lmer() code depends on having the effect column to delineate between fixed and random effects.
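
To illustrate the kind of dependency described here, the following sketch splits a tidied lmer result on the effect column. The data frame is typed by hand from the broom.mixed output posted later in this thread (values rounded as printed), and `split_effects()` is a hypothetical name, not a function from this codebase.

```r
# Hand-typed stand-in for broom.mixed::tidy() output on the sleepstudy model.
tidied <- data.frame(
  effect   = c("fixed", "fixed", "ran_pars", "ran_pars", "ran_pars", "ran_pars"),
  group    = c(NA, NA, "Subject", "Subject", "Subject", "Residual"),
  term     = c("(Intercept)", "Days", "sd__(Intercept)",
               "cor__(Intercept).Days", "sd__Days", "sd__Observation"),
  estimate = c(251, 10.5, 24.7, 0.0656, 5.92, 25.6),
  stringsAsFactors = FALSE
)

# Hypothetical downstream step: separate fixed and random rows by label.
split_effects <- function(x) {
  list(
    fixed  = x[x$effect == "fixed", , drop = FALSE],
    random = x[x$effect == "ran_pars", , drop = FALSE]
  )
}

parts <- split_effects(tidied)
parts$fixed$term
```

If the effect column were missing, or used different labels, the subsetting would return zero rows without raising an error, which is why the exact output format matters.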

The other thing that worries me a little bit is just that broom is a really established package with considerable support around maintaining it. I've never really looked into parameters. It looks like it's pretty well maintained too. But it would still worry me a bit.

So I guess I'm leaning toward no thanks, but I'm happy to engage in the conversation a bit more.

@IndrajeetPatil
Contributor Author

> For example, the lme4::lmer() code depends on having the effect column to delineate between fixed and random effects.

Hmm, that's a fair point. This is indeed a context where the parameters output won't exactly line up with the broom.mixed output, and this is a good enough reason not to make this switch for now.

> The other thing that worries me a little bit is just that broom is a really established package with considerable support around maintaining it.

As someone who has contributed to both of these packages, I can vouch for the rigor and speed at which parameters is maintained (it is < 2 years old and already supports more models than broom and broom.mixed combined) and, in a few years, it will be as well-established as broom was at its age. 😉

> So I guess I'm leaning toward no thanks, but I'm happy to engage in the conversation a bit more.

We can revisit this when parameters starts to behave the same way as broom.mixed when it comes to random effects, since the switch would then require minimal refactoring.

@datalorax
Owner

Sounds good to me. Thanks.

@IndrajeetPatil

This comment has been minimized.

@datalorax
Owner

Okay, I appreciate it. I'm hoping to come back to work on some bugs and things here in the next couple weeks. I suppose we could use the GitHub version as a dependency for now and then wait until they push to CRAN before our next release.

@IndrajeetPatil
Contributor Author

Just wanted to post another reprex, this time with CRAN versions of both packages.

As far as I can see, there are just two (IMO) minor differences, but I'm not sure how much difference they make to your code:

  • random effects are labelled ran_pars by {broom.mixed}, but random by {parameters}
  • group column strings are surrounded by "" in the {parameters} output

library(lme4)
#> Loading required package: Matrix
library(broom.mixed)
library(tibble)
library(parameters)

options(tibble.width = Inf)

mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# `broom.mixed` output --------------------------------

tidy(mod)
#> # A tibble: 6 x 6
#>   effect   group    term                  estimate std.error statistic
#>   <chr>    <chr>    <chr>                    <dbl>     <dbl>     <dbl>
#> 1 fixed    <NA>     (Intercept)           251.          6.82     36.8 
#> 2 fixed    <NA>     Days                   10.5         1.55      6.77
#> 3 ran_pars Subject  sd__(Intercept)        24.7        NA        NA   
#> 4 ran_pars Subject  cor__(Intercept).Days   0.0656     NA        NA   
#> 5 ran_pars Subject  sd__Days                5.92       NA        NA   
#> 6 ran_pars Residual sd__Observation        25.6        NA        NA

# `parameters` output ---------------------------------
# (with further modifications to match `broom` conventions)

model_parameters(mod, effects = "all") %>%
  standardize_names(style = "broom") %>%
  as_tibble()
#> # A tibble: 6 x 11
#>   term                          estimate std.error conf.level conf.low conf.high
#>   <chr>                            <dbl>     <dbl>      <dbl>    <dbl>     <dbl>
#> 1 (Intercept)                   251.          6.82       0.95   238.       265. 
#> 2 Days                           10.5         1.55       0.95     7.42      13.5
#> 3 SD (Intercept)                 24.7        NA          0.95    NA         NA  
#> 4 SD (Days)                       5.92       NA          0.95    NA         NA  
#> 5 Cor (Intercept~Days: Subject)   0.0656     NA          0.95    NA         NA  
#> 6 SD (Observations)              25.6        NA          0.95    NA         NA  
#>   statistic df.error   p.value effect group     
#>       <dbl>    <int>     <dbl> <chr>  <chr>     
#> 1     36.8       174  4.37e-84 fixed  ""        
#> 2      6.77      174  1.88e-10 fixed  ""        
#> 3     NA          NA NA        random "Subject" 
#> 4     NA          NA NA        random "Subject" 
#> 5     NA          NA NA        random "Subject" 
#> 6     NA          NA NA        random "Residual"

Created on 2021-11-03 by the reprex package (v2.0.1)
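
The two differences listed above could be bridged with a small recoding step. This is a sketch under stated assumptions: `harmonize_effects()` is a hypothetical helper, and the input is typed by hand from the parameters output above rather than produced by the package.

```r
# Hypothetical helper: relabel parameters-style output to follow
# the broom.mixed conventions for the effect and group columns.
harmonize_effects <- function(x) {
  x$effect[which(x$effect == "random")] <- "ran_pars"  # random -> ran_pars
  x$group[which(x$group == "")] <- NA_character_       # ""     -> NA
  x
}

# Hand-typed subset of the parameters output shown above.
params_out <- data.frame(
  term   = c("(Intercept)", "Days", "SD (Intercept)", "SD (Days)",
             "Cor (Intercept~Days: Subject)", "SD (Observations)"),
  effect = c("fixed", "fixed", "random", "random", "random", "random"),
  group  = c("", "", "Subject", "Subject", "Subject", "Residual"),
  stringsAsFactors = FALSE
)

harmonize_effects(params_out)
```

The sketch leaves the term labels untouched. The two outputs above also differ there (for example `sd__(Intercept)` versus `SD (Intercept)`) and in the row order of the random-effect terms, so code that matches on term strings would need additional mapping.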

@datalorax
Owner

Thanks. Just to be clear, the parameters package handles the models that broom and broom.mixed handle, correct?

@IndrajeetPatil
Contributor Author

Yes, you can see the list of supported models using this function:

insight::supported_models()
#>   [1] "aareg"             "afex_aov"          "AKP"              
#>   [4] "Anova.mlm"         "aov"               "aovlist"          
#>   [7] "Arima"             "averaging"         "bamlss"           
#>  [10] "bamlss.frame"      "bayesQR"           "bayesx"           
#>  [13] "BBmm"              "BBreg"             "bcplm"            
#>  [16] "betamfx"           "betaor"            "betareg"          
#>  [19] "BFBayesFactor"     "bfsl"              "BGGM"             
#>  [22] "bife"              "bifeAPEs"          "bigglm"           
#>  [25] "biglm"             "blavaan"           "blrm"             
#>  [28] "bracl"             "brglm"             "brmsfit"          
#>  [31] "brmultinom"        "btergm"            "censReg"          
#>  [34] "cgam"              "cgamm"             "cglm"             
#>  [37] "clm"               "clm2"              "clmm"             
#>  [40] "clmm2"             "clogit"            "coeftest"         
#>  [43] "complmrob"         "confusionMatrix"   "coxme"            
#>  [46] "coxph"             "coxph.penal"       "coxr"             
#>  [49] "cpglm"             "cpglmm"            "crch"             
#>  [52] "crq"               "crqs"              "crr"              
#>  [55] "dep.effect"        "DirichletRegModel" "drc"              
#>  [58] "eglm"              "elm"               "epi.2by2"         
#>  [61] "ergm"              "feglm"             "feis"             
#>  [64] "felm"              "fitdistr"          "fixest"           
#>  [67] "flexsurvreg"       "gam"               "Gam"              
#>  [70] "gamlss"            "gamm"              "gamm4"            
#>  [73] "garch"             "gbm"               "gee"              
#>  [76] "geeglm"            "glht"              "glimML"           
#>  [79] "glm"               "Glm"               "glmm"             
#>  [82] "glmmadmb"          "glmmPQL"           "glmmTMB"          
#>  [85] "glmrob"            "glmRob"            "glmx"             
#>  [88] "gls"               "gmnl"              "HLfit"            
#>  [91] "htest"             "hurdle"            "iv_robust"        
#>  [94] "ivFixed"           "ivprobit"          "ivreg"            
#>  [97] "lavaan"            "lm"                "lm_robust"        
#> [100] "lme"               "lmerMod"           "lmerModLmerTest"  
#> [103] "lmodel2"           "lmrob"             "lmRob"            
#> [106] "logistf"           "logitmfx"          "logitor"          
#> [109] "LORgee"            "lqm"               "lqmm"             
#> [112] "lrm"               "manova"            "MANOVA"           
#> [115] "margins"           "maxLik"            "mclogit"          
#> [118] "mcmc"              "mcmc.list"         "MCMCglmm"         
#> [121] "mcp1"              "mcp12"             "mcp2"             
#> [124] "med1way"           "mediate"           "merMod"           
#> [127] "merModList"        "meta_bma"          "meta_fixed"       
#> [130] "meta_random"       "metaplus"          "mhurdle"          
#> [133] "mipo"              "mira"              "mixed"            
#> [136] "MixMod"            "mixor"             "mjoint"           
#> [139] "mle"               "mle2"              "mlm"              
#> [142] "mlogit"            "mmlogit"           "model_fit"        
#> [145] "multinom"          "mvord"             "negbinirr"        
#> [148] "negbinmfx"         "ols"               "onesampb"         
#> [151] "orm"               "pgmm"              "plm"              
#> [154] "PMCMR"             "poissonirr"        "poissonmfx"       
#> [157] "polr"              "probitmfx"         "psm"              
#> [160] "Rchoice"           "ridgelm"           "riskRegression"   
#> [163] "rjags"             "rlm"               "rlmerMod"         
#> [166] "RM"                "rma"               "rma.uni"          
#> [169] "robmixglm"         "robtab"            "rq"               
#> [172] "rqs"               "rqss"              "Sarlm"            
#> [175] "scam"              "selection"         "sem"              
#> [178] "SemiParBIV"        "semLm"             "semLme"           
#> [181] "slm"               "speedglm"          "speedlm"          
#> [184] "stanfit"           "stanmvreg"         "stanreg"          
#> [187] "summary.lm"        "survfit"           "survreg"          
#> [190] "svy_vglm"          "svyglm"            "svyolr"           
#> [193] "t1way"             "tobit"             "trimcibt"         
#> [196] "truncreg"          "vgam"              "vglm"             
#> [199] "wbgee"             "wblm"              "wbm"              
#> [202] "wmcpAKP"           "yuen"              "yuend"            
#> [205] "zcpglm"            "zeroinfl"          "zerotrunc"

Created on 2021-11-03 by the reprex package (v2.0.1)
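
Rather than scanning the list by eye, a specific model can be checked programmatically. This is a minimal sketch, assuming the insight package is installed. The list reports what insight supports, which parameters builds on, so treat a positive result as a strong hint rather than a guarantee for every parameters feature.

```r
# Check a fitted model object against insight's list of supported classes.
mod <- lm(mpg ~ wt, data = mtcars)
insight::is_model_supported(mod)

# Check a class name directly against the list printed above.
"lmerMod" %in% insight::supported_models()
```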

@datalorax datalorax reopened this Nov 3, 2021
@datalorax
Owner

Thanks, I'll play around with this in a bit.

@IndrajeetPatil
Contributor Author

Cool!

The documentation can be found here: https://easystats.github.io/parameters/

@strengejacke

strengejacke commented Nov 3, 2021

> group column strings are surrounded in ""

Only in the printed output. That's because parameters uses an empty string in the "group" column for fixed effects, while broom.mixed uses NA. When a character column contains empty strings, tibble prints its values surrounded by quotation marks.
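
A quick way to confirm that the quotation marks are a printing detail and not part of the data is to inspect the values directly. This is a minimal sketch with made-up values, assuming tibble is installed.

```r
# A character column that contains an empty string.
x <- tibble::tibble(group = c("", "Subject"))

print(x)                            # display only; formatting is up to tibble
nchar(x$group)                      # 0 7: the stored strings hold no quote characters
grepl('"', x$group, fixed = TRUE)   # FALSE FALSE
```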

@IndrajeetPatil IndrajeetPatil changed the title replacing broom tidiers with parameters to reduce no. of dependencies replacing {broom} and {broom.mixed} tidiers with {parameters} package to reduce no. of dependencies Nov 3, 2021
@McCartneyAC

I would find this helpful--easystats is quickly becoming a huge part of my workflow and it would open up a huge number of classes to switch to {parameters} instead.
