Use existing package to handle distributions and transformations #39

Closed
joshwlambert opened this issue Oct 20, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@joshwlambert
Member

Several packages already implement distribution objects (https://cran.r-project.org/web/views/Distributions.html), together with the conversion functions that ship with them (https://pkg.mitchelloharawild.com/distributional/reference/index.html). It would be good to use one of these packages to minimise the dev load on the distribution side of the package, so that most of the dev focus can go to epidemiological data storage and extraction.

@joshwlambert joshwlambert added the enhancement New feature or request label Oct 20, 2022
@Bisaloo
Member

Bisaloo commented Oct 20, 2022

Some criteria I recommend using when making your choice:

  • number of recursive dependencies
  • number of reverse dependencies. A package already used by the community is more likely to be allotted dev time to remain on CRAN
  • package hosted on GitHub. It will be much easier to communicate with pkg authors and I generally take this as a commitment to engage with the community
  • devs with a good track record of being responsive and keeping their packages up-to-date
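
As a rough sketch of how the first two criteria could be counted, `tools::package_dependencies()` from base R can be queried for direct, recursive and reverse dependencies. The helper name `count_dependencies()` and the toy package database below are hypothetical and only for illustration; they are not part of any package discussed here.

```r
# Hypothetical helper: count the "strong" (Depends, Imports, LinkingTo)
# direct, recursive and reverse dependencies of a package, given a
# CRAN-style package database.
count_dependencies <- function(pkg, db) {
  direct <- tools::package_dependencies(pkg, db = db)[[pkg]]
  recursive <- tools::package_dependencies(pkg, db = db, recursive = TRUE)[[pkg]]
  reverse <- tools::package_dependencies(pkg, db = db, reverse = TRUE)[[pkg]]
  c(
    direct = length(direct),
    recursive = length(recursive),
    reverse = length(reverse)
  )
}

# Toy database with made-up packages, so the example runs offline:
# pkgA imports pkgB, pkgB imports pkgC, and pkgD imports pkgA.
toy_db <- cbind(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Depends = rep(NA_character_, 4),
  Imports = c("pkgB", "pkgC", NA, "pkgA"),
  LinkingTo = rep(NA_character_, 4)
)
toy_counts <- count_dependencies("pkgA", toy_db)
print(toy_counts)

# With a CRAN mirror configured, the same helper could be pointed at the
# real database (reverse dependencies then only cover that repository):
# count_dependencies("distributional", utils::available.packages())
```

The counts depend on which dependency types are included and on the state of CRAN when queried, so they are best read as a snapshot rather than a fixed property of a package.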

@joshwlambert
Member Author

Thanks for the pointers, will keep them in mind when investigating which package to use.

@joshwlambert
Member Author

Rationale for distribution package used in epiparameter

The four packages that I evaluated were:

  1. distr
  2. distr6
  3. distributional
  4. distributions3

| | distr | distr6 | distributional | distributions3 |
|---|---|---|---|---|
| no. direct dependencies | 8 | 8 | 11 | 3 |
| no. recursive dependencies | 9 | NA | 38 | 37 |
| no. reverse dependencies | 26* | NA | 7 | 1 |
| hosted on GitHub | | | | |
| package up-to-date | 2019 | 2022 | 2022 | 2022 |
| on CRAN | | | | |

*some of distr's reverse dependencies are other packages in the distr ecosystem

distr6 and distr are the most complete packages, offering the most distributions. However, they are built on the R6 and S4 object-oriented systems respectively, which may add unnecessary complexity for users who are new to R. distr6 has the most principled design philosophy and can be easily understood from its selection of vignettes. However, distr6 is no longer hosted on CRAN (it was archived 2022-08-20).

distr has a complex hierarchy of classes which will likely not be fully utilised in epiparameter, as we will not be doing many transformations or need to apply arithmetic operations to multiple distributions. The documentation for distr is unconventional for an R package in that there is little of it, and it contains only a single vignette holding a large amount of information. distr is also part of a wider ecosystem of R packages developed by the same team (e.g. distrEx, distrMod, distrSim, etc.), which may mean that multiple dependencies are needed for full functionality.

This leaves distributional and distributions3. Both implement distributions as S3 objects and, as a result, should be easy to use for people new to R. Their functionality largely overlaps; the major difference between the packages is that distributional uses vctrs to implement vectorised distribution objects. distributions3 has good documentation, but it is mainly focused on hypothesis testing rather than on the distribution objects. distributional has good documentation at the function level but lacks vignettes. distributions3 implements some zero-truncated distributions but does not allow truncating an existing distribution object, whereas distributional allows truncation of existing objects across a wider range of distributions. The same goes for zero-inflated distributions: distributions3 provides zero-inflated versions of certain distributions, whereas distributional allows a zero-inflation probability to be applied to a range of distribution objects.
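
A minimal sketch of the distributional features mentioned above (vectorised objects, truncating an existing distribution, and applying zero-inflation), assuming the package is installed. The parameter values are arbitrary and chosen only for illustration; they are not taken from epiparameter.

```r
library(distributional)

# Vectorised distribution object: one vector holding two gamma distributions.
delays <- dist_gamma(shape = c(2, 3), rate = c(0.5, 1))
delay_means <- mean(delays)  # one mean per element

# Truncate an existing distribution object to the interval [0, 21].
truncated <- dist_truncated(
  dist_gamma(shape = 2, rate = 0.5),
  lower = 0,
  upper = 21
)
p_below_upper <- cdf(truncated, 21)  # CDF evaluated at the upper bound

# Apply a zero-inflation probability to an existing count distribution.
zero_inflated <- dist_inflated(dist_poisson(lambda = 3), prob = 0.2)
p_zero <- density(zero_inflated, 0)  # probability mass at zero
```

In both cases the modifier wraps a distribution object that already exists, which is the flexibility referred to above.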

Other packages like extraDistr were not evaluated as they mainly implement uncommon distributions.

@Bisaloo
Member

Bisaloo commented Nov 3, 2022

One extra good point for distributional is that it should soon be more lightweight: mitchelloharawild/distributional#62.

@joshwlambert
Member Author

Since #85, {epiparameter} uses {distributional} and {distcrete} for handling distributions, so I am closing this issue.
