The goal of fsummary is to compute posterior summaries of draws_df
objects - FAST!
You can install the development version of fsummary from GitHub with:

```r
remotes::install_github("andrewGhazi/fsummary")
```
This package’s main function, `fsummary()`, computes the same thing as `posterior::summarise_draws()`. Here we run it on a simulated draws data frame `ddf`:
```r
library(fsummary)
library(collapse)
library(posterior)

options(digits = 3)
set.seed(123)

n_chain = 4
n_iter = 1000
n_var = 1000

# simulate standard normal draws for n_var variables across n_chain chains,
# then attach the .draw / .iteration / .chain metadata columns
ddf = rnorm(n_chain * n_iter * n_var) |>
  matrix(ncol = n_var) |>
  qDT() |>
  mtt(`.draw` = 1:(n_iter * n_chain),
      `.iteration` = rep(1:n_iter, times = n_chain),
      `.chain` = rep(1:n_chain, each = n_iter)) |>
  posterior::as_draws_df()

fsummary(ddf) |> head()
#>    variable     mean    median    sd   mad    q5  q95  rhat ess_bulk ess_tail
#>      <char>    <num>     <num> <num> <num> <num> <num> <num>    <num>    <num>
#> 1:       V1  0.00733 -0.004966 0.993 0.975 -1.65  1.64     1     4161     3930
#> 2:       V2 -0.00628 -0.007577 0.996 1.007 -1.63  1.60     1     4099     3978
#> 3:       V3  0.00578 -0.010096 1.006 1.021 -1.65  1.66     1     4005     3807
#> 4:       V4 -0.00163 -0.006376 1.001 0.983 -1.64  1.68     1     3851     3729
#> 5:       V5 -0.03390 -0.041806 1.004 1.008 -1.66  1.63     1     3878     3891
#> 6:       V6 -0.00366 -0.000109 1.000 1.013 -1.66  1.66     1     3547     3824
```
On one core, it’s about 3 times faster than `summarise_draws()` when computing convergence metrics, and 5-6 times faster without them. A couple of quick tests on my machine:
```r
# check that the two summaries agree (up to attributes and a small tolerance)
check_fun = function(x, y) {
  check_res = waldo::compare(x, y,
                             tolerance = 1e-6, ignore_attr = TRUE)
  length(check_res) == 0
}

bench::mark(fsummary  = {fsummary(ddf)},
            posterior = {summarise_draws(ddf)},
            iterations = 10,
            check = check_fun,
            filter_gc = FALSE)
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 fsummary      1.68s  1.73s     0.549     1.2GB     6.10    10   111     18.21s
#> 2 posterior     6.03s  6.27s     0.158    5.13GB     4.43    10   281      1.06m
```
```r
bench::mark(fsummary  = {fsummary(ddf, conv_metrics = FALSE)},
            posterior = {summarise_draws(ddf, default_summary_measures())},
            iterations = 10,
            check = check_fun,
            filter_gc = FALSE)
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 fsummary   190.43ms 198.94ms     4.46      92.1MB     2.67    10     6      2.24s
#> 2 posterior     1.09s    1.12s     0.822      759MB     4.85    10    59     12.17s
```
You can set up daemons with `mirai` for parallelization:

```r
mirai::daemons(4)

system.time({fsummary(ddf)})
#>    user  system elapsed
#>   0.018   0.012   1.227
```
(This helps more once you have summaries that take longer than a second!)
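When you’re done, `mirai`’s documented `daemons()` reset form disconnects the background processes again:

```r
mirai::daemons(0) # shut down and disconnect the daemons when finished
```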
- ✔ parallelization with `mirai`
- ✔ faster convergence metrics with better ranking / `qnorm` (I’ve gotten the inverse normal transformation going twice as fast in Julia, at least…)
- ✔ option for FFT autocovariance if the user knows they have badly converged parameters
- Submission to CRAN. If you use this package and it would be helpful to you for it to be available through CRAN, submit an issue saying so. I’ve basically got it ready to submit; it’s just not clear to me whether it would be worth doing the paperwork.
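For context on the ranking / `qnorm` item above: the rank-normalized convergence diagnostics (rhat, ess_bulk, ess_tail) first replace each draw with the normal quantile of its offset fractional rank. A minimal base-R sketch of that transformation is below; the function name `z_scale` is illustrative, and fsummary’s actual implementation is optimized and may differ:

```r
# Rank-based inverse normal ("z-scale") transformation: map each draw to the
# normal quantile of its fractional rank, using the common 3/8 offset.
# This is the step applied before computing rank-normalized rhat and ESS.
z_scale = function(x) {
  r = rank(x, ties.method = "average")
  qnorm((r - 3/8) / (length(x) + 1/4))
}
```

The `qnorm()` call here is the inverse normal transformation that the roadmap item is about speeding up: it dominates the cost of the rank-normalization step, since ranking itself is cheap.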