---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# fsummary
<!-- badges: start -->
<!-- badges: end -->
The goal of fsummary is to compute posterior summaries of `draws_df` objects - FAST!
## Installation
You can install the development version of fsummary from [GitHub](https://github.com/) with:
``` r
remotes::install_github("andrewGhazi/fsummary")
```
## Example
This package's main function, `fsummary()`, computes the same thing as `posterior::summarise_draws()`. Here we run it on a simulated draws data frame `ddf`:
```{r example, message=FALSE}
library(fsummary)
library(collapse)
library(posterior)
options(digits = 3)
set.seed(123)
n_chain = 4
n_iter = 1000
n_var = 1000
ddf = rnorm(n_chain * n_iter * n_var) |>
  matrix(ncol = n_var) |>
  qDT() |>
  mtt(`.draw` = 1:(n_iter * n_chain),
      `.iteration` = rep(1:n_iter, times = n_chain),
      `.chain` = rep(1:n_chain, each = n_iter)) |>
  posterior::as_draws_df()
fsummary(ddf) |> head()
```
On one core, it's about 3 times faster than `summarise_draws()` when computing convergence metrics, and 5-6 times faster without them. A couple of quick tests on my machine:
```{r, eval = FALSE}
check_fun = function(x, y) {
  check_res = waldo::compare(x, y,
                             tolerance = 1e-6, ignore_attr = TRUE)
  length(check_res) == 0
}
bench::mark(fsummary = {fsummary(ddf)},
            posterior = {summarise_draws(ddf)},
            iterations = 10,
            check = check_fun,
            filter_gc = FALSE)
```
```
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 fsummary 1.68s 1.73s 0.549 1.2GB 6.10 10 111 18.21s
2 posterior 6.03s 6.27s 0.158 5.13GB 4.43 10 281 1.06m
```
```{r, eval = FALSE}
bench::mark(fsummary = {fsummary(ddf,
                                 conv_metrics = FALSE)},
            posterior = {summarise_draws(ddf,
                                         default_summary_measures())},
            iterations = 10,
            check = check_fun,
            filter_gc = FALSE)
```
```
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 fsummary 190.43ms 198.94ms 4.46 92.1MB 2.67 10 6 2.24s
2 posterior 1.09s 1.12s 0.822 759MB 4.85 10 59 12.17s
```
![](man/figures/comparison.png)
You can set up daemons with [`mirai`](https://shikokuchuo.net/mirai/index.html) for parallelization:
```{r eval=FALSE}
mirai::daemons(4)
system.time({fsummary(ddf)})
```
```
user system elapsed
0.018 0.012 1.227
```
(This helps more once you have summaries that take longer than a second!)
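When you're done with parallel summaries, you can return to sequential execution by resetting the daemons (a zero argument resets them, per the `mirai` documentation):

```r
mirai::daemons(0) # shut down the background daemons
```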
# TODO
* ✔ ~~parallelization with `mirai`~~
* faster convergence metrics with better ✔ ~~ranking~~ / `qnorm` (I've gotten the inverse normal transformation running twice as fast in Julia, at least...)
* ✔ ~~Option for FFT autocovariance if user knows they have badly converged parameters~~
* Submission to CRAN. If you use this package and it would be helpful to you for it to be available through CRAN, submit an issue saying so. I've basically got it ready to submit; it's just not clear to me whether the paperwork would be worth it.