speed up posterior predictive sampling #6208
Conversation
Codecov Report
Additional details and impacted files

@@ Coverage Diff @@
## main #6208 +/- ##
==========================================
+ Coverage 93.58% 93.77% +0.19%
==========================================
Files 101 101
Lines 22136 22232 +96
==========================================
+ Hits 20716 20849 +133
+ Misses 1420 1383 -37
Got the proof of concept working with xarray-einstats and einops. I will write a simple xarray reshaper function to avoid the extra dependency. The reshape we need here is the simplest case supported by those libraries.
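A minimal sketch of the kind of reshape being discussed (a toy example; the dataset, variable names, and num_sample_dims value are illustrative, not the actual PR code): flattening the leading sample dimensions of each variable into a single axis with a plain NumPy reshape.

```python
import numpy as np
import xarray as xr

# Toy dataset with (chain, draw) sample dimensions plus a variable-specific dim.
ds = xr.Dataset(
    {
        "mu": (("chain", "draw"), np.arange(8).reshape(2, 4)),
        "theta": (("chain", "draw", "g"), np.arange(24).reshape(2, 4, 3)),
    }
)

num_sample_dims = 2  # flatten (chain, draw) into a single sample axis
stacked = {
    vn: da.values.reshape((-1, *da.shape[num_sample_dims:]))
    for vn, da in ds.items()
}
print(stacked["mu"].shape)     # (8,)
print(stacked["theta"].shape)  # (8, 3)
```

Because the reshape happens once per variable on the raw NumPy array, no per-draw xarray indexing is needed afterwards.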
Needs arviz-devs/arviz#2138 to get all tests to pass.
Thanks @OriolAbril. Could you tell me where the speedup is coming from?
stacked_dict = {
    vn: da.values.reshape((-1, *da.shape[num_sample_dims:])) for vn, da in ds.items()
}
points = [
Perhaps we could yield instead of returning the whole list at once?
I agree with using a lazy generator approach unless the whole list is needed at once for some reason
If that works later in the code then yes! I only kept the list because the function is called _to_list. You should assume I have no idea about the format we need to interface with the aesara random drawing function.
Here, would that be using a () comprehension or an explicit loop with a yield? Or either?
I like the () comprehension more
Me too
I noticed the code downstream of this may be incompatible with generators (it asks for len and sometimes checks the first point...)
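The constraint noted here can be illustrated with a toy example (hypothetical, not the PR code): a generator expression is lazy and memory-friendly, but it has no len(), and peeking at the first item consumes it.

```python
# List comprehension: eager, supports len() and repeated inspection.
points_list = [{"mu": i} for i in range(3)]
print(len(points_list))  # 3

# Generator expression: lazy, but single-use and without len().
points_gen = ({"mu": i} for i in range(3))
try:
    len(points_gen)
except TypeError as e:
    print(e)  # object of type 'generator' has no len()

first = next(points_gen)      # peeking consumes the first item
remaining = list(points_gen)  # only the 2 remaining items are left
```

So downstream code that needs the length or inspects the first point would have to be adapted (or the generator materialized) before switching.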
@OriolAbril can we split the speedup fix of this PR from the flexible sample_dims functionality?
I plan to release ArviZ on Saturday. In my case the limiting factor is not the ArviZ release but my own time availability. If I have to split the PR, it will take longer. If you need it before then, feel free to split the PR. It doesn't matter if the functionality gets merged in 1 or 2 PRs.
Saturday should be fine. Let me know if there's anything I can help with otherwise.
Addressed some of the comments but not all of them. I switched the list to a generator. I also used a tuple for the sample dims, but ArviZ expects a list, so it is either using a list from the start or using a tuple and converting it to a list later on. I don't really care either way, but for now I left the list from the start. I ran tests and mypy locally, so I expect all tests to pass and the PR to be ready to merge after that.
@OriolAbril Can you add a bullet point in the top post under the checklist?
The goal of this PR is to accelerate the dataset_to_point_list function, which right now is often the bottleneck of posterior predictive sampling. I would also like to add some extra flexibility on the dimensions that are considered sample dimensions.

Related to #5160
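A rough illustration of why vectorizing this helps (a sketch under assumed names and sizes, not the actual implementation): building points by indexing an xarray Dataset once per draw pays xarray's indexing overhead for every point, while a single NumPy reshape pays it once and then slices plain arrays.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"mu": (("chain", "draw"), np.random.rand(4, 250))})

# Slow path: one xarray indexing operation per (chain, draw) pair.
points_slow = [
    {vn: ds[vn].isel(chain=c, draw=d).values for vn in ds.data_vars}
    for c in range(ds.sizes["chain"])
    for d in range(ds.sizes["draw"])
]

# Fast path: flatten the sample dims once with plain NumPy, then slice.
stacked = {vn: da.values.reshape((-1, *da.shape[2:])) for vn, da in ds.items()}
points_fast = [{vn: arr[i] for vn, arr in stacked.items()} for i in range(4 * 250)]

# Both orderings agree because NumPy's C-order flattening matches the
# chain-major, draw-minor loop above.
assert all(
    np.allclose(a["mu"], b["mu"]) for a, b in zip(points_slow, points_fast)
)
```

The speedup grows with the number of draws, since the per-point cost drops from an xarray isel call to a NumPy array index.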
Checklist

Bugfixes / New features
- sample_dims argument to sample_posterior_predictive.

Docs / Maintenance
- sample_posterior_predictive when using InferenceData or Dataset as input.