Revamping HPD #1117
The multimodal case is very tricky; I don't think it is possible to know the output shape beforehand. I think the best way to tackle it would be to create the ufunc by hand and then use xr.apply_ufunc directly. This would avoid using make_ufunc, which requires creating the output array beforehand.
However, I would first work on the unimodal case and, once it is up and running, move on to the multimodal one.
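To make the "ufunc by hand plus xr.apply_ufunc" idea concrete, here is a minimal sketch for the unimodal case. The helper `hpd_1d`, the dimension name `school`, and the output dimension `hpd` are illustrative assumptions, not the code from this PR; only `xr.apply_ufunc` and its `input_core_dims`, `output_core_dims` and `vectorize` arguments are real xarray API.

```python
import numpy as np
import xarray as xr


def hpd_1d(samples, credible_interval=0.94):
    """Hypothetical helper: narrowest interval holding the requested mass."""
    x = np.sort(np.asarray(samples).ravel())
    n = x.size
    n_included = int(np.floor(credible_interval * n))
    # Width of every candidate interval spanning n_included sorted samples.
    widths = x[n_included:] - x[: n - n_included]
    start = int(np.argmin(widths))
    return np.array([x[start], x[start + n_included]])


rng = np.random.default_rng(0)
posterior = xr.DataArray(
    rng.normal(size=(4, 500, 3)), dims=("chain", "draw", "school")
)

# chain and draw are reduced; a new core dimension "hpd" of length 2 is created.
hpd_da = xr.apply_ufunc(
    hpd_1d,
    posterior,
    input_core_dims=[["chain", "draw"]],
    output_core_dims=[["hpd"]],
    vectorize=True,
)
```

Because the output core dimension is declared by name only, no output array has to be allocated in advance; xarray infers its length from what the function returns.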
I have made the changes. It is working fine for ndarray and datasets. Currently, I am first converting ndarray to dataset then using |
I tried using wrap_xarray_ufunc directly on the ndarray, but the hpd is always calculated on the last dimension. Converting it to a dataset gives me control over which dimension to calculate hpd for.
Any workaround for this?
I think this is the best workaround, it gives complete control and it is quite explicit. The other option would be to manually reorder ndarray dimensions, but I would not recommend it as it is much less readable.
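For comparison, the manual-reordering alternative mentioned above could look roughly like the following NumPy-only sketch. The function names `hpd_last_axis` and `hpd_along` are hypothetical and exist only for illustration; the point is that the caller has to move the sample axis to the end by hand, which is the less readable part.

```python
import numpy as np


def hpd_last_axis(ary, credible_interval=0.94):
    """Hypothetical helper: unimodal HPD along the last axis, shape (..., 2)."""
    x = np.sort(ary, axis=-1)
    n = x.shape[-1]
    n_included = int(np.floor(credible_interval * n))
    widths = x[..., n_included:] - x[..., : n - n_included]
    start = np.argmin(widths, axis=-1)[..., np.newaxis]
    lower = np.take_along_axis(x, start, axis=-1)
    upper = np.take_along_axis(x, start + n_included, axis=-1)
    return np.concatenate([lower, upper], axis=-1)


def hpd_along(ary, axis, credible_interval=0.94):
    """Move the requested axis to the end, then reduce over it."""
    return hpd_last_axis(np.moveaxis(ary, axis, -1), credible_interval)


samples = np.random.default_rng(1).normal(size=(1000, 5))
intervals = hpd_along(samples, axis=0)  # reduce over the 1000 samples
```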
I think that the unimodal case is nearly finished; the only things missing are tests and some nits. Regarding the multimodal case, I was thinking of a possible workaround, let's see what everyone thinks about it. The idea would be to have _hpd_multimodal return a result with shape (2, 10) (the 10 is completely arbitrary and could be modified, or even be an argument). Of these 10 pairs of values, the first ones would be the hpd intervals; the last ones will probably not be needed and would then be set to nan (in _hpd_multimodal). Eventually, hpd in the multimodal case would drop the nan values and return the hpd dataset with the proper shape (also, different variables may have a different number of modes, and this approach should solve this too).
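A rough NumPy sketch of this padding idea follows. The helper names, the `(max_modes, 2)` layout (the comment above mentions `(2, 10)`; the transposed layout is used here only for readability), and the interval values are all made up for illustration and are not taken from the PR.

```python
import numpy as np


def pad_intervals(intervals, max_modes=10):
    """Hypothetical helper: pad (lower, upper) pairs with NaN to a fixed shape."""
    intervals = np.asarray(intervals, dtype=float).reshape(-1, 2)
    if len(intervals) > max_modes:
        raise ValueError("more modes than available slots")
    padded = np.full((max_modes, 2), np.nan)
    padded[: len(intervals)] = intervals
    return padded


def drop_empty_modes(stacked):
    """Drop mode slots that are NaN for every variable.

    stacked has shape (n_variables, max_modes, 2).
    """
    keep = ~np.all(np.isnan(stacked), axis=(0, 2))
    return stacked[:, keep, :]


# Illustrative intervals: one bimodal variable and one unimodal variable.
var_a = pad_intervals([(-2.1, -0.4), (0.6, 2.3)])
var_b = pad_intervals([(-1.0, 1.0)])
result = drop_empty_modes(np.stack([var_a, var_b]))
```

After dropping, the unimodal variable keeps a NaN pair in its second slot, which is how variables with different numbers of modes can share one array.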
Do we start work on multimodal here or wait for everyone's opinion? Maybe we could create a new issue for the multimodal discussion.
Codecov Report
@@            Coverage Diff             @@
##           master    #1117      +/-   ##
==========================================
- Coverage   92.68%   92.67%   -0.01%
==========================================
  Files          93       93
  Lines        9073     9097      +24
==========================================
+ Hits         8409     8431      +22
- Misses        664      666       +2
I would work on multimodal in this same PR, otherwise, behaviour of ArviZ development version between merging this and the other will be quite confusing. |
Should I start implementing this or wait? |
Let's go ahead with multimodal 💪 |
I have tried to implement the multimodal case for a single input. Currently, I am filling only the first dimension with NaNs. |
I tried to put every comment on the related piece of code, but most of the comments are related to one another and some apply to several places. Read everything first and ask if anything is unclear.
Looks great!
I have one question about ndarray input with ndim > 2. This is currently not supported, right?
I am not sure if we should extend the 2d behaviour (as the code does now) or assume ArviZ dimensions of (chain, draw, *shape). I am inclined towards the second but we should probably weigh the pros and cons and reach some kind of consensus.
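To illustrate the second option, here is a NumPy-only sketch that treats the first two axes of an ndarray as (chain, draw), pools them, and returns one interval per remaining element. The function name `hpd_arviz_order` is hypothetical and the sketch only covers the unimodal case; it is not the implementation in this PR.

```python
import numpy as np


def hpd_arviz_order(ary, credible_interval=0.94):
    """Hypothetical helper: HPD for arrays laid out as (chain, draw, *shape)."""
    ary = np.asarray(ary)
    if ary.ndim < 3:
        raise ValueError("expected at least (chain, draw, *shape)")
    n_samples = ary.shape[0] * ary.shape[1]
    # Pool chains and draws into a single sample axis.
    flat = ary.reshape(n_samples, *ary.shape[2:])
    x = np.sort(flat, axis=0)
    n_included = int(np.floor(credible_interval * n_samples))
    widths = x[n_included:] - x[: n_samples - n_included]
    start = np.argmin(widths, axis=0)[np.newaxis]
    lower = np.take_along_axis(x, start, axis=0)[0]
    upper = np.take_along_axis(x, start + n_included, axis=0)[0]
    return np.stack([lower, upper], axis=-1)


result = hpd_arviz_order(np.random.default_rng(2).normal(size=(4, 250, 3)))
```

With input shape (4, 250, 3), the result has shape (3, 2): one (lower, upper) pair per element of the trailing shape.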
I think ndarray input with ndims > 2 is not supported. |
You can add a comment to ignore the pylint warning, it probably misses the pop |
I think these comments will be the last nits
arviz/stats/stats.py
density *= dx
if isinstance(ary, np.ndarray):
This should be only if the array is 1d or 2d:
isarray = isinstance(ary, np.ndarray)
if isarray and ary.ndim <= 2:
If the array has 3 or more dimensions, it should assume ArviZ dim order: (chain, draw, *shape). hpd should still return a numpy array though:
...
hpd_data = _wrap_xarray_ufunc(func, ary, func_kwargs=func_kwargs, **kwargs)
hpd_data = hpd_data.dropna("mode", how="all") if multimodal else hpd_data
return hpd_data.x.values if isarray else hpd_data
arviz/tests/base_tests/test_stats.py
def test_hpd_multidimension():
    normal_sample = np.random.randn(12000, 10, 3)
    result = hpd(normal_sample)
    assert result.shape == (10, 3, 2,)
This line will have to be updated to check that the result shape is the desired (3, 2)
Earlier, we were calculating hpd over one dimension only for ndarrays. So, for backward compatibility, I have set the default to calculate only over 'chain' for ndarrays, and the result would therefore still be (10, 3, 2,).
The issue is that calculating hpd only over chain is a very bad default. We'll keep that behaviour (for now) in the 2d array case for backwards compatibility, but 3d arrays were not supported before, so there is no backwards compatibility constraint for them.
Okay, I have made the changes.
I ran a cell with pm.stats.hpd in an example notebook, and I guess it has been removed, because I get the error: no attribute 'hpd'. I tried az.hpd too, with the same result. I'm probably missing something; has the function been renamed?
Did you try az.hdi? Please open a new issue if that doesn't work. |
@ahartikainen thank you, that worked!
Description
fixes #855 Make hpd work with multidimensional arrays.
Checklist