Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamping HPD #1117

Merged
merged 24 commits into from
Apr 2, 2020
Merged

Revamping HPD #1117

merged 24 commits into from
Apr 2, 2020

Conversation

percygautam
Copy link
Contributor

@percygautam percygautam commented Mar 14, 2020

Description

fixes #855 Make hpd work with multidimensional arrays.

Checklist

  • Follows official PR format
  • New features are properly documented (with an example if appropriate)?
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multimodal case is very tricky, I don't think it is possible to know the shape beforehand. i think the best way to tackle it would be to create the ufunc by hand and then use xr.apply_ufunc directly. This would avoid using make_ufunc that requires creating the output array beforehand.

However, I would first work on unimodal case and once unimodal is up and running, move onto multimodal.

arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
@percygautam
Copy link
Contributor Author

percygautam commented Mar 19, 2020

I have made the changes. It is working fine for ndarray and datasets. Currently, I am first converting ndarray to dataset then using wrap_xarray_ufunc.
I tried using wrap_xarray_ufuncdirectly on ndarray but the hpd is calculated is always calculated on the last dimension. Converting it to ndarray gives me control over which dimension to calculate hpd for.
Any workaround for this?

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using wrap_xarray_ufuncdirectly on ndarray but the hpd is calculated is always calculated on the last dimension. Converting it to ndarray gives me control over which dimension to calculate hpd for.
Any workaround for this?

I think this is the best workaround, it gives complete control and it is quite explicit. The other option would be to manually reorder ndarray dimensions, but I would not recommend it as it is much less readable.


I think that unimodal case is nearly finished, only things missing are tests and some nits. Regarding multimodal case, I was thinking of a possible workaround, let's see what everyone thinks about this. The idea would be to have _hpd_multimodal return a result with shape (2, 10) (the 10 is completely arbitrary and could be modified, even be an argument). Of these 10 pairs of values, the first would be hpd intervals and the last ones will probably not be needed, and would be then set to nan (in _hpd_multimodal). Eventually hpd in the case of multimodal would drop nan values and return the hpd dataset with the proper shape (also, different variables may have different number of modes and this approach should solve this too).

arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_stats.py Show resolved Hide resolved
arviz/stats/stats.py Show resolved Hide resolved
@percygautam percygautam changed the title Make hpd work with multidimensional arrays Revamping HPD Mar 20, 2020
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_stats.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_stats.py Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
@percygautam
Copy link
Contributor Author

Do we start the work of multimodal here or wait for everyone's opinion? Maybe create a new issue for multimodal discussion.

@codecov
Copy link

codecov bot commented Mar 22, 2020

Codecov Report

Merging #1117 into master will decrease coverage by 0.00%.
The diff coverage is 89.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1117      +/-   ##
==========================================
- Coverage   92.68%   92.67%   -0.01%     
==========================================
  Files          93       93              
  Lines        9073     9097      +24     
==========================================
+ Hits         8409     8431      +22     
- Misses        664      666       +2     
Impacted Files Coverage Δ
arviz/stats/stats.py 96.32% <89.47%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9909aab...b746b9b. Read the comment docs.

arviz/tests/base_tests/test_stats.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_stats.py Outdated Show resolved Hide resolved
@OriolAbril
Copy link
Member

I would work on multimodal in this same PR, otherwise, behaviour of ArviZ development version between merging this and the other will be quite confusing.

@percygautam
Copy link
Contributor Author

Regarding multimodal case, I was thinking of a possible workaround, let's see what everyone thinks about this. The idea would be to have _hpd_multimodal return a result with shape (2, 10) (the 10 is completely arbitrary and could be modified, even be an argument). Of these 10 pairs of values, the first would be hpd intervals and the last ones will probably not be needed, and would be then set to nan (in _hpd_multimodal). Eventually hpd in the case of multimodal would drop nan values and return the hpd dataset with the proper shape (also, different variables may have different number of modes and this approach should solve this too).

Should I start implementing this or wait?

@OriolAbril
Copy link
Member

Let's go ahead with multimodal 💪

@percygautam
Copy link
Contributor Author

I have tried to implement the multimodal case for a single input. Currently, I am filling only the first dimension with NaNs.
How do we decide the size (10) of the first dimension in this case?

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to comment everything on the related piece of code, but most of the comments are related one to the other and even apply to several places, read everything first and ask if there is anything unclear.

arviz/stats/stats_utils.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I have one question about ndarray input with ndims > 2. This is currently not supported right?

I am not sure if we should extend the 2d behaviour (as the code does now) or assume ArviZ dimensions of (chain, draw, *shape). I am inclined towards the second but we should probably weigh the pros and cons and reach some kind of consensus.

arviz/stats/stats.py Outdated Show resolved Hide resolved
arviz/stats/stats.py Outdated Show resolved Hide resolved
@percygautam
Copy link
Contributor Author

percygautam commented Mar 27, 2020

I think ndarray input with ndims > 2 is not supported.
P.S. can you check the linting error. I already popped out_shape, yet it's showing unexpected keyword.

@OriolAbril
Copy link
Member

You can add a comment to ignore the pylint warning, it probably misses the pop

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these comments will be the last nits

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved

density *= dx
if isinstance(ary, np.ndarray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be only if the array is 1d or 2d:

isarray = isinstance(ary, np.ndarray)
if isarray and ary.ndim <= 2:

If the array has 3 or more dimensions, it should assume ArviZ dim order: (chain, draw, *shape). hpd should still return a numpy array though:

...
hpd_data = _wrap_xarray_ufunc(func, ary, func_kwargs=func_kwargs, **kwargs)
hpd_data = hpd_data.dropna("mode", how="all") if multimodal else hpd_data
return hpd_data.x.values if isarray else hpd_data

def test_hpd_multidimension():
normal_sample = np.random.randn(12000, 10, 3)
result = hpd(normal_sample)
assert result.shape == (10, 3, 2,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will have to be updated to check that the result shape is the desired (3, 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier, we were calculating hpd over one dimension only, for ndarrays. So, for backward compatibility I have set default to be calculated only over 'chain' for ndarrays. So, the result still would be (10, 3, 2,).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that calculating hpd only over chain is a very bad default, we'll keep the behaviour (for now) in 2d array case to keep backwards compatibility, but 3d arrays are not supported, so we do not have the backwards compatibility constraint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have done the changes.

@aloctavodia aloctavodia merged commit aca28da into arviz-devs:master Apr 2, 2020
@percygautam percygautam deleted the hpd branch June 1, 2020 09:51
@mjhajharia
Copy link
Contributor

i ran some cell with pm.stats.hpd in an example notebook and ig its removed or something cause i get the error - no attribute 'hpd'. I tried az.hpd too, same thing. im probably missing something, is the function renamed or stuff.

@ahartikainen
Copy link
Contributor

Did you try az.hdi? Please open a new issue if that doesn't work.

@mjhajharia
Copy link
Contributor

@ahartikainen thankyou that worked!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected behavior when using pm.hpd with multidimensional (ndim>2)posterior arrays
5 participants