Refactor approach for making final bin right-edge inclusive #44
You'll see that the test I added is failing. This is because There seem to be ways to get around this, but I'm looking for advice on best way to do this robustly. Should we only deal with I should point out, |
Codecov Report
@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
+ Coverage   84.08%   95.98%   +11.90%
==========================================
  Files           2        2
  Lines         245      249        +4
  Branches       74       71        -3
==========================================
+ Hits          206      239       +33
+ Misses         34        7       -27
+ Partials        5        3        -2
I think that because of our reliance on Despite this, I think the approach in this PR for including the right edge of the right bin is still better than the old approach and is maybe still worth merging. Thoughts @rabernat? |
Hi Dougie, sorry it has taken so long for me to review this. I am a bit under water right now. Maybe @TomNicholas could provide a review?
This is simple and looks fine to me. But we should have a test to cover the change.
Thanks @rabernat, and all good - feeling pretty under water here too. This PR was motivated by #25, so the test I originally wrote was to compute the histogram of datetime objects. I've added this test back in, but you'll see it's failing. This is because
cc @TomNicholas
I have no idea if that's possible or not, but @spencerkclark has done lots with datetimes and might be able to tell us before we put it in xarray?
I think this would be preferable really, and maybe not too difficult if we can copy
With the |
FYI, dask has added |
Yup, this comment was from before the |
We have fairly well-tested ways of converting datetimes to integers or floats in xarray, which you could use to work around this issue; however, it's a little unfortunate that it is required in this case. Checking for monotonicity -- and I think the |
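For reference, the integer-conversion workaround mentioned above can be sketched with plain NumPy. This is only an illustration, not xarray's own conversion utilities: the helper name `datetime_to_int64` and the choice of seconds as the unit are assumptions made for this example.

```python
import numpy as np


def datetime_to_int64(values, unit="s"):
    """Hypothetical helper: express datetime64 values as integer counts of
    `unit` since the Unix epoch. Note that NaT would map to the minimum
    int64 value, so missing values need separate handling."""
    return np.asarray(values).astype(f"datetime64[{unit}]").astype("int64")


data = np.array(["2000-01-01", "2000-01-02", "2000-01-03"], dtype="datetime64[D]")
edges = np.array(["2000-01-01", "2000-01-02", "2000-01-03"], dtype="datetime64[D]")

# Histogram the integer representation; the final bin includes its right edge.
counts, _ = np.histogram(datetime_to_int64(data), bins=datetime_to_int64(edges))
print(counts)
```

The drawback noted in the comment still applies: the caller has to convert both the data and the bin edges, and convert the edges back if datetime labels are wanted.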
Great, thanks for the advice @spencerkclark !
I think this is ready for review. @TomNicholas, would you mind taking a quick look?
Description
Our current approach for making the last bin right-edge inclusive (to align with np.histogram definitions) is a hack stolen from scikit-learn - we simply add a very small increment to the last bin edge prior to digitizing.
Among other problems, this means that xhistogram currently fails for non-float bins (for example, binning over datetime objects, see #25).
This PR implements an alternative, hopefully better, approach.
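As a rough illustration of the idea (not necessarily the code in this PR), one way to make the final bin right-edge inclusive without touching the bin edges is to digitize with half-open bins and then reassign values equal to the last edge. The function names below are hypothetical, and `np.searchsorted` is used here as a stand-in for the digitizing step because it only relies on comparisons.

```python
import numpy as np


def bin_indices_right_inclusive(data, bins):
    """Hypothetical sketch: bin indices for 1-D data and increasing edges,
    with the last bin closed on the right (as in np.histogram)."""
    data = np.asarray(data)
    bins = np.asarray(bins)
    # Half-open bins: index i means bins[i-1] <= x < bins[i].
    idx = np.searchsorted(bins, data, side="right")
    # Values exactly on the last edge would land in the overflow bin;
    # move them into the final valid bin instead of perturbing the edge.
    idx[data == bins[-1]] = len(bins) - 1
    return idx


def counts_from_indices(idx, n_edges):
    """Count entries per bin, dropping the underflow and overflow slots."""
    return np.bincount(idx, minlength=n_edges + 1)[1:-1]


# Float data: matches np.histogram, including the value on the last edge.
x = np.array([0.0, 0.5, 1.0, 2.0, 2.5])
edges = np.array([0.0, 1.0, 2.0])
float_counts = counts_from_indices(bin_indices_right_inclusive(x, edges), len(edges))

# datetime64 data: no arithmetic on the edges is needed, only comparisons.
t = np.array(["2000-01-01", "2000-01-02", "2000-01-03"], dtype="datetime64[D]")
t_edges = np.array(["2000-01-01", "2000-01-02", "2000-01-03"], dtype="datetime64[D]")
time_counts = counts_from_indices(bin_indices_right_inclusive(t, t_edges), len(t_edges))
```

Because nothing is added to the last edge, the same logic applies to any dtype that supports ordering and equality, which is what the small-increment approach cannot offer.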
Type of change
Testing
Added test_histogram_results_datetime() to test_core.py to check that xhistogram matches numpy.histogram with np.datetime64 data.