-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multidimensional histogram #5400
base: main
Are you sure you want to change the base?
Multidimensional histogram #5400
Conversation
Hello @TomNicholas! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-06-10 21:28:54 UTC |
|
||
Parameters | ||
---------- | ||
darray : DataArray | ||
Can have any number of dimensions. | ||
Can have any number of dimensions, but will be reduced to 1 dimension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you have plans for this already but will this handle 2d-histograms like the mean oxygen example in https://xhistogram.readthedocs.io/en/latest/tutorial.html#Averaging-a-variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will be able to calculate 2D (or N-D in general) histograms in xarray.hist()
, it just won't be able to plot them in xarray.plot.hist()
, because you need something like an xarray.plot.imshow()
to plot the results of 2D histogram instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether you feel it is at all confusing to use the same word "hist" to refer both to the function that calculates the histogram and to the plotting function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. da.hist()
vs. da.plot.hist()
feels distinct enough for me.
I think I prefer the functions being called the proper name, da.histogram()
and da.plot.histogram()
, instead of the shorthand though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. da.hist() vs. da.plot.hist() feels distinct enough for me.
Cool.
I think I prefer the functions being called the proper name
That would also be marginally easier for the current users of xhistogram, who have been using xhistogram.histogram
so would only need to change the import.
However the two arguments I can see against that are:
- Deprecation cycle - I would like consistency between the calculation function and the plot function, but the plot function already exists and is so named because it wraps
matplotlib.pyplot.hist
. Changing that would require a deprecation cycle, but I think with the changes I'm proposing here we could get away without a deprecation cycle (because we're only adding optional arguments, not changing them). - We have many shorthand names for high-level functions already -
cov
,corr
,concat
, it would not be unusually terse.
I don't really have a strong opinion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecation cycle wouldn't hinder your progress since I'm imagining simply just aliasing the method with a warning for a while.
numpy and dask also uses .histogram()
for their histograms. matplotlib, pandas and hvplot uses .hist()
for the plotting, holoviews uses .Histogram()
.
Maybe it's just easier to follow the upstream dependencies decisions then to avoid messing with the muscle memory (even though it's in my opinion "bad" muscle memory)?
And I'm not a fan of those shorthand names either, cov
id, corr
osion, interp
ret? But I'm fine with just following what upstream are doing, there's something satisfying when you can just jump between the different data structures and everything just works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make very good points! I think I agree with you now.
cov
id,corr
osion,interp
ret
🤣
It appears numpy has multiple discussions (here and here) about adding axis-aware histograms, and an axis argument was almost added to |
Not sure if it's any help but dask supports the numpy histogram functions now: dask/dask#7827 |
Initial work on integrating the multi-dimensional dask-powered histogram functionality from xhistogram into xarray. Just working on the skeleton to fit around the histogram algorithm for now, to be filled in later.
blockwise
algorithm from Refactor histogram to use blockwise xgcm/xhistogram#49plot.hist
da.weighted().hist()
pre-commit run --all-files
whats-new.rst
api.rst
np.datetime64
dtypes by refactoring to usenp.searchsorted
(for a future PR? See discussion)Question:
da.hist()
orda.histogram()
?