NEP18 trouble when pint is being wrapped #878
Most likely you're missing the |
@hameerabbasi Nope, I already tried that. |
(and even then, the default |
I'm glad to hear that at least pint wrapping sparse and pint wrapping dask.array work, since there are no tests for this (one of the main reasons I created #845). As a part of your investigations, do you happen to have tests for these already written? I was planning on doing so myself if I got the chance after the
As I've only recently gotten into the details of NEP 18 and I'm by far less experienced with all the libraries' internals, I will definitely defer to the consensus of others on this (and would like to hear @shoyer's thoughts). However, I think allowing dask.array wrapping pint and pint wrapping dask.array is a bad idea (xref #845 (comment)). This will make the type casting graph cyclic, which makes the type casting hierarchy ill-defined and the expected result of mixed-type operations ambiguous (xref pydata/xarray#525 (comment) and following comments for some discussion related to this). This would create big problems with non-commutativity and would complicate operations with scalars, among other issues.

Based on past conversations I've seen (primarily in pydata/xarray#525), pint->dask seems to be the preferred order to allow unit math to occur at "graph construction time" rather than "runtime" (borrowing @shoyer's terminology from pydata/xarray#525 (comment)). I'd argue for this order as well, since it is almost a requirement for exploratory analysis of large datasets using unit-aware calculations (I'd want to keep track of units through intermediate steps of calculations, rather than just in the final computation).

With this in mind, I think the larger task at hand is cleaning up xarray internals to allow xarray > pint > dask.array to work as expected, since as you pointed out this is currently a problem area. So, instead of fixing [2] by flipping around to [1], I would think [2] should be the target use case, and perhaps [1] should be flipped around or prohibited?
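The hierarchy argument can be illustrated with a minimal pure-Python sketch. `Chunked` and `WithUnits` are hypothetical stand-ins for dask.array and pint's Quantity (neither library is imported), and the sketch uses Python's ordinary binary-operator deferral instead of `__array_function__`; both conventions rely on returning `NotImplemented` to hand the operation to the other type. Because only `WithUnits` is allowed to wrap `Chunked`, both operand orders end up with the same nesting:

```python
class Chunked:
    """Hypothetical stand-in for a chunked array (lower in the hierarchy)."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def __mul__(self, other):
        if isinstance(other, WithUnits):
            # Defer to the type that is allowed to wrap this one.
            return NotImplemented
        if isinstance(other, Chunked):
            return Chunked(x * y for x, y in zip(self.chunks, other.chunks))
        return Chunked(c * other for c in self.chunks)

    __rmul__ = __mul__


class WithUnits:
    """Hypothetical stand-in for a unit-carrying wrapper (higher in the hierarchy)."""

    def __init__(self, magnitude, units):
        if isinstance(magnitude, WithUnits):
            raise TypeError("units may not be nested")
        self.magnitude = magnitude
        self.units = units

    def __mul__(self, other):
        if isinstance(other, WithUnits):
            return WithUnits(self.magnitude * other.magnitude,
                             f"{self.units}*{other.units}")
        # Anything lower in the hierarchy gets wrapped, never the reverse.
        return WithUnits(self.magnitude * other, self.units)

    __rmul__ = __mul__


a = Chunked([1, 2])
q = WithUnits(3, "m")
left = a * q    # Chunked defers, so WithUnits.__rmul__ handles it
right = q * a   # WithUnits.__mul__ handles it directly
```

If `Chunked` were also allowed to wrap `WithUnits`, the result type of `a * q` would depend on which operand came first, which is the non-commutativity problem described above.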
I suspect these might be problems with pint, since I can't shake the feeling that the current "accidental" support of dask.array and sparse in pint is error-prone. Perhaps a thorough set of tests could catch if there is some conversion to ndarray occurring internally in pint with whatever operations xarray uses during construction. |
@jthielen no I don't have unit tests; I just did an extremely brief manual experimentation. I agree that proper automated test suites are in order. |
Thanks for such detailed discussion. This is really useful. I would like to suggest 3 organization lines:
|
Perhaps it would be helpful to test things with a custom dask scheduler, to see what the culprit operation is? e.g., based on https://stackoverflow.com/questions/53289286/determine-how-many-times-dask-computed-something:
|
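As a rough, dask-free sketch of the counting idea (this is not the snippet from the linked answer): `make_counting_get` below is a hypothetical helper that evaluates a small dask-style graph dict and records how often each task function runs. As far as I know, a callable with this `get(dsk, keys, **kwargs)` shape can be handed to dask through the `scheduler=` argument, but treat that as an assumption to verify against the dask documentation.

```python
from collections import Counter
from operator import add, mul


def make_counting_get():
    """Return a (get, calls) pair; `calls` counts task function invocations."""
    calls = Counter()

    def get(dsk, keys, **kwargs):
        cache = {}  # each graph key is evaluated at most once per call

        def evaluate(arg):
            # A task is a tuple whose first element is callable.
            if isinstance(arg, tuple) and arg and callable(arg[0]):
                func, *args = arg
                calls[getattr(func, "__name__", repr(func))] += 1
                return func(*(evaluate(a) for a in args))
            if isinstance(arg, list):
                return [evaluate(a) for a in arg]
            # A hashable value that names a graph entry is a key reference.
            try:
                is_key = arg in dsk
            except TypeError:
                is_key = False
            if is_key:
                if arg not in cache:
                    cache[arg] = evaluate(dsk[arg])
                return cache[arg]
            return arg  # plain literal

        return evaluate(keys)

    return get, calls


get, calls = make_counting_get()
graph = {"x": 1, "y": (add, "x", 10), "z": (mul, "y", "y")}
result = get(graph, "z")
```

Inspecting `calls` after a computation shows which functions ran and how often, which is one way to spot an unexpected early evaluation.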
Or actually, I guess we need something wrapping pint's Quantity. I guess you could experiment by raising an error inside pint's |
Following this lead, I quickly checked again and pint doesn't have an explicit To hack together a possible workaround, I added an explicit [1]
[2]
[3]
Also, no error was raised from a call to Overall, I think this points to |
905: NEP-18 Compatibility r=hgrecco a=jthielen

Building off of the implementation of `__array_function__` in #764, this PR adds compatibility with NEP-18 in Pint (mostly in the sense of Quantity having `__array_function__` and being wrappable as a duck array; for Quantity wrapping other duck arrays, see #845). Many tests are added of NumPy functions being used with Pint Quantities by way of `__array_function__`.

Accompanying changes that were needed as a part of this implementation include:

- a complete refactor of `__array_ufunc__` and ufunc attribute fallbacks to work in parallel with `__array_function__`
- promoting `_eq` in `quantity` to `eq` in `compat`
- preliminary handling of array-like compatibility by defining upcast types and attempting to wrap and defer to all others (a follow-up PR, or set of PRs, will be needed to completely address #845 / #878)

Closes #126
Closes #396
Closes #424
Closes #547
Closes #553
Closes #617
Closes #619
Closes #682
Closes #700
Closes #764
Closes #790
Closes #821

Co-authored-by: Jon Thielen <github@jont.cc>
FYI @shoyer @hameerabbasi @keewis
numpy 1.17, xarray/dask/sparse/pint git tip
NEP18 doesn't seem to work correctly in several cases.
I'm still in the process of investigating what causes the issue(s).
Works:
Broken:
[1] dask.array wraps around pint, and there are 2+ chunks
At first sight, the legitimacy of this use case is arguable, as it feels much cleaner to always have pint wrapping around dask.array (and it saves a few headaches when dask.distributed and custom UnitRegistries get involved, too, as you never need to pickle your Quantities).
However, the problems of pint->dask and the benefits of dask->pint become clear when one wraps a pint+dask object in xarray.
There, with pint around dask, one would need to write special case handling for pretty much every piece of xarray logic that today has special case handling for dask, which is a lot. With dask around pint, I would expect everything to work out of the box as long as NEP18 compliance is respected by all libraries.
@shoyer I'd like to hear your opinion on this...
[2] xarray wraps around pint which wraps around dask
Following the reasoning of [1], this should happen only when a user manually builds the data, as opposed to calling xarray.Dataset.chunk(), which should be rare-ish. I'm tempted to write a single piece of logic in xarray.Variable.data.setter that detects the special pint->dask case and turns it around to dask->pint.
[3] xarray wraps around pint which wraps around sparse
This looks to be the same as [2].
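The rewrapping idea from [2] can be sketched in pure Python. Everything here is hypothetical: `Chunked` and `WithUnits` stand in for dask.array and pint's Quantity, and `flip_units_inside` is an illustrative name, not an existing xarray hook. It only shows the shape of the "detect pint->dask and turn it into dask->pint" step:

```python
class Chunked:
    """Hypothetical stand-in for a chunked array."""

    def __init__(self, chunks):
        self.chunks = list(chunks)


class WithUnits:
    """Hypothetical stand-in for a unit-carrying wrapper."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


def flip_units_inside(data):
    """Turn WithUnits(Chunked([...])) into Chunked([WithUnits(...), ...]).

    Any other input is returned unchanged.
    """
    if isinstance(data, WithUnits) and isinstance(data.magnitude, Chunked):
        return Chunked(WithUnits(c, data.units) for c in data.magnitude.chunks)
    return data


outer_units = WithUnits(Chunked([[1, 2], [3]]), "m")
flipped = flip_units_inside(outer_units)
```

With real dask arrays the per-chunk rewrap would have to be expressed as a lazy block-wise operation instead of an eager loop.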