Add overloads to get_axis_num #8547
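I'm skeptical of `get_axis_num`, since `dims.index` already does the same thing:
import numpy as np
import xarray as xr
a = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=("x", "y"))
print(a.dims.index("y"))
1
print(a.get_axis_num("y"))
1
But that's a discussion for another PR.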
Good point! I guess the main advantage is the ability to pass tuples to `get_axis_num`. Also, the Python error message for `da.dims.index('foo')` isn't very informative:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 1
----> 1 da.dims.index('foo')
ValueError: tuple.index(x): x not in tuple
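For illustration, a friendlier error only needs a thin wrapper around `tuple.index`. This is a minimal sketch that assumes the dimension names live in a plain tuple; `axis_of` is a hypothetical helper, not an xarray function:

```python
from __future__ import annotations

from collections.abc import Hashable


def axis_of(dims: tuple[Hashable, ...], dim: Hashable) -> int:
    """Return the integer axis of ``dim`` within ``dims`` (hypothetical helper)."""
    try:
        return dims.index(dim)
    except ValueError:
        # Replace "tuple.index(x): x not in tuple" with a message that names
        # both the missing dimension and the available ones.
        raise ValueError(f"{dim!r} not found in array dimensions {dims!r}") from None
```

With this, `axis_of(("x", "y"), "foo")` raises `ValueError: 'foo' not found in array dimensions ('x', 'y')`.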
When is it even required for a third-party user to use it? We of course have to use a helper like this internally, but when should a user need it?
I think it's fine? The value you tried to search for in the tuple wasn't there. Makes sense to me.
Any time you need to get out of Xarray and pass an …
Is that what we want to encourage? It feels out of scope to me for xarray to also handle cases outside xarray-land. We have … I mainly care about namedarray here. I think the method goes against the spirit of the package, and I don't think it should be one of the first methods to go public. Here are some funny little examples:
# A typical Hashable:
a = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=("x", "y"))
a.dims.index("y")
1
a.get_axis_num("y")
1
# Another Hashable:
b = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=(2023, None))
b.dims.index(None)
1
b.get_axis_num(None)
1
# And another Hashable:
c = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=(("x", "y"), ("r",)))
c.dims.index(("r",))
1
c.get_axis_num(("r",)) # ValueError: 'r' not found in array dimensions (('x', 'y'), ('r',))
# And another Hashable:
d = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=(b"x", b"y"))
d.dims.index(b"y")
1
d.get_axis_num(b"y") # ValueError: 121 not found in array dimensions (b'x', b'y')
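Both failures are consistent with a dispatch that treats any non-`str` iterable as a collection of dimension names: `("r",)` is iterated to `"r"`, and `b"y"` is iterated to the integer `121`. The stdlib-only sketch below assumes that dispatch and reproduces the error messages above; it works on a plain tuple of dims and is an illustration, not xarray's source:

```python
from collections.abc import Iterable


def _single_axis(dims, dim):
    # Look up one dimension name, with a descriptive error on failure.
    try:
        return dims.index(dim)
    except ValueError:
        raise ValueError(f"{dim!r} not found in array dimensions {dims!r}") from None


def get_axis_num(dims, dim):
    # str is iterable too, so it is excluded explicitly; every other
    # iterable (including tuples and bytes) is read as "several dimensions".
    if isinstance(dim, Iterable) and not isinstance(dim, str):
        return tuple(_single_axis(dims, d) for d in dim)
    # Non-iterable hashables such as None or 2023 take the single-dimension path.
    return _single_axis(dims, dim)
```

Under this dispatch, `get_axis_num((("x", "y"), ("r",)), ("r",))` looks up `"r"` and fails, and `get_axis_num((b"x", b"y"), b"y")` looks up `121` and fails, even though `dims.index` finds both.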
I guess we disagree a bit there: the value it gives (… Maybe we make it private? I generally agree we shouldn't encourage it. I do think allowing tuples and giving a better error message are useful, though. I would also support inheriting from tuple and then just doing the …
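One hypothetical reading of the "inherit from tuple" idea: a `Dims` subclass (an invented name, not part of xarray) that keeps ordinary `tuple.index` semantics, adds a clearer error, and accepts a `list` to request several dimensions. The design choice assumed here is that lists are unhashable and so can never be a dimension name themselves, which sidesteps the tuple-valued-dimension ambiguity in the examples above:

```python
import sys


class Dims(tuple):
    """Hypothetical tuple of dimension names; not part of xarray."""

    def index(self, dim, start=0, stop=sys.maxsize):
        # Lists are unhashable, so they cannot be a single dimension name;
        # treat them unambiguously as a request for several axes.
        if isinstance(dim, list):
            return tuple(self.index(d) for d in dim)
        try:
            # Plain tuple lookup, so tuple- or bytes-valued names still work.
            return super().index(dim, start, stop)
        except ValueError:
            raise ValueError(f"{dim!r} not found in dimensions {tuple(self)!r}") from None
```

For example, `Dims(("x", "y")).index(["y", "x"])` returns `(1, 0)`, while `Dims((("x", "y"), ("r",))).index(("r",))` returns `1`.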
80% of the scientists who use xarray regularly don't even know that … Obviously we want users to be able to express everything they want to do in xarray as intuitively as possible, and we shouldn't encourage them to leave xarray, but I don't think we should actively make it harder for them to do so either. To give a concrete example, many operations in scipy etc. don't quite have the correct API to immediately work with …
weird_analysis_func_expecting_numpy_arrays(da.data, axis=da.get_axis_num('x'))
I think this is fine. It's a bit of a special/weird method, but I don't think we should get rid of it.
These are interesting, but to me they represent issues with either (a) our handling of dimensions as hashables or (b) the current implementation of `get_axis_num`.
I think those issues also show that it can be better to just rely on Python's own tuple methods instead. Here's how they could have done it with less code:
weird_analysis_func_expecting_numpy_arrays(da.data, axis=da.dims.index('x'))
I'm not really pushing to deprecate and remove. I just don't think it should be added to NamedArray as one of the first available, public methods.
Except that (as @max-sixty noted above) using `dims.index` doesn't work for multiple dimensions:
weird_analysis_func_expecting_numpy_arrays(da.data, axis=da.dims.index(['x', 'y'])) # ValueError: tuple.index(x): x not in tuple
We could override …
Should the input also be typed as …
Yup, in the (few?) cases where you would need multiple axes, it would look like:
axes = tuple(da.dims.index(v) for v in ('x', 'y'))
weird_analysis_func_expecting_numpy_arrays(da.data, axis=axes)
The issue is then the dim … Things would have been so much simpler if dims was typed as …
Possibly we have strayed too far from the original issue, but one quick comment:
I think this is OK! For code that wants to be more robust, passing …
Hmm, I find it inconsistent and a constant source of typing difficulties.
IIRC, the issue with this is that tuples don't work: …
Add overloads to `.get_axis_num` because you will get the same type out as you put in. Seen in #8344.
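A minimal sketch of what such overloads can look like, written as a free function over a plain tuple of dims rather than as xarray's actual method (that shape is an assumption for illustration). Because `str` is both `Hashable` and `Iterable`, a dedicated `str` overload is listed first so that a single name resolves to `int`. Type checkers may still report these signatures as overlapping with incompatible return types, since a `str` or a tuple also satisfies the broader signatures; this sketch does not resolve that:

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable
from typing import overload


@overload
def get_axis_num(dims: tuple[Hashable, ...], dim: str) -> int: ...


@overload
def get_axis_num(dims: tuple[Hashable, ...], dim: Iterable[Hashable]) -> tuple[int, ...]: ...


@overload
def get_axis_num(dims: tuple[Hashable, ...], dim: Hashable) -> int: ...


def get_axis_num(
    dims: tuple[Hashable, ...], dim: Hashable | Iterable[Hashable]
) -> int | tuple[int, ...]:
    # Runtime dispatch mirrors the overloads: str -> one axis,
    # other iterables -> one axis per name, anything else -> one axis.
    if isinstance(dim, Iterable) and not isinstance(dim, str):
        return tuple(dims.index(d) for d in dim)
    return dims.index(dim)
```

With these overloads, a checker should infer `int` for `get_axis_num(("x", "y"), "y")` and `tuple[int, ...]` for `get_axis_num(("x", "y"), ["y", "x"])`, so callers get back the same "shape" they passed in.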