Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[namedarray] split .set_dims() into .expand_dims() and broadcast_to() #8380

Merged
merged 71 commits into from
Jan 29, 2024

Conversation

andersy005
Copy link
Member

@andersy005 andersy005 commented Oct 26, 2023

this is the initial phase of our plan to split the .set_dims() method into two distinct methods .expand_dims() and .broadcast_to() as outlined in the design document.

  • add expand_dims()
  • add broadcast_to()
  • add .transpose() and .T to NamedArray
  • update whats-new.rst

@github-actions github-actions bot added the topic-NamedArray Lightweight version of Variable label Oct 26, 2023
xarray/namedarray/core.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
@andersy005 andersy005 marked this pull request as ready for review October 27, 2023 02:55
@andersy005 andersy005 requested a review from dcherian October 27, 2023 02:58
@andersy005 andersy005 changed the title add .set_dims(), .transpose() and .T to namedarray split .set_dims() into .expand_dims() and broadcast_to() Oct 27, 2023
xarray/namedarray/core.py Outdated Show resolved Hide resolved
doc/api-hidden.rst Outdated Show resolved Hide resolved
xarray/namedarray/core.py Show resolved Hide resolved
xarray/namedarray/core.py Outdated Show resolved Hide resolved
xarray/namedarray/core.py Outdated Show resolved Hide resolved
xarray/namedarray/core.py Outdated Show resolved Hide resolved
xarray/namedarray/utils.py Outdated Show resolved Hide resolved
xarray/namedarray/utils.py Outdated Show resolved Hide resolved
@Illviljan Illviljan marked this pull request as draft October 27, 2023 21:55
@dcherian
Copy link
Contributor

dcherian commented Dec 5, 2023

I have frequently unwrapped to access and compute on the underlying data container, which does indeed care about axis order.

@Illviljan
Copy link
Contributor

Designing an API around not using the class seems odd to me.

I agree that once you have retrieved the underlying data you will indeed have to deal with the axis order. But didn't you feel like you were hacking around a problem with xarray? That's what it sounds like to me.

This pattern is not realistic since we support any Hashable as a dimension name:

from numpy import array_api as nxp
import xarray as xr
from xarray.namedarray import _array_api as xrp

na = xr.namedarray.core.NamedArray(
    dims=("a", None, ("c",)), data=nxp.reshape(nxp.arange(4 * 4 * 4), (4, 4, 4))
)

na.expand_dims(x=-2)
na.expand_dims(100=-2) # Error, not a valid identifier, str(100).isidentifier() == False

Is this what you want then?

from numpy import array_api as nxp
import xarray as xr
from xarray.namedarray import _array_api as xrp

na = xr.namedarray.core.NamedArray(
    dims=("a", None, ("c",)), data=nxp.reshape(nxp.arange(4 * 4 * 4), (4, 4, 4))
)

na.expand_dims(dim=100, axis=-2)

@andersy005
Copy link
Member Author

andersy005 commented Dec 7, 2023

i'm revisiting this PR, and i've noticed that there is no clear agreement on the desired API spec for expand_dims(), is my observation accurate?

i understand that

na.expand_dims(100=-2)

# or 

na.expand_dims(None=0)

won;t work but the following works

na.expand_dims(dim={100:-2})

# or 

na.expand_dims(dim={None=0})

should we then remove support for dim_kwarg if we want to allow users to control the position of the new dimensions?

am i missing something?

@andersy005
Copy link
Member Author

andersy005 commented Dec 7, 2023

This pattern is not realistic since we support any Hashable as a dimension name:

from numpy import array_api as nxp
import xarray as xr
from xarray.namedarray import _array_api as xrp

na = xr.namedarray.core.NamedArray(
    dims=("a", None, ("c",)), data=nxp.reshape(nxp.arange(4 * 4 * 4), (4, 4, 4))
)

na.expand_dims(x=-2)
na.expand_dims(100=-2) # Error, not a valid identifier, str(100).isidentifier() == False

thinking more about this, we don't seem to restrict user from assigning hashables like (None, 100) to dims which results in the following edge cases that don't work when using keyword arguments like:

In [33]: v = xr.Variable(data=np.zeros(6).reshape(2, 3), dims=[100, None])

In [34]: v
Out[34]: 
<xarray.Variable (100: 2, None: 3)>
array([[0., 0., 0.],
       [0., 0., 0.]])

In [35]: v.sel(100=1)
  Cell In[35], line 1
    v.sel(100=1)
          ^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

so, i'm not sure we should drop dim_kwarg for expand_dims

na.expand_dims(b=2)

@dcherian
Copy link
Contributor

dcherian commented Dec 8, 2023

na.expand_dims(dim="a", axis=2) would be fine since we'd only support one at a time.

THis is a fairly minor point so lets just go with na.expand_dims(dim="a") and revisit if someone asks for axis support.

@Illviljan
Copy link
Contributor

thinking more about this, we don't seem to restrict user from assigning hashables like (None, 100) to dims which results in the following edge cases that don't work when using keyword arguments like:

In [33]: v = xr.Variable(data=np.zeros(6).reshape(2, 3), dims=[100, None])

In [34]: v
Out[34]: 
<xarray.Variable (100: 2, None: 3)>
array([[0., 0., 0.],
       [0., 0., 0.]])

In [35]: v.sel(100=1)
  Cell In[35], line 1
    v.sel(100=1)
          ^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?```

Yeah, I'm thinking v.sel(100=1) was a mistake considering xarray in general allows any Hashable as dimension name, a string like "2023_x" wont work either.
Either we significantly restrict the dimension names to strings passing .isidentifier() only or don't allow v.sel(100=1).

andersy005 and others added 12 commits January 22, 2024 22:55
simplified the `expand_dims` method of the NamedArray class by removing support for multiple and keyword arguments for new dimensions. the change updates the method signature to only accept a single dimension name, enhancing clarity and maintainability. this shift focuses on the common use case of adding a single dimension and moves extended functionality to a separate API.
@andersy005
Copy link
Member Author

@dcherian, when you get a chance, this is ready for another round of review

@dcherian dcherian merged commit 1ab02b4 into pydata:main Jan 29, 2024
28 of 29 checks passed
@andersy005 andersy005 deleted the add-set-dims branch January 29, 2024 19:25
andersy005 added a commit to TomNicholas/xarray that referenced this pull request Jan 30, 2024
* main: (153 commits)
  Add overloads to get_axis_num (pydata#8547)
  Fix CI: temporary pin pytest version to 7.4.* (pydata#8682)
  Bump the actions group with 1 update (pydata#8678)
  [namedarray] split `.set_dims()` into `.expand_dims()` and `broadcast_to()` (pydata#8380)
  Add chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta` (pydata#8575)
  Fix NetCDF4 C version detection (pydata#8675)
  groupby: Don't set `method` by default on flox>=0.9 (pydata#8657)
  Fix automatic broadcasting when wrapping array api class (pydata#8669)
  Fix unstack method when wrapping array api class (pydata#8668)
  Fix `variables` arg typo in `Dataset.sortby()` docstring (pydata#8670)
  dt.weekday_name - removal of function (pydata#8664)
  Add `dev` dependencies to `pyproject.toml` (pydata#8661)
  CI: Pin scientific-python/upload-nightly-action to release sha (pydata#8662)
  Update HOW_TO_RELEASE.md by clarifying where RTD build can be found (pydata#8655)
  ruff: use extend-exclude (pydata#8649)
  new whats-new section (pydata#8652)
  xfail another test on windows (pydata#8648)
  use first element of residual in _nonpolyfit_1d (pydata#8647)
  whatsnew for v2024.01.1
  implement `isnull` using `full_like` instead of `zeros_like` (pydata#7395)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs review topic-NamedArray Lightweight version of Variable
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants