Errors when assigning using .from_pandas_multiindex #8455

Closed as not planned
@max-sixty

Description

What happened?

Very possibly this is user error; forgive me if so.

I'm trying to migrate some code from the old, implicit way of assigning a MultiIndex to the new, explicit API. Here's an MCVE:

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr

da = xr.tutorial.open_dataset("air_temperature")['air']

# old code, works, but with a warning

da.expand_dims('foo').assign_coords(foo=(pd.MultiIndex.from_tuples([(1,2)])))

<ipython-input-25-f09b7f52bb42>:1: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'foo' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
  da.expand_dims('foo').assign_coords(foo=(pd.MultiIndex.from_tuples([(1,2)])))
Out[25]:
<xarray.DataArray 'air' (foo: 1, time: 2920, lat: 25, lon: 53)>
array([[[[241.2    , 242.5    , 243.5    , ..., 232.79999, 235.5    ,
          238.59999],
 ...
         [297.69   , 298.09   , 298.09   , ..., 296.49   , 296.19   ,
          295.69   ]]]], dtype=float32)
Coordinates:
  * lat          (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon          (lon) float32 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time         (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * foo          (foo) object MultiIndex
  * foo_level_0  (foo) int64 1
  * foo_level_1  (foo) int64 2

# new code: seems to confuse the number of values in the index (1) with the number of levels (3, including the parent):

da.expand_dims('foo').assign_coords(foo=xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 da.expand_dims('foo').assign_coords(foo=xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))

File ~/workspace/xarray/xarray/core/common.py:621, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
    618 else:
    619     results = self._calc_assign_results(coords_combined)
--> 621 data.coords.update(results)
    622 return data

File ~/workspace/xarray/xarray/core/coordinates.py:566, in Coordinates.update(self, other)
    560 # special case for PandasMultiIndex: updating only its dimension coordinate
    561 # is still allowed but depreciated.
    562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
    563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
    564 self._drop_coords(self._names - coords_to_align._names)
--> 566 self._update_coords(coords, indexes)

File ~/workspace/xarray/xarray/core/coordinates.py:834, in DataArrayCoordinates._update_coords(self, coords, indexes)
    832 coords_plus_data = coords.copy()
    833 coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 834 dims = calculate_dimensions(coords_plus_data)
    835 if not set(dims) <= set(self.dims):
    836     raise ValueError(
    837         "cannot add coordinates with new dimensions to a DataArray"
    838     )

File ~/workspace/xarray/xarray/core/variable.py:3014, in calculate_dimensions(variables)
   3012             last_used[dim] = k
   3013         elif dims[dim] != size:
-> 3014             raise ValueError(
   3015                 f"conflicting sizes for dimension {dim!r}: "
   3016                 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   3017             )
   3018 return dims

ValueError: conflicting sizes for dimension 'foo': length 1 on <this-array> and length 3 on {'lat': 'lat', 'lon': 'lon', 'time': 'time', 'foo': 'foo'}
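
For reference, the FutureWarning above suggests passing the wrapped coordinates directly, as in `dataarray.assign_coords(mindex_coords)`, rather than under a `foo=` keyword. The sketch below tries that form. It is an assumption that this is the intended migration path, and a small synthetic array (hypothetical, not from the report) stands in for the tutorial dataset so that no download is needed. It also lists the names held by the `Coordinates` object, which may be where the "length 3" in the error comes from: one parent coordinate plus two levels.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the 'air' DataArray (assumption: only the dims matter here).
da = xr.DataArray(
    np.zeros((2, 3), dtype="float32"),
    dims=("lat", "lon"),
    coords={"lat": [75.0, 72.5], "lon": [200.0, 202.5, 205.0]},
    name="air",
)

midx = pd.MultiIndex.from_tuples([(1, 2)])

# Wrap the MultiIndex explicitly, as the FutureWarning recommends.
mindex_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="foo")

# The Coordinates object holds the parent coordinate plus one per level.
coord_names = sorted(mindex_coords)
print(coord_names)

# Pass the Coordinates object positionally, not as `foo=...`.
result = da.expand_dims("foo").assign_coords(mindex_coords)
print(result.coords)
```

If this runs without the `ValueError`, the size conflict in the failing call comes from the `foo=` keyword form, not from the MultiIndex itself.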

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.18 (main, Nov 2 2023, 16:51:22)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.10.2.dev10+gccc8f998
pandas: 2.1.1
numpy: 1.25.2
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.0
distributed: 2023.7.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.3.dev30+gd26e29e
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: 0.9.19
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.0
mypy: 1.6.0
IPython: 8.15.0
sphinx: 4.3.2
