Description
I have written a function, process_stacked_groupby, that stacks all but one dimension of a Dataset/DataArray and performs groupby-apply-combine on the stacked dimension. However, after upgrading to 0.15.1, the function stopped working.
MCVE Code Sample
```python
import numpy as np
import xarray as xr

# Dimensions
N = xr.DataArray(np.arange(100), dims='N', name='N')
reps = xr.DataArray(np.arange(5), dims='reps', name='reps')
horizon = xr.DataArray([1, -1], dims='horizon', name='horizon')
horizon.attrs = {'long_name': 'Horizontal', 'units': 'H'}
vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical')
vertical.attrs = {'long_name': 'Vertical', 'units': 'V'}

# Variables
x = xr.DataArray(np.random.randn(len(N), len(reps), len(horizon), len(vertical)),
                 dims=['N', 'reps', 'horizon', 'vertical'],
                 name='x')
y = x * 0.1
y.name = 'y'

# Merge x, y
data = xr.merge([x, y])

# Assign coords
data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon)


# Function that stacks all but one dimension and groups by the stacked dimension.
def process_stacked_groupby(ds, dim, func, *args):
    # Function to apply to each group of the stacked groupby
    def apply_fn(ds, dim, func, *args):
        # Get groupby dim
        groupby_dim = list(ds.dims)
        groupby_dim.remove(dim)
        groupby_var = ds[groupby_dim]
        # Unstack groupby dim
        ds2 = ds.unstack(groupby_dim).squeeze()
        # Perform function
        ds3 = func(ds2, *args)
        # Add the multi-index groupby_var back to the result
        ds3 = (ds3
               .reset_coords(drop=True)
               .assign_coords(groupby_var)
               .expand_dims(groupby_dim)
               )
        return ds3

    # Get list of dimensions
    groupby_dims = list(ds.dims)
    # Remove the dimension that is not grouped
    groupby_dims.remove(dim)
    # Stack all but one dimension
    stack_dim = '_'.join(groupby_dims)
    ds2 = ds.stack({stack_dim: groupby_dims})
    # Groupby and apply
    ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args))
    # Unstack
    ds2 = ds2.unstack(stack_dim)
    # Restore attrs of the stacked dimensions (loop variable renamed so it
    # does not shadow the `dim` argument)
    for d in groupby_dims:
        ds2[d].attrs = ds[d].attrs
    return ds2


# Function to apply on groupby
def fn(ds):
    return ds


# Run groupby with applied function
data.pipe(process_stacked_groupby, 'N', fn)
```
Expected Output
Prior to xarray 0.15.0, the code above produced the result I wanted.
The function should:
1. stack the chosen dimensions
2. group by the stacked dimension
3. apply a function to each group:
   a. pass the group, with the stacked group coordinate unstacked, on to another function
   b. add the multi-index stacked group coordinate back to that function's result
4. combine the groups
5. unstack the stacked dimension
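As a rough illustration of these five steps, here is a plain-Python model of the same pattern. It is not xarray code and makes no claim about how xarray implements stacking: values live in a dict keyed by label tuples, the stacked "multi-index" label is simply a tuple of the grouped dimensions' labels, and the helper `process_stacked` and the toy coordinates are hypothetical. The `reps` dimension from the MCVE is left out to keep it small.

```python
from itertools import product

# Plain-Python model (NOT xarray code) of the stack -> groupby -> apply ->
# combine -> unstack pattern. Values live in a dict keyed by label tuples.
coords = {"N": [0, 1, 2], "horizon": [1, -1], "vertical": [1, 2, 3]}
dims = list(coords)  # ["N", "horizon", "vertical"]
data = {key: float(i) for i, key in enumerate(product(*coords.values()))}


def process_stacked(data, dims, keep, func):
    """Hypothetical helper: apply `func` along `keep` for every combination
    of the remaining dims, then restore the original layout."""
    n = dims.index(keep)
    grouped_dims = [d for d in dims if d != keep]
    # Steps 1-2: stack the other dims into one tuple label (the stand-in for
    # the multi-index) and group the values along `keep` under that label.
    groups = {}
    for key, value in data.items():
        stacked_label = key[:n] + key[n + 1:]
        groups.setdefault(stacked_label, {})[key[n]] = value
    # Step 3: apply func to each group; each result stays keyed by its
    # stacked label, much like apply_fn re-attaching groupby_var.
    results = {label: func(group) for label, group in groups.items()}
    # Steps 4-5: combine the groups and unstack by splitting each tuple
    # label back into its component dims around `keep`.
    combined = {}
    for label, group in results.items():
        for k, value in group.items():
            combined[label[:n] + (k,) + label[n:]] = value
    return grouped_dims, combined


# Usage: an identity function, like `fn` in the MCVE, round-trips the data.
grouped_dims, result = process_stacked(data, dims, "N", lambda group: group)
print(grouped_dims, result == data)  # → ['horizon', 'vertical'] True
```

Because every group keeps its tuple label through steps 3 and 4, step 5 can always split it back into separate dimensions; that label is the piece of information the combined xarray object is reported to lose below.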
Problem Description
After upgrading to 0.15.1, the code above stopped working. The error occurs at the line

```python
# Unstack
ds2 = ds2.unstack(stack_dim)
```

with

```
ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical']
```

This happens in the 5th step: the combined object no longer contains a MultiIndex. Somewhere in the 4th step, combining the groups loses the MultiIndex on the stacked dimension.
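The failure mode can be mimicked in miniature without xarray. In this hypothetical plain-Python `unstack`, each stacked label must be a tuple with one entry per dimension, which plays the role of the MultiIndex that unstacking needs; once a combine step replaces the tuples with plain positions, there is nothing left to split. This is an analogy under those assumptions, not a reproduction of xarray's internals. In xarray itself, a quick diagnostic is to inspect `type(ds2.indexes.get(stack_dim))` just before the `unstack` call; a `pandas.MultiIndex` is what unstacking requires.

```python
# Plain-Python analogy (NOT xarray code) for why unstack needs a multi-index.
def unstack(labels, dim_names):
    """Split stacked labels back into one label list per dimension.

    Every stacked label must be a tuple with one entry per dimension, which is
    the role the MultiIndex plays for the stacked dimension in xarray.
    """
    if not all(isinstance(lab, tuple) and len(lab) == len(dim_names) for lab in labels):
        raise ValueError("cannot unstack: stacked labels are not multi-index tuples")
    # Collect each dimension's labels in first-seen order.
    return {name: list(dict.fromkeys(lab[i] for lab in labels))
            for i, name in enumerate(dim_names)}


stacked = [(h, v) for h in (1, -1) for v in (1, 2, 3)]
print(unstack(stacked, ["horizon", "vertical"]))
# → {'horizon': [1, -1], 'vertical': [1, 2, 3]}

# A combine step that keeps only flat positions has lost the tuple labels,
# so there is nothing left to split into separate dimensions.
flattened = list(range(len(stacked)))
try:
    unstack(flattened, ["horizon", "vertical"])
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```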
Versions
0.15.1