xarray=0.15.1 regression: Groupby drops multi-index #3985

Closed
@DancingQuanta

Description

I have written a function, process_stacked_groupby, that stacks all but one dimension of a Dataset/DataArray and performs groupby-apply-combine on the stacked dimension. However, after upgrading to 0.15.1, the function ceased to work.

MCVE Code Sample

import numpy as np
import xarray as xr

# Dimensions
N = xr.DataArray(np.arange(100), dims='N', name='N')
reps = xr.DataArray(np.arange(5), dims='reps', name='reps')
horizon = xr.DataArray([1, -1], dims='horizon', name='horizon')
horizon.attrs = {'long_name': 'Horizonal', 'units': 'H'}
vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical')
vertical.attrs = {'long_name': 'Vertical', 'units': 'V'}

# Variables
x = xr.DataArray(np.random.randn(len(N), len(reps), len(horizon), len(vertical)),
                 dims=['N', 'reps', 'horizon', 'vertical'],
                 name='x')
y = x * 0.1
y.name = 'y'

# Merge x, y
data = xr.merge([x, y])

# Assign coords
data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon)

# Function that stacks all but one dimension and groups by the stacked dimension.
def process_stacked_groupby(ds, dim, func, *args):
    
    # Function to apply to stacked groupby
    def apply_fn(ds, dim, func, *args):
        
        # Get groupby dim
        groupby_dim = list(ds.dims)
        groupby_dim.remove(dim)
        groupby_var = ds[groupby_dim]
        
        # Unstack groupby dim
        ds2 = ds.unstack(groupby_dim).squeeze()
        
        # perform function
        ds3 = func(ds2, *args)

        # Add multi-index groupby_var to result
        ds3 = (ds3
               .reset_coords(drop=True)
               .assign_coords(groupby_var)
               .expand_dims(groupby_dim)
             )
        return ds3
    
    # Get list of dimensions
    groupby_dims = list(ds.dims)
    
    # Remove dimension not grouped
    groupby_dims.remove(dim)
    
    # Stack all but one dimensions
    stack_dim = '_'.join(groupby_dims)
    ds2 = ds.stack({stack_dim: groupby_dims})
    
    # Groupby and apply
    ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args))
    
    # Unstack
    ds2 = ds2.unstack(stack_dim)
    
    # Restore attrs
    for dim in groupby_dims:
        ds2[dim].attrs = ds[dim].attrs
    
    return ds2

# Function to apply on groupby
def fn(ds):
    return ds

# Run groupby with applied function
data.pipe(process_stacked_groupby, 'N', fn)

Expected Output

Prior to xarray=0.15.0, the above code produced the result I wanted.

The function should be able to

  1. stack chosen dimensions
  2. groupby the stacked dimension
  3. apply a function on each group
    a. The applied function calls a user-supplied function on the group with its group coord unstacked
    b. It then adds the multi-index stacked group coord back to the result of that function
  4. combine the groups
  5. Unstack stacked dimension
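
For reference, steps 1 and 5 on their own can be sketched as a plain stack/unstack round trip. This is only a minimal illustration on a small made-up dataset (the variable names and sizes are assumptions, not the data from the MCVE above); it shows that the stacked dimension is backed by a pandas MultiIndex, which is what unstack relies on.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small illustrative dataset (hypothetical sizes, not the MCVE data)
ds = xr.Dataset(
    {"x": (("N", "reps", "vertical"), np.arange(24.0).reshape(4, 2, 3))},
    coords={"N": np.arange(4), "reps": [0, 1], "vertical": [1, 2, 3]},
)

# Step 1: stack all dimensions except "N" into one dimension
stacked = ds.stack(reps_vertical=["reps", "vertical"])

# The stacked dimension is indexed by a pandas MultiIndex
is_multi = isinstance(stacked.indexes["reps_vertical"], pd.MultiIndex)

# Step 5: unstack restores the original dimensions
roundtrip = stacked.unstack("reps_vertical")
restored = roundtrip["x"].transpose("N", "reps", "vertical")
same_values = np.array_equal(restored.values, ds["x"].values)

print(is_multi, same_values)
```

If the MultiIndex is lost anywhere between these two steps, the final unstack has nothing to work with.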

Problem Description

After upgrading to 0.15.1, the above code stopped working.
The error occurred at the line

    # Unstack
    ds2 = ds2.unstack(stack_dim)

with ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical'].
This happens at the 5th step, where the combined object turns out not to contain any multi-index.
Somewhere in the 4th step, combining the groups loses the multi-index on the stacked dimension.
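
A quick way to narrow down where the index is lost is to check, after each step, whether the dimension is still backed by a MultiIndex. The helper below is a hypothetical sketch (has_multiindex is my own name, not an xarray function) and the small array is made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr


def has_multiindex(obj, dim):
    """Return True if `dim` on a Dataset/DataArray is backed by a pandas MultiIndex.

    Hypothetical helper for debugging; not part of xarray.
    """
    return isinstance(obj.indexes.get(dim), pd.MultiIndex)


# Small illustrative array (not the MCVE data)
da = xr.DataArray(
    np.zeros((2, 3)),
    dims=["reps", "vertical"],
    coords={"reps": [0, 1], "vertical": [1, 2, 3]},
)
stacked = da.stack(reps_vertical=["reps", "vertical"])

# The stacked dimension has a MultiIndex; a plain dimension does not
print(has_multiindex(stacked, "reps_vertical"))
print(has_multiindex(da, "reps"))
```

Calling such a check on the result of the groupby/map step, right before the unstack, would confirm whether the combine step is where the MultiIndex disappears.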

Versions

0.15.1
