Problems with MultiIndexes #743
Comments
Hi! So the issue arises because we are trying to stack an already stacked array; we didn't think of this use case! I know this is just a MWE, but was your goal to stack multiple dims into what the spatial analogs algo uses for the different distributions / samples?

    ds = xr.tutorial.open_dataset('air_temperature').resample(time='D').mean(keep_attrs=True)
    ds = ds.mean(['lon'], keep_attrs=True)
    ks = xclim.analog.spatial_analogs(
        target=ds,
        candidates=ds,
        dist_dim=['time'],
        method='kolmogorov_smirnov'
    )

However, if the problem is that your data, for one reason or another, arrives with a MultiIndex, then we'll have to find a workaround!
Ah, that makes sense! Our real application (rather than this MWE) is a comparison between observational data and forecast data. The observational data has a time axis (e.g. time, lat, lon), while temporal information in the forecast data is represented by initial_date and lead_time axes (e.g. initial_date, lead_time, lat, lon, ensemble_member). For the KS test we stack initial_date, lead_time and ensemble into a sample dimension, e.g.

    forecast.stack({'sample': ['ensemble_member', 'init_date', 'lead_time']})

and then create a similar sample dimension for the observational data:

    obs.stack({'sample': ['time']})

We could massage the obs into an initial_date / lead_time / ensemble_member format, but those axes won't be the same length as for the forecast data (which means
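For readers following along, here is a minimal sketch of the stacking described in this comment. The array shapes, coordinate values and random data are made up for illustration; only the dimension names and the `stack` calls come from the thread. It shows that stacking produces a pandas MultiIndex on `sample`, and that the forecast and observation samples generally end up with different lengths.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical forecast: 3 members x 4 initial dates x 5 lead times.
forecast = xr.DataArray(
    rng.normal(size=(3, 4, 5)),
    dims=("ensemble_member", "init_date", "lead_time"),
    coords={
        "ensemble_member": np.arange(3),
        "init_date": pd.date_range("2000-01-01", periods=4, freq="D"),
        "lead_time": np.arange(5),
    },
)

# Hypothetical observations: 10 time steps.
obs = xr.DataArray(
    rng.normal(size=10),
    dims=("time",),
    coords={"time": pd.date_range("2000-01-01", periods=10, freq="D")},
)

# Same stacking calls as in the comment above.
fc = forecast.stack({"sample": ["ensemble_member", "init_date", "lead_time"]})
ob = obs.stack({"sample": ["time"]})

# The stacked forecast dimension is backed by a pandas MultiIndex.
print(isinstance(fc.indexes["sample"], pd.MultiIndex))  # True

# The two sample dimensions have different lengths: 3 * 4 * 5 = 60 vs 10.
print(fc.sizes["sample"], ob.sizes["sample"])  # 60 10
```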
I see! Then I think we could fix this in xclim:

    if isinstance(dist_dim, str):
        # rename to "dist" instead of stacking
    else:
        # Current behaviour: stack to "dist".

(

    fc = forecast.stack({'sample': ['ensemble_member', 'init_date', 'lead_time']})
    fc['sample'] = np.arange(fc['sample'].size)
    out = spatial_analogs(obs.rename(time='sample'), fc, dist_dim='sample')

)
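The proposed dispatch on `dist_dim` could look roughly like the sketch below. `to_dist` is a hypothetical helper written for this illustration, not a function from xclim, and the test array is invented; only the rename-if-string, stack-otherwise logic comes from the comment above.

```python
import numpy as np
import xarray as xr


def to_dist(da, dist_dim):
    """Hypothetical helper: expose the sample dimension(s) of `da` as "dist"."""
    if isinstance(dist_dim, str):
        # A single dimension (possibly already stacked): rename, do not stack.
        return da.rename({dist_dim: "dist"})
    # Several dimensions: stack them into one new "dist" dimension.
    return da.stack(dist=list(dist_dim))


# Made-up array used only to exercise both branches.
da = xr.DataArray(np.zeros((2, 3, 4)), dims=("time", "lat", "lon"))

renamed = to_dist(da, "time")
stacked = to_dist(da, ["time", "lat"])

print(renamed.dims)           # ('dist', 'lat', 'lon')
print(stacked.sizes["dist"])  # 6
```

With this split, a caller who has already stacked their data passes the stacked dimension name as a string and the stacking step is skipped.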
That workaround would work...

    fc = forecast.stack({'sample': ['ensemble_member', 'init_date', 'lead_time']})
    fc['sample'] = np.arange(fc['sample'].size)
    ob = obs.rename(time='sample')
    ob['sample'] = np.arange(ob['sample'].size)

... but only if the time axis in the observations is exactly the same length as the stacked ensemble_member/init_date/lead_time axis in the forecast data (otherwise you get a
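As background to the length mismatch: the two-sample Kolmogorov-Smirnov statistic compares empirical CDFs, so it is well defined for samples of different sizes. This suggests the equal-length requirement comes from sharing one dimension between the two arrays rather than from the test itself. The pure-Python sketch below is an illustration only (it is not xclim's implementation, and the sample values are invented).

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The empirical CDFs only change at data points, so checking there suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n  # fraction of a <= x
        cdf_b = bisect_right(b_sorted, x) / m  # fraction of b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented samples with different lengths (3 vs 5 values).
obs_sample = [0.1, 0.4, 0.7]
fc_sample = [0.2, 0.3, 0.5, 0.6, 0.9]

d = ks_statistic(obs_sample, fc_sample)
print(round(d, 4))  # 0.3333
```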
Oh that's a real problem, intrinsic to how xclim uses

So this issue shows that we need to separate the dimensions management of

The way I see it now, we should remove the stacking part in

Secondly, we should internally rename the

Including today, I have 4 (busy) days left before my vacation (which ends on July 26th), and before the next planned xclim release (0.28). Xclim 0.29 is planned for the end of August. (In August, I will begin a project involving spatial analogs, which means a high probability of improvements to the

So sorry for the possible delay, but a colleague might have time, I'll ping them!
No problem at all about the delay - thanks for considering a reconfiguration of
Description
I've been using xclim.analog.spatial_analogs to perform Kolmogorov-Smirnov tests. I recently discovered I get a "ValueError: Names should be list-like for a MultiIndex" error if the dist_dim is a MultiIndex. I've created a toy example below to demonstrate the problem.
What I Did
What I Received