I'm frequently trying to apply multiple reductions to the same groups (e.g. mean, count, min, max), and I wonder if there's any way we can get the API to support that use case? I don't understand the algorithm enough to be able to tell if that would be possible to implement in the algorithm, or if we'd basically add a wrapper that would do a new groupby per reduction.
I guess what I'm hoping for is for GroupBy objects to have something similar to pandas' .agg method.
Edit: not sure if this would be better raised on the xarray issue tracker?
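For reference, this is the pandas behaviour I have in mind. It is only a small sketch on made-up data (the `group` and `value` column names are placeholders), showing several reductions applied to the same groups in a single `.agg` call:

```python
import pandas as pd

# Toy data: the column names are placeholders for illustration only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One groupby, several reductions requested at once.
result = df.groupby("group")["value"].agg(["mean", "count", "min", "max"])
print(result)
```

The result has one row per group and one column per reduction, which is roughly the shape of output I would hope for here.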
It would be a decent bit of complexity to add, and I'm not inclined to add it.
There would be two advantages:
1. The data are only factorized once, and the integer codes are reused.
2. We could drastically reduce the number of tasks in the dask graph at the cost of more complicated code. The number of tasks is reduced because we can make a single task calculate all the necessary intermediates for all reductions.
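To make the first point concrete, here is a rough NumPy-only sketch of factorizing once and reusing the integer codes for several reductions. It is an illustration of the idea under simple assumptions (1-D data, no missing values), not how this library implements it; the variable names are made up for the example.

```python
import numpy as np

labels = np.array(["a", "b", "a", "c", "b", "b"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 9.0])

# Factorize once: `codes` holds an integer group id for each element.
groups, codes = np.unique(labels, return_inverse=True)
ngroups = len(groups)

# Every reduction below reuses the same `codes`.
count = np.bincount(codes, minlength=ngroups)
total = np.bincount(codes, weights=values, minlength=ngroups)

minimum = np.full(ngroups, np.inf)
np.minimum.at(minimum, codes, values)  # unbuffered per-group minimum

maximum = np.full(ngroups, -np.inf)
np.maximum.at(maximum, codes, values)  # unbuffered per-group maximum
```

The factorization (`np.unique` here) is typically the part you would want to avoid repeating; the individual reductions are cheap once the codes exist.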
I'm not sure (1) is worth it, at least for xarray, because after pydata/xarray#7206, we will get this for free by just calling each individual method on a saved GroupBy object (for xarray).
I'm not sure (2) is worth it for a couple of cases:
It will also mean that to calculate max only, you will calculate every other reduction and then discard it.
If you're writing the output to zarr for example, you lose parallelism again.
It could be an advantage to compute count only once and reuse it for both count and mean, but I'm not sure it's worth it. We could get this advantage instead by breaking up the current algorithm to compute count and sum separately for mean. The dask optimizer would then handle the shared count computation for us.
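As a rough sketch of that last idea, assuming the data arrive in chunks that have already been factorized to integer codes: count and sum are computed as separate per-chunk intermediates, and mean is derived from the combined results. The helper names `chunk_count` and `chunk_sum` are hypothetical and exist only for this example; plain Python lists stand in for dask chunks.

```python
import numpy as np

def chunk_count(codes, ngroups):
    # Per-chunk intermediate: number of elements in each group.
    return np.bincount(codes, minlength=ngroups)

def chunk_sum(codes, values, ngroups):
    # Per-chunk intermediate: sum of values in each group.
    return np.bincount(codes, weights=values, minlength=ngroups)

ngroups = 2
# Two chunks of (codes, values); a stand-in for the blocks of a chunked array.
chunks = [
    (np.array([0, 1, 0]), np.array([1.0, 2.0, 3.0])),
    (np.array([1, 1, 0]), np.array([4.0, 6.0, 5.0])),
]

# Combine the intermediates across chunks.
count = sum(chunk_count(c, ngroups) for c, _ in chunks)
total = sum(chunk_sum(c, v, ngroups) for c, v in chunks)

# `count` is both a result in its own right and an input to `mean`.
mean = total / count
```

With the intermediates split like this, a user asking for both count and mean would request the same count computation twice, which is the kind of duplication a graph optimizer can merge.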