forked from pydata/xarray

Enable flox in GroupBy and resample (pydata#5734)
Closes pydata#5734
Closes pydata#4473
Closes pydata#4498
Closes pydata#659
Closes pydata#2237
xref pangeo-data/pangeo#271

Co-authored-by: Anderson Banihirwe <axbanihirwe@ualr.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
6 people committed May 13, 2022
1 parent c34ef8a commit 7043bdb
Showing 19 changed files with 1,185 additions and 355 deletions.
2 changes: 2 additions & 0 deletions asv_bench/asv.conf.json
@@ -65,6 +65,8 @@
"bottleneck": [""],
"dask": [""],
"distributed": [""],
"flox": [""],
"numpy_groupies": [""],
"sparse": [""]
},

10 changes: 6 additions & 4 deletions asv_bench/benchmarks/groupby.py
@@ -13,6 +13,7 @@ def setup(self, *args, **kwargs):
{
"a": xr.DataArray(np.r_[np.repeat(1, self.n), np.repeat(2, self.n)]),
"b": xr.DataArray(np.arange(2 * self.n)),
"c": xr.DataArray(np.arange(2 * self.n)),
}
)
self.ds2d = self.ds1d.expand_dims(z=10)
@@ -50,10 +51,11 @@ class GroupByDask(GroupBy):
     def setup(self, *args, **kwargs):
         requires_dask()
         super().setup(**kwargs)
-        self.ds1d = self.ds1d.sel(dim_0=slice(None, None, 2)).chunk({"dim_0": 50})
-        self.ds2d = self.ds2d.sel(dim_0=slice(None, None, 2)).chunk(
-            {"dim_0": 50, "z": 5}
-        )
+
+        self.ds1d = self.ds1d.sel(dim_0=slice(None, None, 2))
+        self.ds1d["c"] = self.ds1d["c"].chunk({"dim_0": 50})
+        self.ds2d = self.ds2d.sel(dim_0=slice(None, None, 2))
+        self.ds2d["c"] = self.ds2d["c"].chunk({"dim_0": 50, "z": 5})
         self.ds1d_mean = self.ds1d.groupby("b").mean()
         self.ds2d_mean = self.ds2d.groupby("b").mean()

2 changes: 2 additions & 0 deletions ci/install-upstream-wheels.sh
@@ -15,6 +15,7 @@ conda uninstall -y --force \
pint \
bottleneck \
sparse \
flox \
h5netcdf \
xarray
# to limit the runtime of Upstream CI
@@ -47,4 +48,5 @@ python -m pip install \
git+https://github.com/pydata/sparse \
git+https://github.com/intake/filesystem_spec \
git+https://github.com/SciTools/nc-time-axis \
git+https://github.com/dcherian/flox \
git+https://github.com/h5netcdf/h5netcdf
1 change: 1 addition & 0 deletions ci/requirements/all-but-dask.yml
@@ -13,6 +13,7 @@ dependencies:
- cfgrib
- cftime
- coveralls
- flox
- h5netcdf
- h5py
- hdf5
1 change: 1 addition & 0 deletions ci/requirements/environment-windows.yml
@@ -10,6 +10,7 @@ dependencies:
- cftime
- dask-core
- distributed
- flox
- fsspec!=2021.7.0
- h5netcdf
- h5py
1 change: 1 addition & 0 deletions ci/requirements/environment.yml
@@ -12,6 +12,7 @@ dependencies:
- cftime
- dask-core
- distributed
- flox
- fsspec!=2021.7.0
- h5netcdf
- h5py
1 change: 1 addition & 0 deletions ci/requirements/min-all-deps.yml
@@ -17,6 +17,7 @@ dependencies:
- coveralls
- dask-core=2021.04
- distributed=2021.04
- flox=0.5
- h5netcdf=0.11
- h5py=3.1
# hdf5 1.12 conflicts with h5py=3.1
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -141,6 +141,11 @@ Performance
- GroupBy binary operations are now vectorized.
Previously this involved looping over all groups. (:issue:`5804`,:pull:`6160`)
By `Deepak Cherian <https://github.com/dcherian>`_.
- Substantially improved GroupBy operations using `flox <https://flox.readthedocs.io/en/latest/>`_.
This is auto-enabled when ``flox`` is installed. Use ``xr.set_options(use_flox=False)`` to use
the old algorithm. (:issue:`4473`, :issue:`4498`, :issue:`659`, :issue:`2237`, :pull:`271`).
By `Deepak Cherian <https://github.com/dcherian>`_,`Anderson Banihirwe <https://github.com/andersy005>`_,
`Jimmy Westling <https://github.com/illviljan>`_.

Internal Changes
~~~~~~~~~~~~~~~~
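
Note on the whats-new entry above: the following is a minimal sketch, not part of the commit, of how the `use_flox` option described there might be exercised. The toy dataset and the variable names `default_mean` and `legacy_mean` are illustrative assumptions; only `xr.set_options(use_flox=False)` comes from the changelog text. It assumes an xarray release that includes this commit.

```python
import numpy as np
import xarray as xr

# Toy data (hypothetical): "a" holds the group labels, "b" the values.
ds = xr.Dataset(
    {
        "a": ("x", np.repeat([1, 2], 3)),
        "b": ("x", np.arange(6.0)),
    }
)

# Default path: flox is used when it is installed, otherwise the old algorithm.
default_mean = ds.groupby("a").mean()

# Explicitly opt out of flox for the duration of this block.
with xr.set_options(use_flox=False):
    legacy_mean = ds.groupby("a").mean()

# Both code paths should agree on the result.
xr.testing.assert_allclose(default_mean, legacy_mean)
```

Using `set_options` as a context manager keeps the opt-out local, which is convenient when comparing the two code paths side by side.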
1 change: 1 addition & 0 deletions setup.cfg
@@ -98,6 +98,7 @@ accel =
scipy
bottleneck
numbagg
flox

parallel =
dask[complete]
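
Note on the `accel` extra above: because flox is an optional dependency that is picked up automatically when present, it can be useful to check whether it is importable in a given environment. The snippet below is a small sketch using only the standard library; the name `has_flox` is an illustrative assumption and not something defined by xarray.

```python
import importlib.util

# True when the optional flox package can be found in this environment.
has_flox = importlib.util.find_spec("flox") is not None
print("flox available:", has_flox)
```

If this reports `False`, GroupBy operations should fall back to the original algorithm, as described in the whats-new entry.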