
Add SeasonGrouper, SeasonResampler #9524


Merged · 44 commits · May 7, 2025

Commits
7e3a6a4
Add SeasonGrouper, SeasonResampler
dcherian Jun 28, 2024
879b496
Allow sliding seasons
dcherian Sep 20, 2024
8268c46
cftime support
dcherian Sep 22, 2024
31cc519
Add skeleton tests
dcherian Sep 22, 2024
96ae241
Support "subsampled" seasons
dcherian Sep 22, 2024
77dc5e0
small edits
dcherian Sep 22, 2024
d68b1e4
Add reset
dcherian Nov 12, 2024
1b7a9fc
Fix tests
dcherian Nov 14, 2024
be5f933
Raise if seasons are not sorted for resampling
dcherian Nov 14, 2024
bd21b48
fix Self import
dcherian Nov 14, 2024
09640b7
Redo calendar fixtures
dcherian Nov 14, 2024
8773faf
fix test
dcherian Nov 14, 2024
879af59
cftime tests
dcherian Nov 15, 2024
2ca67da
Fix doctest
dcherian Nov 16, 2024
f5191e5
typing
dcherian Nov 16, 2024
2512d53
fix test
dcherian Nov 16, 2024
f0f838c
Merge branch 'main' into custom-groupers
dcherian Nov 16, 2024
b9507fe
Merge branch 'main' into custom-groupers
dcherian Nov 16, 2024
b385532
Add tests for SeasonGrouper API (PR #9524) (#40)
tomvothecoder Nov 20, 2024
a21952a
try fixing test
dcherian Nov 21, 2024
9f3c270
Merge branch 'main' into custom-groupers
dcherian Jan 8, 2025
bc86751
lint
dcherian Jan 8, 2025
a62628b
Merge branch 'main' into custom-groupers
dcherian Mar 19, 2025
64c99c5
format
dcherian Mar 19, 2025
594f285
fix test
dcherian Mar 19, 2025
1313ab9
cleanup
dcherian Mar 19, 2025
32d9ed0
more cleanup
dcherian Mar 19, 2025
b068e94
fix
dcherian Mar 19, 2025
b9a34ca
Merge branch 'main' into custom-groupers
dcherian Mar 20, 2025
862cf2a
Fix automatic inference of unique_coord
dcherian Mar 20, 2025
f3f7d52
Squashed commit of the following:
dcherian Mar 20, 2025
85d9217
cleanup
dcherian Mar 20, 2025
de26f38
Fix
dcherian Mar 20, 2025
fc7297a
fix docstring
dcherian Mar 20, 2025
e3413f3
Merge remote-tracking branch 'upstream/main' into custom-groupers
dcherian Mar 25, 2025
861da6c
cleanup
dcherian Mar 26, 2025
7406458
Avoid silly sphinx complete rebuilds
dcherian Mar 26, 2025
6297c1c
Add docs
dcherian Mar 26, 2025
0d8210a
Merge branch 'main' into custom-groupers
dcherian Apr 2, 2025
6e6e55a
Merge branch 'main' into custom-groupers
dcherian Apr 8, 2025
db80db4
Merge branch 'main' into custom-groupers
dcherian Apr 29, 2025
6dbf8c9
Merge branch 'main' into custom-groupers
dcherian May 2, 2025
9786ed5
Merge branch 'main' into custom-groupers
dcherian May 6, 2025
3ee3fde
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 6, 2025
2 changes: 2 additions & 0 deletions doc/api.rst
@@ -1329,6 +1329,8 @@ Grouper Objects
groupers.BinGrouper
groupers.UniqueGrouper
groupers.TimeResampler
groupers.SeasonGrouper
groupers.SeasonResampler


Rolling objects
2 changes: 2 additions & 0 deletions doc/conf.py
@@ -182,6 +182,8 @@
"pd.NaT": "~pandas.NaT",
}

autodoc_type_aliases = napoleon_type_aliases # Keep both in sync

# mermaid config
mermaid_version = "10.9.1"

8 changes: 8 additions & 0 deletions doc/user-guide/groupby.rst
@@ -332,6 +332,14 @@ Different groupers can be combined to construct sophisticated GroupBy operations
ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
Time Grouping and Resampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. seealso::

See :ref:`resampling`.


Shuffling
~~~~~~~~~

132 changes: 101 additions & 31 deletions doc/user-guide/time-series.rst
@@ -1,3 +1,5 @@
.. currentmodule:: xarray

.. _time-series:

================
@@ -21,26 +23,19 @@ core functionality.
Creating datetime64 data
------------------------

Xarray uses the numpy dtypes ``datetime64[unit]`` and ``timedelta64[unit]``
(where unit is one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime
Xarray uses the numpy dtypes :py:class:`numpy.datetime64` and :py:class:`numpy.timedelta64`
with specified units (one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime
data, which offer vectorized operations with numpy and smooth integration with pandas.

To convert to or create regular arrays of ``datetime64`` data, we recommend
using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:
To convert to or create regular arrays of :py:class:`numpy.datetime64` data, we recommend
using :py:func:`pandas.to_datetime`, :py:class:`pandas.DatetimeIndex`, or :py:func:`xarray.date_range`:

.. ipython:: python
pd.to_datetime(["2000-01-01", "2000-02-02"])
pd.DatetimeIndex(
["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
)
pd.date_range("2000-01-01", periods=365)
pd.date_range("2000-01-01", periods=365, unit="s")
It is also possible to use the corresponding :py:func:`xarray.date_range`:

.. ipython:: python
xr.date_range("2000-01-01", periods=365)
xr.date_range("2000-01-01", periods=365, unit="s")
@@ -81,7 +76,7 @@ attribute like ``'days since 2000-01-01'``).


You can manually decode arrays in this form by passing a dataset to
:py:func:`~xarray.decode_cf`:
:py:func:`decode_cf`:

.. ipython:: python
@@ -93,8 +88,8 @@ You can manual decode arrays in this form by passing a dataset to
coder = xr.coders.CFDatetimeCoder(time_unit="s")
xr.decode_cf(ds, decode_times=coder)
From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it restricts the natively representable dates to those between the years 1678 and 2262; this range widens significantly at coarser resolutions. When a store contains dates outside of these bounds (or dates < `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex` will be used for indexing.
:py:class:`~xarray.CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it restricts the natively representable dates to those between the years 1678 and 2262; this range widens significantly at coarser resolutions. When a store contains dates outside of these bounds (or dates < `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`CFTimeIndex` will be used for indexing.
:py:class:`CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
See :ref:`CFTimeIndex` for more information.
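
As a quick illustration (a sketch, assuming ``xr`` is the usual xarray import and cftime is installed), requesting a non-standard calendar from :py:func:`date_range` yields :py:class:`cftime.datetime` values backed by a :py:class:`CFTimeIndex`:

.. ipython:: python
    xr.date_range("2000-01-01", periods=3, calendar="noleap")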

Datetime indexing
@@ -205,35 +200,37 @@ You can also search for multiple months (in this case January through March), us
Resampling and grouped operations
---------------------------------

Datetime components couple particularly well with grouped operations (see
:ref:`groupby`) for analyzing features that repeat over time. Here's how to
calculate the mean by time of day:

.. seealso::

For more generic documentation on grouping, see :ref:`groupby`.


Datetime components couple particularly well with grouped operations for analyzing features that repeat over time.
Here's how to calculate the mean by time of day:

.. ipython:: python
:okwarning:
ds.groupby("time.hour").mean()
For upsampling or downsampling temporal resolutions, xarray offers a
:py:meth:`~xarray.Dataset.resample` method building on the core functionality
:py:meth:`Dataset.resample` method building on the core functionality
offered by the pandas method of the same name. Resample uses essentially the
same API as ``resample`` `in pandas`_.
same API as :py:meth:`pandas.DataFrame.resample` `in pandas`_.

.. _in pandas: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling

For example, we can downsample our dataset from hourly to 6-hourly:

.. ipython:: python
:okwarning:
ds.resample(time="6h")
This will create a specialized ``Resample`` object which saves information
necessary for resampling. All of the reduction methods which work with
``Resample`` objects can also be used for resampling:
This will create a specialized :py:class:`~xarray.core.resample.DatasetResample` or :py:class:`~xarray.core.resample.DataArrayResample`
object which saves information necessary for resampling. All of the reduction methods which work with
:py:class:`Dataset` or :py:class:`DataArray` objects can also be used for resampling:

.. ipython:: python
:okwarning:
ds.resample(time="6h").mean()
@@ -252,7 +249,7 @@ by specifying the ``dim`` keyword argument
ds.resample(time="6h").mean(dim=["time", "latitude", "longitude"])
For upsampling, xarray provides six methods: ``asfreq``, ``ffill``, ``bfill``, ``pad``,
``nearest`` and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d``
``nearest`` and ``interpolate``. ``interpolate`` extends :py:func:`scipy.interpolate.interp1d`
and supports all of its schemes. All of these resampling operations work on both
Dataset and DataArray objects with an arbitrary number of dimensions.
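
For example (a minimal sketch, assuming ``ds`` from the examples above has an hourly ``time`` coordinate), upsampling to a half-hourly frequency with linear interpolation would look like:

.. ipython:: python
    ds.resample(time="30min").interpolate("linear")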

@@ -266,9 +263,7 @@ Data that has indices outside of the given ``tolerance`` are set to ``NaN``.
It is often desirable to center the time values after a resampling operation.
That can be accomplished by updating the resampled dataset time coordinate values
using time offset arithmetic via the `pandas.tseries.frequencies.to_offset`_ function.

.. _pandas.tseries.frequencies.to_offset: https://pandas.pydata.org/docs/reference/api/pandas.tseries.frequencies.to_offset.html
using time offset arithmetic via the :py:func:`pandas.tseries.frequencies.to_offset` function.

.. ipython:: python
@@ -277,5 +272,80 @@ using time offset arithmetic via the `pandas.tseries.frequencies.to_offset`_ fun
resampled_ds["time"] = resampled_ds.get_index("time") + offset
resampled_ds
For more examples of using grouped operations on a time dimension, see
:doc:`../examples/weather-data`.
.. seealso::

For more examples of using grouped operations on a time dimension, see :doc:`../examples/weather-data`.


Handling Seasons
~~~~~~~~~~~~~~~~

Two extremely common time series operations are grouping by seasons and resampling to a seasonal frequency.
Xarray has historically supported some simple versions of these computations.
For example, ``.groupby("time.season")`` (where the seasons are DJF, MAM, JJA, SON)
and resampling to a seasonal frequency using Pandas syntax: ``.resample(time="QS-DEC")``.
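
For reference, the classic seasonal-frequency resampling spelling looks like this (a sketch, assuming ``ds`` has a ``time`` coordinate; the groupby spelling is shown further below):

.. ipython:: python
    ds.resample(time="QS-DEC").mean()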

Quite commonly one wants more flexibility in defining seasons. For these use cases, Xarray provides
:py:class:`groupers.SeasonGrouper` and :py:class:`groupers.SeasonResampler`.


.. currentmodule:: xarray.groupers

.. ipython:: python
from xarray.groupers import SeasonGrouper
ds.groupby(time=SeasonGrouper(["DJF", "MAM", "JJA", "SON"])).mean()
Note how the seasons are in the specified order, unlike ``.groupby("time.season")`` where the
seasons are sorted alphabetically.

.. ipython:: python
ds.groupby("time.season").mean()
:py:class:`SeasonGrouper` supports overlapping seasons:

.. ipython:: python
ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()
Skipping months is allowed:

.. ipython:: python
ds.groupby(time=SeasonGrouper(["JJAS"])).mean()
Use :py:class:`SeasonResampler` to resample to custom seasons.

.. ipython:: python
from xarray.groupers import SeasonResampler
ds.resample(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"])).mean()
:py:class:`SeasonResampler` is smart enough to correctly handle years for seasons that
span the end of the year (e.g. DJF). By default :py:class:`SeasonResampler` will skip any
season that is incomplete (e.g. the first DJF season for a time series that starts in Jan).
Pass the ``drop_incomplete=False`` kwarg to :py:class:`SeasonResampler` to disable this behaviour.

.. ipython:: python
from xarray.groupers import SeasonResampler
ds.resample(
time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=False)
).mean()
Seasons need not be of the same length:

.. ipython:: python
ds.resample(time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).mean()
48 changes: 47 additions & 1 deletion properties/test_properties.py
@@ -1,11 +1,15 @@
import itertools

import pytest

pytest.importorskip("hypothesis")

from hypothesis import given
import hypothesis.strategies as st
from hypothesis import given, note

import xarray as xr
import xarray.testing.strategies as xrst
from xarray.groupers import find_independent_seasons, season_to_month_tuple


@given(attrs=xrst.simple_attrs)
@@ -15,3 +19,45 @@ def test_assert_identical(attrs):

ds = xr.Dataset(attrs=attrs)
xr.testing.assert_identical(ds, ds.copy(deep=True))


@given(
roll=st.integers(min_value=0, max_value=12),
breaks=st.lists(
st.integers(min_value=0, max_value=11), min_size=1, max_size=12, unique=True
),
)
def test_property_season_month_tuple(roll, breaks):
chars = list("JFMAMJJASOND")
months = tuple(range(1, 13))

rolled_chars = chars[roll:] + chars[:roll]
rolled_months = months[roll:] + months[:roll]
breaks = sorted(breaks)
if breaks[0] != 0:
breaks = [0] + breaks
if breaks[-1] != 12:
breaks = breaks + [12]
seasons = tuple(
"".join(rolled_chars[start:stop]) for start, stop in itertools.pairwise(breaks)
)
actual = season_to_month_tuple(seasons)
expected = tuple(
rolled_months[start:stop] for start, stop in itertools.pairwise(breaks)
)
assert expected == actual


@given(data=st.data(), nmonths=st.integers(min_value=1, max_value=11))
def test_property_find_independent_seasons(data, nmonths):
chars = "JFMAMJJASOND"
# if stride > nmonths, then we can't infer season order
stride = data.draw(st.integers(min_value=1, max_value=nmonths))
chars = chars + chars[:nmonths]
seasons = [list(chars[i : i + nmonths]) for i in range(0, 12, stride)]
note(seasons)
groups = find_independent_seasons(seasons)
for group in groups:
inds = tuple(itertools.chain(*group.inds))
assert len(inds) == len(set(inds))
assert len(group.codes) == len(set(group.codes))
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -393,6 +393,8 @@ extend-ignore-identifiers-re = [
[tool.typos.default.extend-words]
# NumPy function names
arange = "arange"
ond = "ond"
aso = "aso"

# Technical terms
nd = "nd"
56 changes: 56 additions & 0 deletions xarray/compat/toolzcompat.py
@@ -0,0 +1,56 @@
# This file contains functions copied from the toolz library in accordance
# with its license. The original copyright notice is duplicated below.

# Copyright (c) 2013 Matthew Rocklin

# All rights reserved.

# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:

# a. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# b. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# c. Neither the name of toolz nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.


# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
# DAMAGE.


def sliding_window(n, seq):
"""A sequence of overlapping subsequences
>>> list(sliding_window(2, [1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]
This function creates a sliding window suitable for transformations like
sliding means / smoothing
>>> mean = lambda seq: float(sum(seq)) / len(seq)
>>> list(map(mean, sliding_window(2, [1, 2, 3, 4])))
[1.5, 2.5, 3.5]
"""
import collections
import itertools

return zip(
*(
collections.deque(itertools.islice(it, i), 0) or it
for i, it in enumerate(itertools.tee(seq, n))
),
strict=False,
)
6 changes: 3 additions & 3 deletions xarray/core/dataarray.py
@@ -6860,7 +6860,7 @@ def groupby(
>>> da.groupby("letters")
<DataArrayGroupBy, grouped over 1 grouper(s), 2 groups in total:
'letters': 2/2 groups present with labels 'a', 'b'>
'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'>
Execute a reduction
@@ -6876,8 +6876,8 @@ def groupby(
>>> da.groupby(["letters", "x"])
<DataArrayGroupBy, grouped over 2 grouper(s), 8 groups in total:
'letters': 2/2 groups present with labels 'a', 'b'
'x': 4/4 groups present with labels 10, 20, 30, 40>
'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'
'x': UniqueGrouper('x'), 4/4 groups with labels 10, 20, 30, 40>
Use Grouper objects to express more complicated GroupBy operations
6 changes: 3 additions & 3 deletions xarray/core/dataset.py
@@ -9898,7 +9898,7 @@ def groupby(
>>> ds.groupby("letters")
<DatasetGroupBy, grouped over 1 grouper(s), 2 groups in total:
'letters': 2/2 groups present with labels 'a', 'b'>
'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'>
Execute a reduction
@@ -9915,8 +9915,8 @@ def groupby(
>>> ds.groupby(["letters", "x"])
<DatasetGroupBy, grouped over 2 grouper(s), 8 groups in total:
'letters': 2/2 groups present with labels 'a', 'b'
'x': 4/4 groups present with labels 10, 20, 30, 40>
'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'
'x': UniqueGrouper('x'), 4/4 groups with labels 10, 20, 30, 40>
Use Grouper objects to express more complicated GroupBy operations
13 changes: 9 additions & 4 deletions xarray/core/groupby.py
@@ -263,6 +263,8 @@ def _ensure_1d(
from xarray.core.dataarray import DataArray

if isinstance(group, DataArray):
for dim in set(group.dims) - set(obj.dims):
obj = obj.expand_dims(dim)
# try to stack the dims of the group into a single dim
orig_dims = group.dims
stacked_dim = "stacked_" + "_".join(map(str, orig_dims))
@@ -843,7 +845,10 @@ def __repr__(self) -> str:
for grouper in self.groupers:
coord = grouper.unique_coord
labels = ", ".join(format_array_flat(coord, 30).split())
text += f"\n {grouper.name!r}: {coord.size}/{grouper.full_index.size} groups present with labels {labels}"
text += (
f"\n {grouper.name!r}: {type(grouper.grouper).__name__}({grouper.group.name!r}), "
f"{coord.size}/{grouper.full_index.size} groups with labels {labels}"
)
return text + ">"

def _iter_grouped(self) -> Iterator[T_Xarray]:
@@ -1081,7 +1086,7 @@ def _flox_reduce(
parsed_dim_list = list()
# preserve order
for dim_ in itertools.chain(
*(grouper.group.dims for grouper in self.groupers)
*(grouper.codes.dims for grouper in self.groupers)
):
if dim_ not in parsed_dim_list:
parsed_dim_list.append(dim_)
@@ -1095,7 +1100,7 @@ def _flox_reduce(
# Better to control it here than in flox.
for grouper in self.groupers:
if any(
d not in grouper.group.dims and d not in obj.dims for d in parsed_dim
d not in grouper.codes.dims and d not in obj.dims for d in parsed_dim
):
raise ValueError(f"cannot reduce over dimensions {dim}.")

@@ -1360,7 +1365,7 @@ def quantile(
self._obj.__class__.quantile,
shortcut=False,
q=q,
dim=dim,
dim=dim or self._group_dim,
method=method,
keep_attrs=keep_attrs,
skipna=skipna,
393 changes: 388 additions & 5 deletions xarray/groupers.py

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion xarray/tests/__init__.py
@@ -386,7 +386,10 @@ def create_test_data(
pytest.param(cal, marks=requires_cftime)
for cal in sorted(_NON_STANDARD_CALENDAR_NAMES)
]
_STANDARD_CALENDARS = [pytest.param(cal) for cal in _STANDARD_CALENDAR_NAMES]
_STANDARD_CALENDARS = [
pytest.param(cal, marks=requires_cftime if cal != "standard" else ())
for cal in _STANDARD_CALENDAR_NAMES
]
_ALL_CALENDARS = sorted(_STANDARD_CALENDARS + _NON_STANDARD_CALENDARS)
_CFTIME_CALENDARS = [
pytest.param(*p.values, marks=requires_cftime) for p in _ALL_CALENDARS
312 changes: 309 additions & 3 deletions xarray/tests/test_groupby.py
@@ -13,19 +13,23 @@
from packaging.version import Version

import xarray as xr
from xarray import DataArray, Dataset, Variable
from xarray import DataArray, Dataset, Variable, date_range
from xarray.core.groupby import _consolidate_slices
from xarray.core.types import InterpOptions, ResampleCompatible
from xarray.groupers import (
BinGrouper,
EncodedGroups,
Grouper,
SeasonGrouper,
SeasonResampler,
TimeResampler,
UniqueGrouper,
season_to_month_tuple,
)
from xarray.namedarray.pycompat import is_chunked_array
from xarray.structure.alignment import broadcast
from xarray.tests import (
_ALL_CALENDARS,
InaccessibleArray,
assert_allclose,
assert_equal,
@@ -615,7 +619,7 @@ def test_groupby_repr(obj, dim) -> None:
N = len(np.unique(obj[dim]))
expected = f"<{obj.__class__.__name__}GroupBy"
expected += f", grouped over 1 grouper(s), {N} groups in total:"
expected += f"\n {dim!r}: {N}/{N} groups present with labels "
expected += f"\n {dim!r}: UniqueGrouper({dim!r}), {N}/{N} groups with labels "
if dim == "x":
expected += "1, 2, 3, 4, 5>"
elif dim == "y":
@@ -632,7 +636,7 @@ def test_groupby_repr_datetime(obj) -> None:
actual = repr(obj.groupby("t.month"))
expected = f"<{obj.__class__.__name__}GroupBy"
expected += ", grouped over 1 grouper(s), 12 groups in total:\n"
expected += " 'month': 12/12 groups present with labels "
expected += " 'month': UniqueGrouper('month'), 12/12 groups with labels "
expected += "1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12>"
assert actual == expected

@@ -3287,6 +3291,308 @@ def test_groupby_dask_eager_load_warnings() -> None:
ds.groupby_bins("x", bins=[1, 2, 3], eagerly_compute_group=False)


class TestSeasonGrouperAndResampler:
def test_season_to_month_tuple(self):
assert season_to_month_tuple(["JF", "MAM", "JJAS", "OND"]) == (
(1, 2),
(3, 4, 5),
(6, 7, 8, 9),
(10, 11, 12),
)
assert season_to_month_tuple(["DJFM", "AM", "JJAS", "ON"]) == (
(12, 1, 2, 3),
(4, 5),
(6, 7, 8, 9),
(10, 11),
)

def test_season_grouper_raises_error_if_months_are_not_valid_or_not_continuous(
self,
):
calendar = "standard"
time = date_range("2001-01-01", "2002-12-30", freq="D", calendar=calendar)
da = DataArray(np.ones(time.size), dims="time", coords={"time": time})

with pytest.raises(KeyError, match="IN"):
da.groupby(time=SeasonGrouper(["INVALID_SEASON"]))

with pytest.raises(KeyError, match="MD"):
da.groupby(time=SeasonGrouper(["MDF"]))

@pytest.mark.parametrize("calendar", _ALL_CALENDARS)
def test_season_grouper_with_months_spanning_calendar_year_using_same_year(
self, calendar
):
time = date_range("2001-01-01", "2002-12-30", freq="MS", calendar=calendar)
# fmt: off
data = np.array(
[
1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7,
1.95, 1.05, 1.3, 1.55, 1.8, 1.15, 1.4, 1.65, 1.9, 1.25, 1.5, 1.75,
]

)
# fmt: on
da = DataArray(data, dims="time", coords={"time": time})
da["year"] = da.time.dt.year

actual = da.groupby(
year=UniqueGrouper(), time=SeasonGrouper(["NDJFM", "AMJ"])
).mean()

# Expected if the same year "ND" is used for seasonal grouping
expected = xr.DataArray(
data=np.array([[1.38, 1.616667], [1.51, 1.5]]),
dims=["year", "season"],
coords={"year": [2001, 2002], "season": ["NDJFM", "AMJ"]},
)

assert_allclose(expected, actual)

@pytest.mark.parametrize("calendar", _ALL_CALENDARS)
def test_season_grouper_with_partial_years(self, calendar):
time = date_range("2001-01-01", "2002-06-30", freq="MS", calendar=calendar)
# fmt: off
data = np.array(
[
1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7,
1.95, 1.05, 1.3, 1.55, 1.8, 1.15,
]
)
# fmt: on
da = DataArray(data, dims="time", coords={"time": time})
da["year"] = da.time.dt.year

actual = da.groupby(
year=UniqueGrouper(), time=SeasonGrouper(["NDJFM", "AMJ"])
).mean()

# Expected if partial years are handled correctly
expected = xr.DataArray(
data=np.array([[1.38, 1.616667], [1.43333333, 1.5]]),
dims=["year", "season"],
coords={"year": [2001, 2002], "season": ["NDJFM", "AMJ"]},
)

assert_allclose(expected, actual)

@pytest.mark.parametrize("calendar", ["standard"])
def test_season_grouper_with_single_month_seasons(self, calendar):
time = date_range("2001-01-01", "2002-12-30", freq="MS", calendar=calendar)
# fmt: off
data = np.array(
[
1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7,
1.95, 1.05, 1.3, 1.55, 1.8, 1.15, 1.4, 1.65, 1.9, 1.25, 1.5, 1.75,
]
)
# fmt: on
da = DataArray(data, dims="time", coords={"time": time})
da["year"] = da.time.dt.year

# TODO: Consider supporting this if needed
# It does not work without flox, because the group labels are not unique,
# and so the stack/unstack approach does not work.
with pytest.raises(ValueError):
da.groupby(
year=UniqueGrouper(),
time=SeasonGrouper(
["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"]
),
).mean()

# Expected if single month seasons are handled correctly
# expected = xr.DataArray(
# data=np.array(
# [
# [1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7],
# [1.95, 1.05, 1.3, 1.55, 1.8, 1.15, 1.4, 1.65, 1.9, 1.25, 1.5, 1.75],
# ]
# ),
# dims=["year", "season"],
# coords={
# "year": [2001, 2002],
# "season": ["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"],
# },
# )
# assert_allclose(expected, actual)

@pytest.mark.parametrize("calendar", _ALL_CALENDARS)
def test_season_grouper_with_months_spanning_calendar_year_using_previous_year(
self, calendar
):
time = date_range("2001-01-01", "2002-12-30", freq="MS", calendar=calendar)
# fmt: off
data = np.array(
[
1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7,
1.95, 1.05, 1.3, 1.55, 1.8, 1.15, 1.4, 1.65, 1.9, 1.25, 1.5, 1.75,
]
)
# fmt: on
da = DataArray(data, dims="time", coords={"time": time})

gb = da.resample(time=SeasonResampler(["NDJFM", "AMJ"], drop_incomplete=False))
actual = gb.mean()

# fmt: off
new_time_da = xr.DataArray(
dims="time",
data=pd.DatetimeIndex(
[
"2000-11-01", "2001-04-01", "2001-11-01", "2002-04-01", "2002-11-01"
]
),
)
# fmt: on
if calendar != "standard":
new_time_da = new_time_da.convert_calendar(
calendar=calendar, align_on="date"
)
new_time = new_time_da.time.variable

# Expected if the previous "ND" is used for seasonal grouping
expected = xr.DataArray(
data=np.array([1.25, 1.616667, 1.49, 1.5, 1.625]),
dims="time",
coords={"time": new_time},
)
assert_allclose(expected, actual)

@pytest.mark.parametrize("calendar", _ALL_CALENDARS)
def test_season_grouper_simple(self, calendar) -> None:
time = date_range("2001-01-01", "2002-12-30", freq="D", calendar=calendar)
da = DataArray(np.ones(time.size), dims="time", coords={"time": time})
expected = da.groupby("time.season").mean()
# note season order matches expected
actual = da.groupby(
time=SeasonGrouper(
["DJF", "JJA", "MAM", "SON"], # drop_incomplete=False
)
).mean()
assert_identical(expected, actual)

@pytest.mark.parametrize("seasons", [["JJA", "MAM", "SON", "DJF"]])
def test_season_resampling_raises_unsorted_seasons(self, seasons):
calendar = "standard"
time = date_range("2001-01-01", "2002-12-30", freq="D", calendar=calendar)
da = DataArray(np.ones(time.size), dims="time", coords={"time": time})
with pytest.raises(ValueError, match="sort"):
da.resample(time=SeasonResampler(seasons))

@pytest.mark.parametrize(
"use_cftime", [pytest.param(True, marks=requires_cftime), False]
)
@pytest.mark.parametrize("drop_incomplete", [True, False])
@pytest.mark.parametrize(
"seasons",
[
pytest.param(["DJF", "MAM", "JJA", "SON"], id="standard"),
pytest.param(["NDJ", "FMA", "MJJ", "ASO"], id="nov-first"),
pytest.param(["MAM", "JJA", "SON", "DJF"], id="standard-diff-order"),
pytest.param(["JFM", "AMJ", "JAS", "OND"], id="december-same-year"),
pytest.param(["DJF", "MAM", "JJA", "ON"], id="skip-september"),
pytest.param(["JJAS"], id="jjas-only"),
],
)
def test_season_resampler(
self, seasons: list[str], drop_incomplete: bool, use_cftime: bool
) -> None:
calendar = "standard"
time = date_range(
"2001-01-01",
"2002-12-30",
freq="D",
calendar=calendar,
use_cftime=use_cftime,
)
da = DataArray(np.ones(time.size), dims="time", coords={"time": time})
counts = da.resample(time="ME").count()

seasons_as_ints = season_to_month_tuple(seasons)
month = counts.time.dt.month.data
year = counts.time.dt.year.data
for season, as_ints in zip(seasons, seasons_as_ints, strict=True):
if "DJ" in season:
for imonth in as_ints[season.index("D") + 1 :]:
year[month == imonth] -= 1
counts["time"] = (
"time",
[pd.Timestamp(f"{y}-{m}-01") for y, m in zip(year, month, strict=True)],
)
if has_cftime:
counts = counts.convert_calendar(calendar, "time", align_on="date")

expected_vals = []
expected_time = []
for year in [2001, 2002, 2003]:
for season, as_ints in zip(seasons, seasons_as_ints, strict=True):
out_year = year
if "DJ" in season:
out_year = year - 1
if out_year == 2003:
# this is a dummy year added to make sure we cover 2002-DJF
continue
available = [
counts.sel(time=f"{out_year}-{month:02d}").data for month in as_ints
]
if any(len(a) == 0 for a in available) and drop_incomplete:
continue
output_label = pd.Timestamp(f"{out_year}-{as_ints[0]:02d}-01")
expected_time.append(output_label)
# use concatenate to handle empty array when dec value does not exist
expected_vals.append(np.concatenate(available).sum())

expected = (
# we construct expected in the standard calendar
xr.DataArray(expected_vals, dims="time", coords={"time": expected_time})
)
if has_cftime:
# and then convert to the expected calendar,
expected = expected.convert_calendar(
calendar, align_on="date", use_cftime=use_cftime
)
# and finally sort since DJF will be out-of-order
expected = expected.sortby("time")

rs = SeasonResampler(seasons, drop_incomplete=drop_incomplete)
# through resample
actual = da.resample(time=rs).sum()
assert_identical(actual, expected)

@requires_cftime
def test_season_resampler_errors(self):
time = date_range("2001-01-01", "2002-12-30", freq="D", calendar="360_day")
da = DataArray(np.ones(time.size), dims="time", coords={"time": time})

# non-datetime array
with pytest.raises(ValueError):
DataArray(np.ones(5), dims="time").groupby(time=SeasonResampler(["DJF"]))

# ndim > 1 array
with pytest.raises(ValueError):
DataArray(
np.ones((5, 5)), dims=("t", "x"), coords={"x": np.arange(5)}
).groupby(x=SeasonResampler(["DJF"]))

# overlapping seasons
with pytest.raises(ValueError):
da.groupby(time=SeasonResampler(["DJFM", "MAMJ", "JJAS", "SOND"])).sum()

@requires_cftime
def test_season_resampler_groupby_identical(self):
time = date_range("2001-01-01", "2002-12-30", freq="D")
da = DataArray(np.ones(time.size), dims="time", coords={"time": time})

# through resample
resampler = SeasonResampler(["DJF", "MAM", "JJA", "SON"])
rs = da.resample(time=resampler).sum()

# through groupby
gb = da.groupby(time=resampler).sum()
assert_identical(rs, gb)


# TODO: Possible property tests to add to this module
# 1. lambda x: x
# 2. grouped-reduce on unique coords is identical to array
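
A minimal sketch of the second TODO item (an illustration, not part of the PR): when every coordinate label is unique, each group contains exactly one element, so a grouped reduction can only reorder the data.

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x", coords={"x": [3, 1, 2, 0]})
# each "x" label is unique, so every group holds a single element and the
# grouped mean should equal the original array sorted by "x"
xr.testing.assert_identical(da.groupby("x").mean(), da.sortby("x"))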
