Thresholded events - Quantified rate2amount (#1778)
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGES.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?

* New `thresholded_events` generic compute function and `find_events` in
`run_length`. The idea is to find all runs defined by a start and a stop
condition and return them along an "event" dimension. (A usage sketch
follows this list.)
* It relies on a new function in `run_length` that computes the run
lengths of sequences determined by two conditions: a first condition
determines when runs should start, and a second one determines when they
stop. It is called `runs_with_holes` because one possibility opened by
the addition of the second condition is this: consider a normal
run_length computation, but where the second condition allows holes in
those sequences, as long as the holes do not exceed a given threshold
`window_stop`. `runs_with_holes` is in fact more general, so perhaps
another name should be found. (See the second sketch below.)
* `rate2amount` and `amount2rate` can now accept Quantified inputs
(strings and pint Quantities), which then requires passing the time
coordinate in as the `dim` argument. (See the last sketch below.)
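
A minimal usage sketch of `thresholded_events`, mirroring the new tests in `tests/test_generic.py` (the data values here are illustrative only):

```python
import pandas as pd
import xarray as xr

from xclim.indices import generic

# Illustrative daily precipitation (mm) containing two 3-day wet spells.
time = pd.date_range("2000-01-01", periods=15, freq="D")
pr = xr.DataArray(
    [0, 0, 0, 1, 2, 3, 0, 3, 3, 10, 0, 0, 0, 0, 0],
    dims="time",
    coords={"time": time},
    attrs={"units": "mm"},
)

# Find all events where pr >= 1 mm holds for at least 3 consecutive steps.
out = generic.thresholded_events(pr, thresh="1 mm", op=">=", window=3)

# The result is indexed along a new "event" dimension and, as asserted in
# the new tests, carries per-event statistics: `event_length`,
# `event_effective_length`, `event_sum` and `event_start`.
```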
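
And a sketch of the underlying `runs_with_holes`, taken from the new `test_runs_with_holes` test; as exercised there, the positional arguments are the start condition, its window, the stop condition and its window:

```python
import numpy as np
import pandas as pd
import xarray as xr

from xclim.indices import run_length as rl

time = pd.date_range("2000-01-01", periods=24, freq="D")
values = np.array(
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
)
da = xr.DataArray(values, coords={"time": time}, dims="time")

# A run starts once `da == 1` holds for 1 step and only stops once `da == 0`
# holds for 3 consecutive steps, so shorter gaps are "holes" inside the run:
events = rl.runs_with_holes(da == 1, 1, da == 0, 3)
# The isolated zeros at positions 2, 5 and 6 do not break the first run;
# positions 1 through 10 all belong to a single event.
```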
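
Finally, a sketch of the Quantified conversions. Since a string or a pint Quantity carries no time coordinate of its own, the coordinate itself is passed as `dim` (a hypothetical daily axis here); per the new code path, the result should come back as a pint Quantity:

```python
import pandas as pd
import xarray as xr

from xclim.core.units import rate2amount

# A daily time coordinate, passed explicitly since the input is not a DataArray.
time = xr.DataArray(
    pd.date_range("2000-01-01", periods=31, freq="D"), dims=("time",), name="time"
)

# With daily sampling, a rate of "5 mm/d" corresponds to an amount of
# 5 mm per time step.
amount = rate2amount("5 mm/d", dim=time)
```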


### Does this PR introduce a breaking change?
No 

### Other information:
For the time being, I removed the references to "freezing rain", as I'm
not sure I will need the function in xclim. If I do, I'll add it back.
aulemahal authored Oct 10, 2024
2 parents c55ee82 + 0f03cdd commit 96484df
Showing 9 changed files with 453 additions and 46 deletions.
7 changes: 5 additions & 2 deletions CHANGELOG.rst
@@ -13,8 +13,8 @@ Announcements

New indicators
^^^^^^^^^^^^^^
-* New ``heat_spell_frequency``, ``heat_spell_max_length`` and ``heat_spell_total_length`` : spell length statistics on a bivariate condition that uses the average over a window by default. (:pull:`1885`).
-* New ``hot_spell_max_magnitude``: yields the magnitude of the most intensive heat wave. (:pull:`1926`).
+* New ``heat_spell_frequency``, ``heat_spell_max_length`` and ``heat_spell_total_length`` : spell length statistics on a bivariate condition that uses the average over a window by default. (:pull:`1885`, :pull:`1778`).
+* New ``hot_spell_max_magnitude`` : yields the magnitude of the most intensive heat wave. (:pull:`1926`).
* New ``chill_portion`` and ``chill_unit``: chill portion based on the Dynamic Model and chill unit based on the Utah model indicators. (:issue:`1753`, :pull:`1909`).
* New ``water_cycle_intensity``: yields the sum of precipitation and actual evapotranspiration. (:issue:`410`, :pull:`1947`).

@@ -25,6 +25,9 @@ New features and enhancements
* ``xclim.indices.run_length.windowed_max_run_sum`` accumulates positive values across runs and yields the maximum valued run. (:pull:`1926`).
* Helper function ``xclim.indices.helpers.make_hourly_temperature`` to estimate hourly temperatures from daily min and max temperatures. (:pull:`1909`).
* New global option ``resample_map_blocks`` to wrap all ``resample().map()`` code inside a ``xr.map_blocks`` to lower the number of dask tasks. Uses utility ``xclim.indices.helpers.resample_map`` and requires ``flox`` to ensure the chunking allows such block-mapping. Defaults to False. (:pull:`1848`).
+* ``xclim.indices.run_length.runs_with_holes`` accepts a condition that must be met for a run to start and a second condition that must be met for the run to stop. (:pull:`1778`).
+* New generic compute function ``xclim.indices.generic.thresholded_events`` that finds events based on a threshold condition and returns basic stats for each. See also ``xclim.indices.run_length.find_events``. (:pull:`1778`).
+* ``xclim.core.units.rate2amount`` and ``xclim.core.units.amount2rate`` can now also accept quantities (pint objects or strings), in which case the ``dim`` argument must be the ``time`` coordinate itself, through which the sampling rate can be found. (:pull:`1778`).

Bug fixes
^^^^^^^^^
124 changes: 124 additions & 0 deletions tests/test_generic.py
@@ -8,6 +8,7 @@

from xclim.core.calendar import doy_to_days_since, select_time
from xclim.indices import generic
from xclim.testing.helpers import assert_lazy

K2C = 273.15

@@ -768,3 +769,126 @@ def test_spell_length_statistics_multi(tasmin_series, tasmax_series):
)
xr.testing.assert_equal(outs, outm)
np.testing.assert_allclose(outc, 1)


class TestThresholdedEvents:

@pytest.mark.parametrize("use_dask", [True, False])
def test_simple(self, pr_series, use_dask):
arr = np.array([0, 0, 0, 1, 2, 3, 0, 3, 3, 10, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 1, 3, 3, 2, 0, 0, 0, 2, 0, 0, 0, 0]) # fmt: skip
pr = pr_series(arr, start="2000-01-01", units="mm")
if use_dask:
pr = pr.chunk(-1)

with assert_lazy:
out = generic.thresholded_events(
pr,
thresh="1 mm",
op=">=",
window=3,
)

assert out.event.size == np.ceil(arr.size / (3 + 1))
out = out.load().dropna("event", how="all")

np.testing.assert_array_equal(out.event_length, [3, 3, 4, 4])
np.testing.assert_array_equal(out.event_effective_length, [3, 3, 4, 4])
np.testing.assert_array_equal(out.event_sum, [6, 16, 7, 9])
np.testing.assert_array_equal(
out.event_start,
np.array(
["2000-01-04", "2000-01-08", "2000-01-16", "2000-01-26"],
dtype="datetime64[ns]",
),
)

@pytest.mark.parametrize("use_dask", [True, False])
def test_diff_windows(self, pr_series, use_dask):
arr = np.array([0, 0, 0, 1, 2, 3, 0, 3, 3, 10, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 1, 3, 3, 2, 0, 0, 0, 2, 0, 0, 0, 0]) # fmt: skip
pr = pr_series(arr, start="2000-01-01", units="mm")
if use_dask:
pr = pr.chunk(-1)

# different window stop
out = (
generic.thresholded_events(
pr, thresh="2 mm", op=">=", window=3, window_stop=4
)
.load()
.dropna("event", how="all")
)

np.testing.assert_array_equal(out.event_length, [3, 3, 7])
np.testing.assert_array_equal(out.event_effective_length, [3, 3, 4])
np.testing.assert_array_equal(out.event_sum, [16, 6, 10])
np.testing.assert_array_equal(
out.event_start,
np.array(
["2000-01-08", "2000-01-17", "2000-01-27"], dtype="datetime64[ns]"
),
)

@pytest.mark.parametrize("use_dask", [True, False])
def test_cftime(self, pr_series, use_dask):
arr = np.array([0, 0, 0, 1, 2, 3, 0, 3, 3, 10, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 1, 3, 3, 2, 0, 0, 0, 2, 0, 0, 0, 0]) # fmt: skip
pr = pr_series(arr, start="2000-01-01", units="mm").convert_calendar("noleap")
if use_dask:
pr = pr.chunk(-1)

# cftime
with assert_lazy:
out = generic.thresholded_events(
pr,
thresh="1 mm",
op=">=",
window=3,
window_stop=3,
)
out = out.load().dropna("event", how="all")

np.testing.assert_array_equal(out.event_length, [7, 4, 4])
np.testing.assert_array_equal(out.event_effective_length, [6, 4, 4])
np.testing.assert_array_equal(out.event_sum, [22, 7, 9])
exp = xr.DataArray(
[1, 2, 3],
dims=("time",),
coords={
"time": np.array(
["2000-01-04", "2000-01-16", "2000-01-26"], dtype="datetime64[ns]"
)
},
)
np.testing.assert_array_equal(
out.event_start, exp.convert_calendar("noleap").time
)

@pytest.mark.parametrize("use_dask", [True, False])
def test_freq(self, pr_series, use_dask):
jan = [0, 0, 0, 1, 2, 3, 0, 3, 3, 10, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 3, 2, 3, 2] # fmt: skip
fev = [2, 2, 1, 0, 0, 0, 3, 3, 4, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] # fmt: skip
pr = pr_series(np.array(jan + fev), start="2000-01-01", units="mm")
if use_dask:
pr = pr.chunk(-1)

with assert_lazy:
out = generic.thresholded_events(
pr, thresh="1 mm", op=">=", window=3, freq="MS", window_stop=3
)
assert out.event_length.shape == (2, 6)
out = out.load().dropna("event", how="all")

np.testing.assert_array_equal(out.event_length, [[7, 6, 4], [3, 5, np.nan]])
np.testing.assert_array_equal(
out.event_effective_length, [[6, 6, 4], [3, 5, np.nan]]
)
np.testing.assert_array_equal(out.event_sum, [[22, 12, 10], [5, 17, np.nan]])
np.testing.assert_array_equal(
out.event_start,
np.array(
[
["2000-01-04", "2000-01-17", "2000-01-28"],
["2000-02-01", "2000-02-07", "NaT"],
],
dtype="datetime64[ns]",
),
)
11 changes: 5 additions & 6 deletions tests/test_run_length.py
@@ -126,9 +126,8 @@ def test_rle(ufunc, use_dask, index):

@pytest.mark.parametrize("use_dask", [True, False])
@pytest.mark.parametrize("index", ["first", "last"])
-def test_extract_events_identity(use_dask, index):
-    # implement more tests, this is just to show that this reproduces the behaviour
-    # of rle
+def test_runs_with_holes_identity(use_dask, index):
+    # This test reproduces the behaviour of `rle`
values = np.zeros((10, 365, 4, 4))
time = pd.date_range("2000-01-01", periods=365, freq="D")
values[:, 1:11, ...] = 1
@@ -137,19 +136,19 @@ def test_extract_events_identity(use_dask, index):
if use_dask:
da = da.chunk({"a": 1, "b": 2})

-    events = rl.extract_events(da != 0, 1, da == 0, 1)
+    events = rl.runs_with_holes(da != 0, 1, da == 0, 1)
expected = da
np.testing.assert_array_equal(events, expected)


-def test_extract_events():
+def test_runs_with_holes():
values = np.zeros(365)
time = pd.date_range("2000-01-01", periods=365, freq="D")
a = [0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
values[0 : len(a)] = a
da = xr.DataArray(values, coords={"time": time}, dims=("time"))

-    events = rl.extract_events(da == 1, 1, da == 0, 3)
+    events = rl.runs_with_holes(da == 1, 1, da == 0, 3)

expected = values * 0
expected[1:11] = 1
14 changes: 14 additions & 0 deletions tests/test_temperature.py
@@ -640,6 +640,20 @@ def test_1d(self, tasmax_series, tasmin_series):
)
np.testing.assert_allclose(hsf.values[:1], 0)

def test_gap(self, tasmax_series, tasmin_series):
tn1 = np.zeros(366)
tx1 = np.zeros(366)
tn1[:10] = np.array([20, 23, 23, 23, 20, 20, 23, 23, 23, 23])
tx1[:10] = np.array([29, 31, 31, 31, 28, 28, 31, 31, 31, 31])

tn = tasmin_series(tn1 + K2C, start="1/1/2000")
tx = tasmax_series(tx1 + K2C, start="1/1/2000")

hsf = atmos.heat_spell_frequency(
tn, tx, thresh_tasmin="22.1 C", thresh_tasmax="30.1 C", freq="YS", min_gap=3
)
np.testing.assert_allclose(hsf.values[:1], 1)


class TestHeatSpellMaxLength:
def test_1d(self, tasmax_series, tasmin_series):
66 changes: 45 additions & 21 deletions xclim/core/units.py
@@ -413,6 +413,7 @@ def cf_conversion(
FREQ_UNITS = {
"D": "d",
"W": "week",
"h": "h",
}
"""
Resampling frequency units for :py:func:`xclim.core.units.infer_sampling_units`.
@@ -622,8 +623,8 @@ def to_agg_units(


def _rate_and_amount_converter(
-    da: xr.DataArray,
-    dim: str = "time",
+    da: Quantified,
+    dim: str | xr.DataArray = "time",
to: str = "amount",
sampling_rate_from_coord: bool = False,
out_units: str | None = None,
@@ -632,10 +633,27 @@
m = 1
u = None # Default to assume a non-uniform axis
label: Literal["lower", "upper"] = "lower" # Default to "lower" label for diff
-    time = da[dim]
+    if isinstance(dim, str):
+        if not isinstance(da, xr.DataArray):
+            raise ValueError(
+                "If `dim` is a string, the data to convert must be a DataArray."
+            )
+        time = da[dim]
+    else:
+        time = dim
+        dim = time.name
+
+    # We accept str, Quantity or DataArray.
+    # Ensure the code below has a DataArray, so it's simpler.
+    # We convert back at the end.
+    orig_da = da
+    if isinstance(da, str):
+        da = str2pint(da)
+    if isinstance(da, units.Quantity):
+        da = xr.DataArray(da.magnitude, attrs={"units": f"{da.units:~cf}"})

try:
-        freq = xr.infer_freq(da[dim])
+        freq = xr.infer_freq(time)
except ValueError as err:
if sampling_rate_from_coord:
freq = None
@@ -669,7 +687,7 @@ def _rate_and_amount_converter(
),
dims=(dim,),
name=dim,
-            attrs=da[dim].attrs,
+            attrs=time.attrs,
)
else:
m, u = multi, FREQ_UNITS[base]
@@ -683,7 +701,7 @@
# and `label` has been updated accordingly.
dt = (
time.diff(dim, label=label)
-            .reindex({dim: da[dim]}, method="ffill")
+            .reindex({dim: time}, method="ffill")
.astype(float)
)
dt = dt / 1e9 # Convert to seconds
@@ -716,15 +734,17 @@
out = out.assign_attrs(standard_name=new_name)

if out_units:
-        out = cast(xr.DataArray, convert_units_to(out, out_units))
+        out = convert_units_to(out, out_units)

+    if not isinstance(orig_da, xr.DataArray):
+        out = units.Quantity(out.item(), out.attrs["units"])
return out


@_register_conversion("amount2rate", "from")
def rate2amount(
-    rate: xr.DataArray,
-    dim: str = "time",
+    rate: Quantified,
+    dim: str | xr.DataArray = "time",
sampling_rate_from_coord: bool = False,
out_units: str | None = None,
) -> xr.DataArray:
@@ -738,10 +758,10 @@ def rate2amount(
Parameters
----------
-    rate : xr.DataArray
+    rate : xr.DataArray, pint.Quantity or string
        "Rate" variable, with units of "amount" per time. Ex: Precipitation in "mm / d".
-    dim : str
-        The time dimension.
+    dim : str or DataArray
+        The name of the time dimension or the coordinate itself.
sampling_rate_from_coord : boolean
For data with irregular time coordinates. If True, the diff of the time coordinate will be used as the sampling rate,
meaning each data point will be assumed to apply for the interval ending at the next point. See notes.
@@ -756,7 +776,7 @@
Returns
-------
-    xr.DataArray
+    xr.DataArray or Quantity
Examples
--------
@@ -804,8 +824,8 @@

@_register_conversion("amount2rate", "to")
def amount2rate(
-    amount: xr.DataArray,
-    dim: str = "time",
+    amount: Quantified,
+    dim: str | xr.DataArray = "time",
sampling_rate_from_coord: bool = False,
out_units: str | None = None,
) -> xr.DataArray:
@@ -819,10 +839,10 @@
Parameters
----------
-    amount : xr.DataArray
+    amount : xr.DataArray, pint.Quantity or string
        "amount" variable. Ex: Precipitation amount in "mm".
-    dim : str
-        The time dimension.
+    dim : str or xr.DataArray
+        The name of the time dimension or the time coordinate itself.
sampling_rate_from_coord : boolean
For data with irregular time coordinates.
If True, the diff of the time coordinate will be used as the sampling rate,
@@ -839,7 +859,7 @@
Returns
-------
-    xr.DataArray
+    xr.DataArray or Quantity
See Also
--------
@@ -1157,12 +1177,16 @@ def check_units(
)


-def _check_output_has_units(out: xr.DataArray | tuple[xr.DataArray]) -> None:
+def _check_output_has_units(
+    out: xr.DataArray | tuple[xr.DataArray] | xr.Dataset,
+) -> None:
"""Perform very basic sanity check on the output.
Indices are responsible for unit management. If this fails, it's a developer's error.
"""
-    if not isinstance(out, tuple):
+    if isinstance(out, xr.Dataset):
+        out = out.data_vars.values()
+    elif not isinstance(out, tuple):
out = (out,)

for outd in out:
(The remainder of xclim/core/units.py and the diffs of the 4 other changed files are not shown.)