Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weighted quantile #6059

Merged
merged 52 commits into from
Mar 27, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
dcd1b24
Add weighted quantile
cjauvin Dec 10, 2021
875f766
Add weighted quantile to documentation
cjauvin Dec 10, 2021
217d2f0
Apply suggestions from code review
cjauvin Dec 10, 2021
278bed9
Apply suggestions from code review
cjauvin Dec 10, 2021
80ae229
Improve _weighted_quantile_type7_1d ufunc with suggestions
cjauvin Dec 11, 2021
77bb84e
Merge remote-tracking branch 'origin/main' into weighted-quantile
cjauvin Dec 11, 2021
58af567
Expand scope of invalid q value test
cjauvin Dec 11, 2021
15e3834
Fix weighted quantile with zero weights
cjauvin Dec 13, 2021
83e4210
Replace np.ones by xr.ones_like in weighted quantile test
cjauvin Dec 13, 2021
2237399
Process weighted quantile data with all nans
cjauvin Dec 13, 2021
b936e21
Fix operator precedence bug
cjauvin Dec 13, 2021
c94fa16
Merge branch 'main' into pr/6059
Illviljan Dec 29, 2021
ab810d7
Used effective sample size. Generalize to different quantile types su…
huard Jan 11, 2022
7bcf09e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2022
8427637
Apply suggestions from code review
huard Jan 12, 2022
3217962
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2022
7379d22
added missing Typing hints
huard Jan 12, 2022
c8871d1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2022
b26a5fc
update what's new and pep8 fixes
huard Jan 12, 2022
42ebcc2
merge
huard Jan 12, 2022
5aa22a4
Merge branch 'main' into weighted-quantile
huard Jan 12, 2022
abe253e
add docstring paragraph discussing weight interpretation
huard Jan 19, 2022
82147aa
recognize numpy names for quantile interpolation methods
huard Jan 19, 2022
784cedd
tweak to avoid warning with all nans data. simplify test
huard Jan 19, 2022
3ee62fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 19, 2022
4d6a4fd
remove integers from quantile interpolation available methods
huard Jan 19, 2022
9f93f55
merge
huard Jan 19, 2022
8132320
Merge remote-tracking branch 'upstream/main' into weighted-quantile
huard Jan 19, 2022
4d7f5f5
remove merge artifacts
Illviljan Jan 19, 2022
c268ddd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 19, 2022
db706aa
[skip-ci] fix bad merge in whats-new
dcherian Jan 20, 2022
33ee96c
Add references
huard Jan 20, 2022
2ffd3d3
renamed htype argument to method in private functions
huard Feb 8, 2022
9806db8
Merge branch 'main' into weighted-quantile
huard Feb 8, 2022
15ee999
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 8, 2022
9559c87
Update xarray/core/weighted.py
huard Feb 9, 2022
73dde79
Add skipped test to verify equal weights quantile with methods
cjauvin Feb 17, 2022
59714af
Apply suggestions from code review
huard Feb 17, 2022
585b705
Update xarray/core/weighted.py
huard Feb 17, 2022
7443b82
modifications suggested by review: comments, remove align, clarify te…
huard Feb 17, 2022
42a6a49
adjust typing. resolve conflicts
huard Feb 17, 2022
2e0c16e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 17, 2022
4186a24
Apply suggestions from code review
huard Feb 21, 2022
1be1f92
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 21, 2022
5c251e0
use broadcast
mathause Feb 25, 2022
9060f8e
Merge branch 'main' into weighted-quantile
mathause Mar 4, 2022
c112d2c
move whatsnew entry
mathause Mar 4, 2022
d4ba8ee
Apply suggestions from code review
mathause Mar 4, 2022
fd0a54e
switch skipna determination
mathause Mar 4, 2022
8bd83f9
Merge branch 'weighted-quantile' of https://github.com/cjauvin/xarray…
mathause Mar 4, 2022
343b47e
use align and broadcast
mathause Mar 4, 2022
c298bd0
Merge branch 'main' into pr/6059
Illviljan Mar 27, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -937,6 +937,7 @@ Dataset

DatasetWeighted
DatasetWeighted.mean
DatasetWeighted.quantile
DatasetWeighted.sum
DatasetWeighted.std
DatasetWeighted.var
Expand All @@ -951,6 +952,7 @@ DataArray

DataArrayWeighted
DataArrayWeighted.mean
DataArrayWeighted.quantile
DataArrayWeighted.sum
DataArrayWeighted.std
DataArrayWeighted.var
Expand Down
8 changes: 7 additions & 1 deletion doc/user-guide/computation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -263,7 +263,7 @@ Weighted array reductions

:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` array reduction methods. They currently
support weighted ``sum``, ``mean``, ``std`` and ``var``.
support weighted ``sum``, ``mean``, ``std``, ``var`` and ``quantile``.

.. ipython:: python

Expand Down Expand Up @@ -291,6 +291,12 @@ Calculate the weighted mean:

weighted_prec.mean(dim="month")

Calculate the weighted quantile:

.. ipython:: python

weighted_prec.quantile(q=0.5, dim="month")

The weighted sum corresponds to:

.. ipython:: python
Expand Down
3 changes: 2 additions & 1 deletion doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,8 @@ v0.20.2 (unreleased)

New Features
~~~~~~~~~~~~

- Add ``quantile`` to :py:class:`~core.weighted.DatasetWeighted` and :py:class:`~core.weighted.DataArrayWeighted`.
By `Christian Jauvin <https://github.com/cjauvin>`_.

dcherian marked this conversation as resolved.
Show resolved Hide resolved
Breaking changes
~~~~~~~~~~~~~~~~
Expand Down
146 changes: 143 additions & 3 deletions xarray/core/weighted.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,18 @@
from typing import TYPE_CHECKING, Generic, Hashable, Iterable, Optional, Union, cast
from typing import (
TYPE_CHECKING,
Generic,
Hashable,
Iterable,
Optional,
Sequence,
Union,
cast,
)

import numpy as np

from . import duck_array_ops
from .computation import dot
from . import duck_array_ops, utils
from .computation import apply_ufunc, dot
from .pycompat import is_duck_dask_array
from .types import T_Xarray

Expand Down Expand Up @@ -54,6 +63,48 @@
New {cls} object with the sum of the weights over the given dimension.
"""

_WEIGHTED_QUANTILE_DOCSTRING_TEMPLATE = """
Apply a a weighted ``quantile`` to this {cls}'s data along some dimension(s).
dcherian marked this conversation as resolved.
Show resolved Hide resolved
huard marked this conversation as resolved.
Show resolved Hide resolved

For compatibility with NumPy's non-weighted ``quantile`` (which is used by
``DataArray.quantile`` and ``Dataset.quantile``), the only interpolation
method supported by this weighted version corresponds to the default "linear"
option of ``numpy.quantile``. This is "Type 7" option, described in Hyndman
and Fan (1996): https://doi.org/10.2307/2684934.

Parameters
----------
q : float or sequence of float
Quantile to compute, which must be between 0 and 1 inclusive.
dim : str or sequence of str, optional
Dimension(s) over which to apply the weighted ``quantile``.
skipna : bool, optional
If True, skip missing values (as marked by NaN). By default, only
skips missing values for float dtypes; other dtypes either do not
have a sentinel missing value (int) or skipna=True has not been
implemented (object, datetime64 or timedelta64).
keep_attrs : bool, optional
If True, the attributes (``attrs``) will be copied from the original
object to the new one. If False (default), the new object will be
returned without attributes.

Returns
-------
quantiles : {cls}
New {cls} object with weighted ``quantile`` applied to its data and
the indicated dimension(s) removed.

See Also
--------
numpy.nanquantile, pandas.Series.quantile, Dataset.quantile
DataArray.quantile
cjauvin marked this conversation as resolved.
Show resolved Hide resolved
huard marked this conversation as resolved.
Show resolved Hide resolved

Notes
-----
Returns NaN if the ``weights`` sum to 0.0 along the reduced
dimension(s).
"""


if TYPE_CHECKING:
from .dataarray import DataArray
Expand Down Expand Up @@ -239,6 +290,81 @@ def _weighted_std(

return cast("DataArray", np.sqrt(self._weighted_var(da, dim, skipna)))

def _weighted_quantile(
self,
da: "DataArray",
q,
dim: Optional[Union[Hashable, Iterable[Hashable]]] = None,
huard marked this conversation as resolved.
Show resolved Hide resolved
skipna: Optional[bool] = None,
keep_attrs: Optional[bool] = None,
) -> "DataArray":
"""Apply a weighted ``quantile`` to a DataArray along some dimension(s)."""

def _weighted_quantile_type7_1d(a, weights, q, skipna):
cjauvin marked this conversation as resolved.
Show resolved Hide resolved
# This algorithm has been adapted from:
# https://aakinshin.net/posts/weighted-quantiles/#reference-implementation
if skipna:
# Remove nans from a and weights
not_nan = ~np.isnan(a)
a = a[not_nan]
weights = weights[not_nan]
elif np.isnan(a).any():
return np.full(len(q), np.nan)
# Flatten input values because this function is 1d
a = a.ravel()
weights = weights.ravel()
n = a.size
assert n == len(weights)
Copy link
Collaborator

@mathause mathause Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this check outside of _weighted_quantile_type7_1d? Also you should not use assert (outside of tests) but

if n != len(weights):
    raise ValueError("Message")

weights = weights / weights.sum()
sorter = np.argsort(a)
dcherian marked this conversation as resolved.
Show resolved Hide resolved
a = a[sorter]
weights = weights[sorter]
weights_cum = np.append(0, weights.cumsum())
res = []
for p in q:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of this for loop if you set

Suggested change
res = []
for p in q:
q = np.atleast_2d(q).T

h = p * (n - 1) + 1
u = np.maximum((h - 1) / n, np.minimum(h / n, weights_cum))
v = u * n - h + 1
w = np.diff(v)
res.append(sum(a * w))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Careful - this uses the built-in sum which is much slower than (a * w).sum() or np.sum(a * w). (use (a * w).sum(axis=1) when you get rid of the for loop).


return np.asarray(res)

scalar = utils.is_scalar(q)
q = np.atleast_1d(np.asarray(q, dtype=np.float64))

if q.ndim > 1:
raise ValueError("q must be a scalar or 1d")

if dim is None:
dim = da.dims

if utils.is_scalar(dim):
dim = [dim]

dim = cast(Sequence, dim)

result = apply_ufunc(
_weighted_quantile_type7_1d,
da,
self.weights,
input_core_dims=[dim, dim],
output_core_dims=[["quantile"]],
output_dtypes=[np.float64],
join="override",
Copy link
Collaborator

@mathause mathause Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use join="inner", here and get rid of the align call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this seems to work.

dask_gufunc_kwargs=dict(output_sizes={"quantile": len(q)}),
dask="parallelized",
vectorize=True,
kwargs={"q": q, "skipna": skipna},
)

# for backward compatibility
result = result.transpose("quantile", ...)
result = result.assign_coords(quantile=q)
if scalar:
result = result.squeeze("quantile")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could call squeeze unconditionally.

return result

def _implementation(self, func, dim, **kwargs):

raise NotImplementedError("Use `Dataset.weighted` or `DataArray.weighted`")
Expand Down Expand Up @@ -308,6 +434,18 @@ def std(
self._weighted_std, dim=dim, skipna=skipna, keep_attrs=keep_attrs
)

def quantile(
self,
q,
dim: Optional[Union[Hashable, Iterable[Hashable]]] = None,
huard marked this conversation as resolved.
Show resolved Hide resolved
skipna: Optional[bool] = None,
keep_attrs: Optional[bool] = None,
) -> T_Xarray:

return self._implementation(
self._weighted_quantile, q=q, dim=dim, skipna=skipna, keep_attrs=keep_attrs
)

def __repr__(self):
"""provide a nice str repr of our Weighted object"""

Expand Down Expand Up @@ -358,6 +496,8 @@ def _inject_docstring(cls, cls_name):
cls=cls_name, fcn="std", on_zero="NaN"
)

cls.quantile.__doc__ = _WEIGHTED_QUANTILE_DOCSTRING_TEMPLATE.format(cls=cls_name)


_inject_docstring(DataArrayWeighted, "DataArray")
_inject_docstring(DatasetWeighted, "Dataset")
Loading