Implementation of polyfit and polyval (#3733)
* [WIP] Implementation of polyfit and polyval - minimum testing - no docs

* Formatting with black, flake8

* Fix failing test

* More intelligent skipna switching

* Add docs | Change coeff order to fit numpy | move polyval

* Move doc patching to class

* conditional doc patching

* Fix windows fail - more efficient nan skipping

* Fix typo in least_squares

* Move polyfit to dataset

* Add more tests | fix some edge cases

* Skip test without dask

* Fix 1D case | add docs

* skip polyval test without dask

* Explicit docs | More restrictive polyval

* Small typo in polyfit docstrings

* Apply suggestions from code review

Co-Authored-By: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* Polyfit : fix style in docstring | add see also section

* Clean up docstrings and documentation.

* Move whats new entry to 0.16 | fix PEP8 issue in test_dataarray

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
aulemahal and max-sixty authored Mar 25, 2020
1 parent f583ac7 commit ec215da
Showing 14 changed files with 488 additions and 1 deletion.
3 changes: 3 additions & 0 deletions doc/api.rst
@@ -30,6 +30,7 @@ Top-level functions
zeros_like
ones_like
dot
polyval
map_blocks
show_versions
set_options
@@ -172,6 +173,7 @@ Computation
Dataset.quantile
Dataset.differentiate
Dataset.integrate
Dataset.polyfit

**Aggregation**:
:py:attr:`~Dataset.all`
@@ -352,6 +354,7 @@ Computation
DataArray.quantile
DataArray.differentiate
DataArray.integrate
DataArray.polyfit
DataArray.str

**Aggregation**:
26 changes: 26 additions & 0 deletions doc/computation.rst
@@ -401,6 +401,32 @@ trapezoidal rule using their coordinates,
and integration along multidimensional coordinate are not supported.


.. _compute.polyfit:

Fitting polynomials
===================

Xarray objects provide an interface for performing linear or polynomial regressions
using the least-squares method. :py:meth:`~xarray.DataArray.polyfit` computes the
best-fitting coefficients along a given dimension and for a given polynomial degree:

.. ipython:: python

    x = xr.DataArray(np.arange(10), dims=['x'], name='x')
    a = xr.DataArray(3 + 4 * x, dims=['x'], coords={'x': x})
    out = a.polyfit(dim='x', deg=1, full=True)
    out

The method outputs a dataset containing the coefficients (and more if ``full=True``).
The inverse operation is done with :py:func:`~xarray.polyval`,

.. ipython:: python

    xr.polyval(coord=x, coeffs=out.polyfit_coefficients)

.. note::

    These methods replicate the behaviour of :py:func:`numpy.polyfit` and :py:func:`numpy.polyval`.
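
For comparison with the note above, the same fit can be written directly with NumPy. This is a minimal sketch on the toy data from the example (y = 3 + 4x); the variable names are illustrative and not part of the commit.

```python
import numpy as np

# Same toy data as the xarray example: y = 3 + 4 * x
x = np.arange(10)
y = 3 + 4 * x

# np.polyfit returns coefficients ordered from highest degree to lowest
coeffs = np.polyfit(x, y, deg=1)

# np.polyval evaluates the polynomial described by those coefficients at x
y_fit = np.polyval(coeffs, x)
```

Here ``coeffs`` is approximately ``[4, 3]``. This highest-degree-first ordering is the one the commit message refers to when it says the coefficient order was changed to fit numpy.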

.. _compute.broadcasting:

Broadcasting by dimension name
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -24,6 +24,8 @@ Breaking changes

New Features
~~~~~~~~~~~~
- Added :py:meth:`DataArray.polyfit` and :py:func:`xarray.polyval` for fitting polynomials. (:issue:`3349`)
By `Pascal Bourgault <https://github.com/aulemahal>`_.
- Control over attributes of result in :py:func:`merge`, :py:func:`concat`,
:py:func:`combine_by_coords` and :py:func:`combine_nested` using
combine_attrs keyword argument. (:issue:`3865`, :pull:`3877`)
3 changes: 2 additions & 1 deletion xarray/__init__.py
@@ -17,7 +17,7 @@
from .core.alignment import align, broadcast
from .core.combine import auto_combine, combine_by_coords, combine_nested
from .core.common import ALL_DIMS, full_like, ones_like, zeros_like
from .core.computation import apply_ufunc, dot, where
from .core.computation import apply_ufunc, dot, polyval, where
from .core.concat import concat
from .core.dataarray import DataArray
from .core.dataset import Dataset
@@ -65,6 +65,7 @@
"open_mfdataset",
"open_rasterio",
"open_zarr",
"polyval",
"register_dataarray_accessor",
"register_dataset_accessor",
"save_mfdataset",
32 changes: 32 additions & 0 deletions xarray/core/computation.py
@@ -1306,3 +1306,35 @@ def where(cond, x, y):
dataset_join="exact",
dask="allowed",
)


def polyval(coord, coeffs, degree_dim="degree"):
    """Evaluate a polynomial at specific values

    Parameters
    ----------
    coord : DataArray
        The 1D coordinate along which to evaluate the polynomial.
    coeffs : DataArray
        Coefficients of the polynomials.
    degree_dim : str, default "degree"
        Name of the polynomial degree dimension in `coeffs`.

    See also
    --------
    xarray.DataArray.polyfit
    numpy.polyval
    """
    from .dataarray import DataArray
    from .missing import get_clean_interp_index

    x = get_clean_interp_index(coord, coord.name)

    deg_coord = coeffs[degree_dim]

    lhs = DataArray(
        np.vander(x, int(deg_coord.max()) + 1),
        dims=(coord.name, degree_dim),
        coords={coord.name: coord, degree_dim: np.arange(deg_coord.max() + 1)[::-1]},
    )
    return (lhs * coeffs).sum(degree_dim)
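
The Vandermonde-based evaluation in `polyval` can be illustrated in plain NumPy. The sketch below assumes coefficients ordered from highest to lowest degree, which matches the column order `np.vander` produces by default; the array names are hypothetical and the example does not touch xarray itself.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 5)
coeffs = np.array([2.0, -1.0, 0.5])  # 2*x**2 - x + 0.5, highest degree first

# Columns of the Vandermonde matrix are x**2, x**1, x**0
lhs = np.vander(x, len(coeffs))

# Multiplying each column by its coefficient and summing over the degree axis
# mirrors `(lhs * coeffs).sum(degree_dim)` in the function above
y = (lhs * coeffs).sum(axis=1)
```

The result agrees with `np.polyval(coeffs, x)`, which is the behaviour the function is documented to replicate.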
27 changes: 27 additions & 0 deletions xarray/core/dask_array_ops.py
@@ -95,3 +95,30 @@ def func(x, window, axis=-1):
# crop boundary.
index = (slice(None),) * axis + (slice(drop_size, drop_size + orig_shape[axis]),)
return out[index]


def least_squares(lhs, rhs, rcond=None, skipna=False):
    import dask.array as da

    lhs_da = da.from_array(lhs, chunks=(rhs.chunks[0], lhs.shape[1]))
    if skipna:
        added_dim = rhs.ndim == 1
        if added_dim:
            rhs = rhs.reshape(rhs.shape[0], 1)
        results = da.apply_along_axis(
            nputils._nanpolyfit_1d,
            0,
            rhs,
            lhs_da,
            dtype=float,
            shape=(lhs.shape[1] + 1,),
            rcond=rcond,
        )
        coeffs = results[:-1, ...]
        residuals = results[-1, ...]
        if added_dim:
            coeffs = coeffs.reshape(coeffs.shape[0])
            residuals = residuals.reshape(residuals.shape[0])
    else:
        coeffs, residuals, _, _ = da.linalg.lstsq(lhs_da, rhs)
    return coeffs, residuals
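
In the `skipna=True` branch, each 1D slice is fitted separately and the result for a slice is one array of length `lhs.shape[1] + 1`: the coefficients followed by the residual. The helper `nputils._nanpolyfit_1d` is not shown in this excerpt, so the following is only a sketch of what a NaN-skipping 1D fit of that shape might look like. `nan_lstsq_1d` is a hypothetical name, not xarray's implementation.

```python
import numpy as np


def nan_lstsq_1d(rhs, lhs, rcond=None):
    """Hypothetical helper: least-squares fit of one 1D slice, ignoring NaNs."""
    out = np.full(lhs.shape[1] + 1, np.nan)
    valid = ~np.isnan(rhs)
    if valid.any():
        # Fit using only the rows where the data is valid
        coeffs, resid, _, _ = np.linalg.lstsq(lhs[valid], rhs[valid], rcond=rcond)
        out[:-1] = coeffs
        # lstsq returns an empty residual array when the system is
        # rank-deficient or not over-determined
        if resid.size:
            out[-1] = resid[0]
    return out


x = np.arange(10, dtype=float)
y = 3 + 4 * x
y[2] = np.nan  # one missing sample
result = nan_lstsq_1d(y, np.vander(x, 2))
```

Packing the coefficients and the residual into a single array is what lets a function like this be mapped over slices with `apply_along_axis`, after which the caller splits the output with `results[:-1]` and `results[-1]` as in the code above.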
62 changes: 62 additions & 0 deletions xarray/core/dataarray.py
@@ -3275,6 +3275,68 @@ def map_blocks(

return map_blocks(func, self, args, kwargs)

    def polyfit(
        self,
        dim: Hashable,
        deg: int,
        skipna: bool = None,
        rcond: float = None,
        w: Union[Hashable, Any] = None,
        full: bool = False,
        cov: bool = False,
    ):
        """
        Least squares polynomial fit.

        This replicates the behaviour of `numpy.polyfit` but differs by skipping
        invalid values when `skipna = True`.

        Parameters
        ----------
        dim : hashable
            Coordinate along which to fit the polynomials.
        deg : int
            Degree of the fitting polynomial.
        skipna : bool, optional
            If True, removes all invalid values before fitting each 1D slice of the array.
            Default is True if data is stored in a dask.array or if there are any
            invalid values, False otherwise.
        rcond : float, optional
            Relative condition number of the fit.
        w : Union[Hashable, Any], optional
            Weights to apply to the y-coordinate of the sample points.
            Can be an array-like object or the name of a coordinate in the dataset.
        full : bool, optional
            Whether to return the residuals, matrix rank and singular values in addition
            to the coefficients.
        cov : Union[bool, str], optional
            Whether to return the covariance matrix in addition to the coefficients.
            The matrix is not scaled if `cov='unscaled'`.

        Returns
        -------
        polyfit_results : Dataset
            A single dataset which contains:

            polyfit_coefficients
                The coefficients of the best fit.
            polyfit_residuals
                The residuals of the least-square computation (only included if `full=True`)
            [dim]_matrix_rank
                The effective rank of the scaled Vandermonde coefficient matrix (only included if `full=True`)
            [dim]_singular_value
                The singular values of the scaled Vandermonde coefficient matrix (only included if `full=True`)
            polyfit_covariance
                The covariance matrix of the polynomial coefficient estimates (only included if `full=False` and `cov=True`)

        See also
        --------
        numpy.polyfit
        """
        return self._to_temp_dataset().polyfit(
            dim, deg, skipna=skipna, rcond=rcond, w=w, full=full, cov=cov
        )
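
Since the docstring defers to `numpy.polyfit`, the `w` and `cov` options can be illustrated with NumPy itself. This is a minimal sketch on synthetic data, not a demonstration of the xarray method; all names and values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(x.size)  # synthetic noisy line

# Weights apply to the y-values of the sample points; here all samples count equally
w = np.ones_like(x)

# With cov=True (and full=False), NumPy returns the coefficients
# together with the covariance matrix of the coefficient estimates
coeffs, cov = np.polyfit(x, y, deg=1, w=w, cov=True)
```

For a degree-1 fit the covariance matrix is 2 x 2, one row and column per coefficient, which corresponds to the `polyfit_covariance` variable described in the docstring.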

def pad(
self,
pad_width: Mapping[Hashable, Union[int, Tuple[int, int]]] = None,