Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement integrate #2653

Merged
merged 9 commits into from
Jan 31, 2019
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,7 @@ Computation
Dataset.diff
Dataset.quantile
Dataset.differentiate
Dataset.integrate

**Aggregation**:
:py:attr:`~Dataset.all`
Expand Down Expand Up @@ -319,6 +320,7 @@ Computation
DataArray.dot
DataArray.quantile
DataArray.differentiate
DataArray.integrate

**Aggregation**:
:py:attr:`~DataArray.all`
Expand Down
14 changes: 12 additions & 2 deletions doc/computation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,8 @@ You can also use ``construct`` to compute a weighted rolling sum:
To avoid this, use ``skipna=False`` as the above example.


.. _compute.using_coordinates:

Computation using Coordinates
=============================

Expand All @@ -220,9 +222,17 @@ This method can be used also for multidimensional arrays,
coords={'x': [0.1, 0.11, 0.2, 0.3]})
a.differentiate('x')

:py:meth:`~xarray.DataArray.integrate` computes integration based on
trapezoidal rule using their coordinates,

.. ipython:: python

a.integrate('x')

.. note::
This method is limited to simple cartesian geometry. Differentiation along
multidimensional coordinate is not supported.
These methods are limited to simple cartesian geometry. Differentiation
and integration along multidimensional coordinate are not supported.


.. _compute.broadcasting:

Expand Down
13 changes: 10 additions & 3 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,13 @@ Enhancements
as long as the array is not chunked along the resampling dimension.
By `Spencer Clark <https://github.com/spencerkclark>`_.

- :py:meth:`~xarray.DataArray.integrate` and
:py:meth:`~xarray.Dataset.integrate` are newly added.
See :ref:`_compute.using_coordinates` for the detail.
(:issue:`1332`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.


Bug fixes
~~~~~~~~~

Expand Down Expand Up @@ -76,9 +83,9 @@ Breaking changes
- Minimum rasterio version increased from 0.36 to 1.0 (for ``open_rasterio``)
- Time bounds variables are now also decoded according to CF conventions
(:issue:`2565`). The previous behavior was to decode them only if they
had specific time attributes, now these attributes are copied
automatically from the corresponding time coordinate. This might
brake downstream code that was relying on these variables to be
had specific time attributes, now these attributes are copied
automatically from the corresponding time coordinate. This might
break downstream code that was relying on these variables to be
not decoded.
By `Fabien Maussion <https://github.com/fmaussion>`_.

Expand Down
47 changes: 47 additions & 0 deletions xarray/core/dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -2427,6 +2427,53 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
coord, edge_order, datetime_unit)
return self._from_temp_dataset(ds)

def integrate(self, dim, datetime_unit=None):
""" integrate the array with the trapezoidal rule.

.. note::
This feature is limited to simple cartesian geometry, i.e. coord
must be one dimensional.

Parameters
----------
dim: str
The coordinate to be used to compute the gradient.
datetime_unit
Can be specify the unit if datetime coordinate is specified. One of
{'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns', 'ps', 'fs',
'as'}

Returns
-------
integrated: DataArray

See also
--------
numpy.trapz: corresponding numpy function

Examples
--------

>>> da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
... coords={'x': [0, 0.1, 1.1, 1.2]})
>>> da
<xarray.DataArray (x: 4, y: 3)>
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
>>>
>>> da.integrate('x')
<xarray.DataArray (y: 3)>
array([5.4, 6.6, 7.8])
Dimensions without coordinates: y
"""
ds = self._to_temp_dataset().integrate(dim, datetime_unit)
return self._from_temp_dataset(ds)


# priority most be higher than Variable to properly work with binary ufuncs
ops.inject_all_ops_and_reduce_methods(DataArray, priority=60)
64 changes: 64 additions & 0 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -3866,6 +3866,70 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
variables[k] = v
return self._replace_vars_and_dims(variables)

def integrate(self, dim, datetime_unit=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the default dim=None do integration over all dimensions?

""" integrate the array with the trapezoidal rule.

.. note::
This feature is limited to simple cartesian geometry, i.e. coord
must be one dimensional.

Parameters
----------
dim: str
fujiisoup marked this conversation as resolved.
Show resolved Hide resolved
The coordinate to be used to compute the integration.
datetime_unit
Can be specify the unit if datetime coordinate is specified. One of
{'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns', 'ps', 'fs',
'as'}

Returns
-------
integrated: Dataset

See also
--------
DataArray.integrate
numpy.trapz: corresponding numpy function
"""
from .variable import Variable

if dim not in self.variables and dim not in self.dims:
raise ValueError('Coordinate {} does not exist.'.format(dim))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think splitting these checks into two would be a little clearer:

  • "cannot integrate over dimension {} because it does not exist" (for dim not in self.dims)
  • "cannot integrate over dimension {} because there is no corresponding coordinate" (for dim not in self.variables)


coord_var = self[dim].variable
if coord_var.ndim != 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is currently not possible due to xarray's data model, but it's a good idea to add this anyways given that we want to change this soon (e.g., see #2405).

I would recommend adjusting this to if coord_var.dims != (dim,), which is a little stricter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first thought that it would be nice if we could integrate even along non-dimensional (1d) coordinate (as interpolate_na, differential do), but it also sounds something too much.
How do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems reasonable to support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, coord is a better argument rather than dim?
Or we use dim for argument but support integration along non-dimensional coordinate with a slight avoidance of correctness, as it is more consistent with other reduction methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, differentiate uses coord, so maybe integrate should too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, +1 for consistency with differentiate.

raise ValueError('Coordinate {} must be 1 dimensional but is {}'
' dimensional'.format(dim, coord_var.ndim))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, maybe:

  • "cannot integrate over dimension {} because the corresponding coordinate is not a 1d array along that dimension: it has dimensions {}"


dim = coord_var.dims[0]
if _contains_datetime_like_objects(coord_var):
if coord_var.dtype.kind in 'mM' and datetime_unit is None:
datetime_unit, _ = np.datetime_data(coord_var.dtype)
elif datetime_unit is None:
datetime_unit = 's' # Default to seconds for cftime objects
coord_var = datetime_to_numeric(
coord_var, datetime_unit=datetime_unit)

variables = OrderedDict()
coord_names = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coords_names should always be a set, since you pass it into _replace_vars_and_dims. Actually, this is a great example of a bug mypy would have caught! (see #2655)

for k, v in self.variables.items():
if k in self.coords:
if dim not in v.dims:
variables[k] = v
coord_names.append(k)
else:
if (k in self.data_vars and dim in v.dims):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for extra parenthesis here

if _contains_datetime_like_objects(v):
v = datetime_to_numeric(v, datetime_unit=datetime_unit)
integ = duck_array_ops.trapz(
v.data, coord_var.data, axis=v.get_axis_num(dim))
v_dims = list(v.dims)
v_dims.remove(dim)
variables[k] = Variable(v_dims, integ)
else:
variables[k] = v
return self._replace_vars_and_dims(variables, coord_names=coord_names)

@property
def real(self):
return self._unary_op(lambda x: x.real,
Expand Down
12 changes: 12 additions & 0 deletions xarray/core/duck_array_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,18 @@ def gradient(x, coord, axis, edge_order):
return npcompat.gradient(x, coord, axis=axis, edge_order=edge_order)


def trapz(y, x, axis):
if axis < 0:
axis = y.ndim + axis
x_sl1 = (slice(1, None), ) + (None, ) * (y.ndim - axis - 1)
x_sl2 = (slice(None, -1), ) + (None, ) * (y.ndim - axis - 1)
slice1 = (slice(None),) * axis + (slice(1, None), )
slice2 = (slice(None),) * axis + (slice(None, -1), )
dx = (x[x_sl1] - x[x_sl2])
integrand = dx * 0.5 * (y[tuple(slice1)] + y[tuple(slice2)])
return sum(integrand, axis=axis, skipna=False)


masked_invalid = _dask_or_eager_func(
'masked_invalid', eager_module=np.ma,
dask_module=getattr(dask_array, 'ma', None))
Expand Down
75 changes: 75 additions & 0 deletions xarray/tests/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -4660,3 +4660,78 @@ def test_differentiate_cftime(dask):
# Test the differentiation of datetimes themselves
actual = da['time'].differentiate('time', edge_order=1, datetime_unit='D')
assert_allclose(actual, xr.ones_like(da['time']).astype(float))


@pytest.mark.parametrize('dask', [True, False])
def test_integrate(dask):
rs = np.random.RandomState(42)
coord = [0.2, 0.35, 0.4, 0.6, 0.7, 0.75, 0.76, 0.8]

da = xr.DataArray(rs.randn(8, 6), dims=['x', 'y'],
coords={'x': coord, 'x2': (('x', ), rs.randn(8)),
'z': 3, 'x2d': (('x', 'y'), rs.randn(8, 6))})
if dask and has_dask:
da = da.chunk({'x': 4})

ds = xr.Dataset({'var': da})

# along x
actual = da.integrate('x')
# coordinate that contains x should be dropped.
expected_x = xr.DataArray(
np.trapz(da, da['x'], axis=0), dims=['y'],
coords={k: v for k, v in da.coords.items() if 'x' not in v.dims})
assert_allclose(expected_x, actual.compute())
assert_equal(ds['var'].integrate('x'), ds.integrate('x')['var'])

# make sure result is also a dask array (if the source is dask array)
assert isinstance(actual.data, type(da.data))

# along y
actual = da.integrate('y')
expected_y = xr.DataArray(
np.trapz(da, da['y'], axis=1), dims=['x'],
coords={k: v for k, v in da.coords.items() if 'y' not in v.dims})
assert_allclose(expected_y, actual.compute())
assert_equal(actual, ds.integrate('y')['var'])
assert_equal(ds['var'].integrate('y'), ds.integrate('y')['var'])

with pytest.raises(ValueError):
da.integrate('x2d')


@pytest.mark.parametrize('dask', [True, False])
@pytest.mark.parametrize('which_datetime', ['np', 'cftime'])
def test_trapz_datetime(dask, which_datetime):
rs = np.random.RandomState(42)
if which_datetime == 'np':
coord = np.array(
['2004-07-13', '2006-01-13', '2010-08-13', '2010-09-13',
'2010-10-11', '2010-12-13', '2011-02-13', '2012-08-13'],
dtype='datetime64')
else:
if not has_cftime:
pytest.skip('Test requires cftime.')
coord = xr.cftime_range('2000', periods=8, freq='2D')

da = xr.DataArray(
rs.randn(8, 6),
coords={'time': coord, 'z': 3, 't2d': (('time', 'y'), rs.randn(8, 6))},
dims=['time', 'y'])

if dask and has_dask:
da = da.chunk({'time': 4})

actual = da.integrate('time', datetime_unit='D')
expected_data = np.trapz(
da, utils.datetime_to_numeric(da['time'], datetime_unit='D'), axis=0)
expected = xr.DataArray(
expected_data, dims=['y'],
coords={k: v for k, v in da.coords.items() if 'time' not in v.dims})
assert_allclose(expected, actual.compute())

# make sure result is also a dask array (if the source is dask array)
assert isinstance(actual.data, type(da.data))

actual2 = da.integrate('time', datetime_unit='h')
assert_allclose(actual, actual2 / 24.0)