Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement integrate #2653

Merged
merged 9 commits into from
Jan 31, 2019
Merged

Implement integrate #2653

merged 9 commits into from
Jan 31, 2019

Conversation

fujiisoup
Copy link
Member

I would like to add integrate, which is essentially an xarray-version of np.trapz.
I know there was variety of discussions in #1288, but I think it would be nice to limit us within that numpy provides by np.trapz,
i.e.,

  1. only for trapz not rectangle or simps
  2. do not care np.nan
  3. do not support bounds
    Most of them (except for 1) can be solved by combining several existing methods.

xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Show resolved Hide resolved
coord_var, datetime_unit=datetime_unit)

variables = OrderedDict()
coord_names = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coords_names should always be a set, since you pass it into _replace_vars_and_dims. Actually, this is a great example of a bug mypy would have caught! (see #2655)

variables[k] = v
coord_names.append(k)
else:
if (k in self.data_vars and dim in v.dims):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for extra parenthesis here

xarray/core/dataset.py Outdated Show resolved Hide resolved
@@ -3867,6 +3867,79 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
variables[k] = v
return self._replace_vars_and_dims(variables)

def integrate(self, dim, datetime_unit=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the default dim=None do integration over all dimensions?

from .variable import Variable

if dim not in self.variables and dim not in self.dims:
raise ValueError('Coordinate {} does not exist.'.format(dim))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think splitting these checks into two would be a little clearer:

  • "cannot integrate over dimension {} because it does not exist" (for dim not in self.dims)
  • "cannot integrate over dimension {} because there is no corresponding coordinate" (for dim not in self.variables)

coord_var = self[dim].variable
if coord_var.ndim != 1:
raise ValueError('Coordinate {} must be 1 dimensional but is {}'
' dimensional'.format(dim, coord_var.ndim))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, maybe:

  • "cannot integrate over dimension {} because the corresponding coordinate is not a 1d array along that dimension: it has dimensions {}"

raise ValueError('Coordinate {} does not exist.'.format(dim))

coord_var = self[dim].variable
if coord_var.ndim != 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is currently not possible due to xarray's data model, but it's a good idea to add this anyways given that we want to change this soon (e.g., see #2405).

I would recommend adjusting this to if coord_var.dims != (dim,), which is a little stricter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first thought that it would be nice if we could integrate even along non-dimensional (1d) coordinate (as interpolate_na, differential do), but it also sounds something too much.
How do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems reasonable to support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, coord is a better argument rather than dim?
Or we use dim for argument but support integration along non-dimensional coordinate with a slight avoidance of correctness, as it is more consistent with other reduction methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, differentiate uses coord, so maybe integrate should too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, +1 for consistency with differentiate.

else:
variables[k] = v
return self._replace_vars_and_dims(variables,
coord_names=set(coord_names))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe define this as a set instead, and use coord_names.add(k) instead of append?

@max-sixty max-sixty changed the title Imprement integrate Implement integrate Jan 8, 2019
@@ -3867,7 +3867,7 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
variables[k] = v
return self._replace_vars_and_dims(variables)

def integrate(self, dim, datetime_unit=None):
def integrate(self, coord, datetime_unit=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should coord=None have the default behavior of integrating over all dimensions? Or would that be confusing in some way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think it would be a little confusing because the result may change depending on which coordinate is used for integrate, e.g. if the DataArray has a dimension without coordinate but another one-dimensional coordinate, it is not very clear which should be used.

It would be a little convenient for 1d arrays, but aswe disallow default argument for diff, I like to disallow default argument here too.

@dcherian
Copy link
Contributor

Looks like this is ready to merge?

@shoyer shoyer merged commit 4923039 into pydata:master Jan 31, 2019
@shoyer
Copy link
Member

shoyer commented Jan 31, 2019

Oops, I forgot about this.

@fujiisoup thanks for this awesome contribution! We should issue 0.12.0 soon to get this out in the world

dcherian pushed a commit to yohai/xarray that referenced this pull request Feb 4, 2019
* master:
  remove xfail from test_cross_engine_read_write_netcdf4 (pydata#2741)
  Reenable cross engine read write netCDF test (pydata#2739)
  remove bottleneck dev build from travis, this test env was failing to build (pydata#2736)
  CFTimeIndex Resampling (pydata#2593)
  add tests for handling of empty pandas objects in constructors (pydata#2735)
  dropna() for a Series indexed by a CFTimeIndex (pydata#2734)
  deprecate compat & encoding (pydata#2703)
  Implement integrate (pydata#2653)
  ENH: resample methods with tolerance (pydata#2716)
  improve error message for invalid encoding (pydata#2730)
  silence a couple of warnings (pydata#2727)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants