-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement integrate #2653
Implement integrate #2653
Conversation
xarray/core/dataset.py
Outdated
coord_var, datetime_unit=datetime_unit) | ||
|
||
variables = OrderedDict() | ||
coord_names = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coords_names
should always be a set
, since you pass it into _replace_vars_and_dims
. Actually, this is a great example of a bug mypy would have caught! (see #2655)
xarray/core/dataset.py
Outdated
variables[k] = v | ||
coord_names.append(k) | ||
else: | ||
if (k in self.data_vars and dim in v.dims): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for extra parenthesis here
# Conflicts: # doc/computation.rst
xarray/core/dataset.py
Outdated
@@ -3867,6 +3867,79 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None): | |||
variables[k] = v | |||
return self._replace_vars_and_dims(variables) | |||
|
|||
def integrate(self, dim, datetime_unit=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make the default dim=None
do integration over all dimensions?
from .variable import Variable | ||
|
||
if dim not in self.variables and dim not in self.dims: | ||
raise ValueError('Coordinate {} does not exist.'.format(dim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think splitting these checks into two would be a little clearer:
- "cannot integrate over dimension {} because it does not exist" (for
dim not in self.dims
) - "cannot integrate over dimension {} because there is no corresponding coordinate" (for
dim not in self.variables
)
xarray/core/dataset.py
Outdated
coord_var = self[dim].variable | ||
if coord_var.ndim != 1: | ||
raise ValueError('Coordinate {} must be 1 dimensional but is {}' | ||
' dimensional'.format(dim, coord_var.ndim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, maybe:
- "cannot integrate over dimension {} because the corresponding coordinate is not a 1d array along that dimension: it has dimensions {}"
raise ValueError('Coordinate {} does not exist.'.format(dim)) | ||
|
||
coord_var = self[dim].variable | ||
if coord_var.ndim != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is currently not possible due to xarray's data model, but it's a good idea to add this anyways given that we want to change this soon (e.g., see #2405).
I would recommend adjusting this to if coord_var.dims != (dim,)
, which is a little stricter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first thought that it would be nice if we could integrate even along non-dimensional (1d) coordinate (as interpolate_na
, differential
do), but it also sounds something too much.
How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems reasonable to support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, coord
is a better argument rather than dim
?
Or we use dim
for argument but support integration along non-dimensional coordinate with a slight avoidance of correctness, as it is more consistent with other reduction methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, differentiate
uses coord
, so maybe integrate
should too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, +1 for consistency with differentiate
.
xarray/core/dataset.py
Outdated
else: | ||
variables[k] = v | ||
return self._replace_vars_and_dims(variables, | ||
coord_names=set(coord_names)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe define this as a set
instead, and use coord_names.add(k)
instead of append
?
@@ -3867,7 +3867,7 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None): | |||
variables[k] = v | |||
return self._replace_vars_and_dims(variables) | |||
|
|||
def integrate(self, dim, datetime_unit=None): | |||
def integrate(self, coord, datetime_unit=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should coord=None
have the default behavior of integrating over all dimensions? Or would that be confusing in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think it would be a little confusing because the result may change depending on which coordinate is used for integrate
, e.g. if the DataArray
has a dimension without coordinate but another one-dimensional coordinate, it is not very clear which should be used.
It would be a little convenient for 1d arrays, but aswe disallow default argument for diff
, I like to disallow default argument here too.
Looks like this is ready to merge? |
Oops, I forgot about this. @fujiisoup thanks for this awesome contribution! We should issue 0.12.0 soon to get this out in the world |
* master: remove xfail from test_cross_engine_read_write_netcdf4 (pydata#2741) Reenable cross engine read write netCDF test (pydata#2739) remove bottleneck dev build from travis, this test env was failing to build (pydata#2736) CFTimeIndex Resampling (pydata#2593) add tests for handling of empty pandas objects in constructors (pydata#2735) dropna() for a Series indexed by a CFTimeIndex (pydata#2734) deprecate compat & encoding (pydata#2703) Implement integrate (pydata#2653) ENH: resample methods with tolerance (pydata#2716) improve error message for invalid encoding (pydata#2730) silence a couple of warnings (pydata#2727)
whats-new.rst
for all changes andapi.rst
for new APII would like to add
integrate
, which is essentially an xarray-version ofnp.trapz
.I know there was variety of discussions in #1288, but I think it would be nice to limit us within that numpy provides by
np.trapz
,i.e.,
trapz
notrectangle
orsimps
np.nan
bounds
Most of them (except for 1) can be solved by combining several existing methods.