add average function #422
Modulo error checking, etc., this would look something like:

```python
def average(self, dim=None, weights=None):
    if weights is None:
        return self.mean(dim)
    else:
        return (self * weights).sum(dim) / weights.sum(dim)
```

This is pretty easy to do manually, but I can see the value in having the standard method around, so I'm definitely open to PRs to add this functionality. |
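For illustration, a minimal usage sketch of that method with a small hand-built example (the data here is made up, and `xarray` is the modern name of the `xray` package discussed in this thread):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x")
weights = xr.DataArray([1.0, 1.0, 2.0, 2.0], dims="x")

# the manual version of the proposed method
result = (da * weights).sum("x") / weights.sum("x")
print(float(result))  # (0*1 + 1*1 + 2*2 + 3*2) / 6 ≈ 1.833
```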
This has to be adjusted if there are NaNs in the data. Is there a better way to get the correct weights than … It should probably not be used on a Dataset, as every DataArray may have its own NaN structure. |
Possibly using `where`, e.g. `weights.where(da.notnull())`. |
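A small sketch of that `where`-based weight masking, under the assumption that weights should only count where the data is valid (my example, not from the thread):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0], dims="x")
weights = xr.DataArray([0.25, 0.25, 0.5], dims="x")

# drop weights wherever the data is NaN before normalizing
valid_weights = weights.where(da.notnull())
result = (da * weights).sum("x") / valid_weights.sum("x")
print(float(result))  # (1*0.25 + 3*0.5) / (0.25 + 0.5) ≈ 2.333
```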
Thanks - that seems to be the fastest possibility. I wrote the functions for Dataset and DataArray:

```python
import xray


def average_da(self, dim=None, weights=None):
    """
    weighted average for DataArrays

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of self.

    Returns
    -------
    reduced : DataArray
        New DataArray with average applied to its data and the indicated
        dimension(s) removed.
    """
    if weights is None:
        return self.mean(dim)
    else:
        if not isinstance(weights, xray.DataArray):
            raise ValueError("weights must be a DataArray")
        # if NaNs are present, we need individual weights
        if self.notnull().any():
            total_weights = weights.where(self.notnull()).sum(dim=dim)
        else:
            total_weights = weights.sum(dim)
        return (self * weights).sum(dim) / total_weights


# -----------------------------------------------------------------------------


def average_ds(self, dim=None, weights=None):
    """
    weighted average for Datasets

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset
        New Dataset with average applied to its data and the indicated
        dimension(s) removed.
    """
    if weights is None:
        return self.mean(dim)
    else:
        return self.apply(average_da, dim=dim, weights=weights)
```

They can be combined to one function:

```python
def average(data, dim=None, weights=None):
    """
    weighted average for xray objects

    Parameters
    ----------
    data : Dataset or DataArray
        the xray object to average over
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset or DataArray
        New xray object with average applied to its data and the indicated
        dimension(s) removed.
    """
    if isinstance(data, xray.Dataset):
        return average_ds(data, dim, weights)
    elif isinstance(data, xray.DataArray):
        return average_da(data, dim, weights)
    else:
        raise ValueError("data must be an xray Dataset or DataArray")
```

Or a monkey patch:

```python
xray.DataArray.average = average_da
xray.Dataset.average = average_ds
```
|
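For illustration, a usage sketch of the functions above (my example, assuming the monkey patch has been applied; `xray` is the old name of today's xarray):

```python
import numpy as np
import xray

da = xray.DataArray([[1.0, 2.0], [np.nan, 4.0]], dims=("time", "x"))
weights = xray.DataArray([1.0, 3.0], dims="time")

# NaN-aware weighted mean over "time": column 0 counts only the first
# weight, column 1 counts both
print(da.average(dim="time", weights=weights).values)  # -> [1.0, 3.5]
```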
@MaximilianR has suggested a `weighted` method:

```python
da.weighted(weights=ds.dim).mean()
# or maybe
da.weighted(time=days_per_month(da.time)).mean()
```

I really like this idea, as does @shoyer. I'm going to close my PR in hopes of this becoming reality. |
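A minimal sketch of what such a `weighted` object might look like, assuming a thin wrapper class (the names here are hypothetical, not the eventual xarray API):

```python
import xarray as xr


class Weighted:
    """Hypothetical wrapper pairing a DataArray with its weights."""

    def __init__(self, obj, weights):
        self.obj = obj
        self.weights = weights

    def mean(self, dim=None):
        # only count weights where the data is valid
        total = self.weights.where(self.obj.notnull()).sum(dim)
        return (self.obj * self.weights).sum(dim) / total


da = xr.DataArray([1.0, 2.0, 3.0], dims="time")
w = xr.DataArray([3.0, 2.0, 1.0], dims="time")
print(float(Weighted(da, w).mean("time")))  # (3 + 4 + 3) / 6 ≈ 1.667
```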
I would suggest not using keyword arguments for `weighted`. |
Sounds like a clean solution. Then we can defer handling of NaN in the weights to … We may still end up implementing all required methods separately in …, i.e. we use … However, I think this cannot be generalized to a … Additionally, … |
Do we want … or …? |
I would think you want the latter, e.g.:

```python
>>> da.shape
(72, 10, 15)
>>> da.dims
('time', 'x', 'y')
>>> weights = some_func_of_time(time)
>>> da.weighted(weights).mean(dim=('time', 'x'))
...
```
|
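A runnable version of that sketch with made-up data and a stand-in for the weight function (`some_func_of_time` above is not a real helper), showing that weights defined along one dimension broadcast against the full array:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(72, 10, 15), dims=("time", "x", "y"))
# weights depend on time only, but broadcast against (time, x, y)
weights = xr.DataArray(np.linspace(1.0, 2.0, 72), dims="time")

masked = weights.where(da.notnull())
result = (da * weights).sum(dim=("time", "x")) / masked.sum(dim=("time", "x"))
print(result.dims)  # ('y',)
```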
Yes, +1 for …
This is a fair point; I haven't looked into the details of these implementations yet. But I expect there are still at least a few pieces of logic that we will be able to share. |
@mathause can you please comment on the status of this issue? Is there an associated PR somewhere? Thanks! |
This code is based on the proposed solution to an xarray issue: pydata/xarray#422 (comment) that was never incorporated into xarray itself. This should result in correct masking when NaNs are present, whereas the previous weighted averaging was producing zeros (because xarray's sum treats NaNs as zeros). The method for storing cache files of climatologies has also been updated slightly so the new weighted averaging can be used to compute the aggregated climatology from individual (typically yearly) climatologies. It is hoped (but I haven't tested) that this new weighted average will also be faster than the previous implementation.
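To see the pitfall described here, a small sketch (my example, not from the linked code): xarray's `sum` skips NaNs by default, so an unmasked weight total silently biases the result toward zero:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan], dims="t")
w = xr.DataArray([1.0, 1.0], dims="t")

# naive: the NaN is skipped in the numerator, but its weight still counts
naive = (da * w).sum("t") / w.sum("t")                        # 1 / 2 = 0.5
# masked: drop weights where the data is NaN
masked = (da * w).sum("t") / w.where(da.notnull()).sum("t")   # 1 / 1 = 1.0
print(float(naive), float(masked))
```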
Hi, my research group recently discussed weighted averaging with xarray, and I was wondering if there had been any progress with implementing this? I'd be happy to get involved if help is needed. Thanks! |
Hi, This would be a really nice feature to have. I'd be happy to help too. Thank you |
Found this issue due to @rabernat's blog post. This is a much requested feature in our working group, and it would be great to build onto it in xgcm as well. |
I have to say that I am still pretty bad at thinking fully object-oriented, but is this what we want in general? I like the syntax proposed by @jhamman above, but I am wondering what happens in a slightly modified example: …
I think we should maybe build in a warning for when the dimensions of the weights do not match the data. It was mentioned that the functions on … |
Hmm, the intent here would be that the weights are broadcast against the input array, no? Not sure that a warning is required, e.g. @shoyer's comment above:
Are we going to require that the argument to `weighted` be a DataArray? |
Point taken. I am still not thinking general enough :-)
This sounds good to me. With regard to the implementation, I thought of orienting myself along the lines of `rolling`. |
Maybe a bad question, but is there a good jumping off point to gain some familiarity with the code base? It’s admittedly my first time looking at xarray from the inside... |
@pgierz take a look at the "good first issue" label: https://github.com/pydata/xarray/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22 |
@pgierz - Our documentation has a page on contributing which I encourage you to read through. Once you have your local development environment set up and your fork cloned, the next step is to start exploring the source code and figuring out where changes need to be made. At that point, you can post any questions you have here and we will be happy to give you some guidance. |
Can the stats functions from https://esmlab.readthedocs.io/en/latest/api.html#statistics-functions be used? |
I would do the same, i.e. take inspiration from the groupby / rolling / resample modules. |
It would be nice to be able to do

```python
ds.average()
```
to compute weighted averages (e.g. for geo data). Of course this would require the axes to be in a predictable order. Or to give a weight per dimension...
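For the geo use case mentioned here, the canonical example is area weighting by the cosine of latitude; a hedged sketch with synthetic data (my example, following the masking pattern from this thread):

```python
import numpy as np
import xarray as xr

lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(0.5, 359.5, 360)
da = xr.DataArray(
    np.random.rand(180, 360),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# grid cells shrink toward the poles, so weight by cos(latitude)
weights = np.cos(np.deg2rad(da.lat))
total = weights.where(da.notnull()).sum(("lat", "lon"))
global_mean = (da * weights).sum(("lat", "lon")) / total
```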