Closed
Description
Upsampling methods cannot be limited
It is very handy to be able to limit the scope of a resample method, e.g. `nearest`, in time series. In pandas the `limit` argument can be given, such that:
```python
import pandas as pd
import datetime as dt

dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
data = [10, 20]
df = pd.DataFrame(data, index=dates)
df.resample('1H').nearest(limit=1)
```
This leads to:
```
2018-01-01 00:00:00  10.0
2018-01-01 01:00:00  10.0
2018-01-01 02:00:00   NaN
2018-01-01 03:00:00   NaN
2018-01-01 04:00:00   NaN
...
2018-01-01 20:00:00   NaN
2018-01-01 21:00:00   NaN
2018-01-01 22:00:00   NaN
2018-01-01 23:00:00  20.0
2018-01-02 00:00:00  20.0
```
Currently:
```python
import xarray as xr

xdf = xr.Dataset.from_dataframe(df)
xdf.resample({'index': '1H'}).nearest(limit=1)
```
leads to
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: nearest() got an unexpected keyword argument 'limit'
```
Problem description
Being able to set such a limit is very helpful, as one might not want to fill gaps with the nearest method indefinitely.
To my understanding, the following modifications might be made, by comparison to the pandas code:
/xarray/core/resample.py
```python
def _upsample(self, method, limit=None, *args, **kwargs):
    ...
    elif method in ['pad', 'ffill', 'backfill', 'bfill', 'nearest']:
        kwargs = kwargs.copy()
        kwargs.update(**{self._dim: upsampled_index})
        return self._obj.reindex(method=method, tolerance=limit, *args, **kwargs)
    ...
```
and
```python
def nearest(self, limit=None):
    """Take new values from nearest original coordinate to up-sampled
    frequency coordinates.
    """
    return self._upsample('nearest', limit=limit)
```
So I think `reindex` already supports the limit through its `tolerance` keyword; it just hasn't been forwarded to the `_upsample` and `nearest` methods.
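To make the intended semantics concrete, here is a minimal standard-library sketch of a nearest-neighbour reindex with a tolerance. It is not xarray's or pandas' implementation: `reindex_nearest` is a hypothetical helper written only for illustration, and it assumes the tolerance is a maximum time distance (a `timedelta`) rather than a count of filled steps. For this hourly example, a one-hour tolerance gives the same pattern as the pandas output above.

```python
from datetime import datetime, timedelta


def reindex_nearest(index, values, new_index, tolerance=None):
    """Hypothetical helper: take each new value from the nearest original
    coordinate, or NaN if that coordinate is farther away than `tolerance`."""
    result = []
    for target in new_index:
        # Position of the original timestamp closest to the target
        # (on a tie, min() keeps the earlier position).
        pos = min(range(len(index)), key=lambda i: abs(index[i] - target))
        distance = abs(index[pos] - target)
        if tolerance is not None and distance > tolerance:
            result.append(float("nan"))  # too far away: leave the gap unfilled
        else:
            result.append(float(values[pos]))
    return result


dates = [datetime(2018, 1, 1), datetime(2018, 1, 2)]
data = [10, 20]
# 25 hourly timestamps from 2018-01-01 00:00 to 2018-01-02 00:00 inclusive
hourly = [dates[0] + timedelta(hours=h) for h in range(25)]

filled = reindex_nearest(dates, data, hourly, tolerance=timedelta(hours=1))
print(filled[:3], filled[-3:])
```

With `tolerance=None` every hourly slot is filled from its nearest neighbour, which corresponds to the unlimited behaviour described in this issue.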
Current Output
```python
>>> import xarray as xr
>>> xdf = xr.Dataset.from_dataframe(df)
>>> xdf.resample({'index': '1H'}).nearest()
<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) int64 10 10 10 10 10 10 10 10 ... 20 20 20 20 20 20 20 20
```
However, it would be nice if the following worked:
```python
>>> xdf.resample({'index': '1H'}).nearest(limit=1)
<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) float64 10.0 10.0 nan nan nan nan ... nan nan nan 20.0 20.0
```