
Resample with limit/tolerance #2695

Closed
@observingClouds

Description


Upsampling methods cannot be limited

It comes in very handy to limit the scope of a resampling method such as `nearest` when upsampling a time series. In pandas, a `limit` argument can be passed:

```python
import pandas as pd
import datetime as dt

dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
data = [10, 20]
df = pd.DataFrame(data, index=dates)
df.resample('1H').nearest(limit=1)
```

This produces:

```
2018-01-01 00:00:00  10.0
2018-01-01 01:00:00  10.0
2018-01-01 02:00:00   NaN
2018-01-01 03:00:00   NaN
2018-01-01 04:00:00   NaN
...
2018-01-01 20:00:00   NaN
2018-01-01 21:00:00   NaN
2018-01-01 22:00:00   NaN
2018-01-01 23:00:00  20.0
2018-01-02 00:00:00  20.0
```
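For context, pandas' `limit` counts consecutive upsampled steps, whereas a `tolerance` passed to `reindex` is a maximum distance in index units. On a regular grid where the original timestamps fall on grid points, `nearest` with `limit=n` should fill the same positions as `nearest` with a tolerance of n steps. A minimal sketch checking this on the example above; it names the column `value` instead of `0` and uses a `pd.Timedelta` instead of the `'1H'` string only to sidestep frequency-alias changes between pandas versions:

```python
import datetime as dt

import numpy as np
import pandas as pd

dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
df = pd.DataFrame({"value": [10, 20]}, index=dates)
step = pd.Timedelta(hours=1)

# limit=1: fill at most one upsampled step away from each original row.
by_limit = df.resample(step).nearest(limit=1)

# tolerance=step: reindex onto the same hourly labels, filling only where
# the nearest original row is at most one hour away.
by_tolerance = df.reindex(by_limit.index, method="nearest", tolerance=step)

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(by_limit["value"].to_numpy(),
                              by_tolerance["value"].to_numpy())
```

The two only differ in how the bound is expressed: a step count for `limit`, a time distance for `tolerance`.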

Currently, the equivalent call in xarray:

```python
import xarray as xr
xdf = xr.Dataset.from_dataframe(df)
xdf.resample({'index':'1H'}).nearest(limit=1)
```

fails with:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: nearest() got an unexpected keyword argument 'limit'
```

Problem description

This would be very helpful, because one usually does not want the `nearest` method to fill gaps of arbitrary length.
Based on a comparison with the pandas code, I think the following modifications would be needed:

In `/xarray/core/resample.py`:

```python
def _upsample(self, method, *args, limit=None, **kwargs):
    ...
    elif method in ['pad', 'ffill', 'backfill', 'bfill', 'nearest']:
        kwargs = kwargs.copy()
        kwargs.update(**{self._dim: upsampled_index})
        # Forward the new argument to reindex as its `tolerance`, i.e. the
        # maximum distance between an original and an upsampled label.
        return self._obj.reindex(*args, method=method, tolerance=limit,
                                 **kwargs)
    ...
```

and

```python
def nearest(self, limit=None):
    """Take new values from nearest original coordinate to up-sampled
    frequency coordinates.
    """
    return self._upsample('nearest', limit=limit)
```

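So I think `reindex` already supports this kind of limit through its `tolerance` keyword; the argument just hasn't been forwarded through `_upsample` and `nearest`. Note that `tolerance` is a maximum distance in coordinate units (e.g. one hour) rather than a number of steps.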
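Until such an argument is forwarded, the desired output can likely be obtained by calling `reindex` directly. The sketch below rebuilds the toy series as a Dataset with a variable named `value` (a name chosen for readability instead of the integer column `0`) and constructs the hourly target index by hand rather than going through `resample`:

```python
import datetime as dt

import numpy as np
import pandas as pd
import xarray as xr

# Toy series from the pandas example, as a Dataset with dimension "index".
dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
ds = xr.Dataset({"value": ("index", [10, 20])}, coords={"index": dates})

# Hourly labels covering the original range, i.e. what an hourly
# upsample would produce.
step = pd.Timedelta(hours=1)
new_index = pd.date_range(dates[0], dates[-1], freq=step)

# Nearest-neighbour fill, but only for labels within one hour of an original
# timestamp; all other labels become NaN (int64 is promoted to float64).
limited = ds.reindex({"index": new_index}, method="nearest", tolerance=step)

print(limited["value"].values)
print("unfilled hours:", int(np.isnan(limited["value"].values).sum()))
```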

Current Output

```python
>>> import xarray as xr
>>> xdf = xr.Dataset.from_dataframe(df)
>>> xdf.resample({'index':'1H'}).nearest()
<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) int64 10 10 10 10 10 10 10 10 ... 20 20 20 20 20 20 20 20
```

However, it would be nice if the following worked:

```python
>>> xdf.resample({'index':'1H'}).nearest(limit=1)
<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) float64 10.0 10.0 nan nan nan nan ... nan nan nan 20.0 20.0
```
