Inconsistent behaviour in resample between daily and weekly #5937

Closed
dbew opened this issue Jan 14, 2014 · 3 comments
Labels
Bug Resample resample method

Comments

@dbew
Contributor

dbew commented Jan 14, 2014

Resampling weekly doesn't behave the same way as resampling daily when using label='right'.

import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

so d looks like this:

                      A
2001-01-01 10:00:00   0
2001-01-01 22:00:00   1
2001-01-02 10:00:00   2
2001-01-02 22:00:00   3
2001-01-03 10:00:00   4
2001-01-03 22:00:00   5
2001-01-04 10:00:00   6
2001-01-04 22:00:00   7
2001-01-05 10:00:00   8
2001-01-05 22:00:00   9
2001-01-06 10:00:00  10
2001-01-06 22:00:00  11
<truncated>

Then we resample daily:

d.resample('D', label='right').last()

This produces the following output:

             A
2001-01-02   1
2001-01-03   3
2001-01-04   5
2001-01-05   7
2001-01-06   9
<truncated>

Note that the value for 2001-01-05 is 7, the last value on 2001-01-04 in the original dataframe. If we resample weekly:

d.resample('W-FRI', label='right').last()

The output is:

             A
2001-01-05   9
2001-01-12  23
2001-01-19  29

This time, the value labelled 2001-01-05 is the last value on 2001-01-05, not the last value on 2001-01-04. This is inconsistent with the behaviour of the daily resample, where the label is always strictly after the data.

The result I expect is

             A
2001-01-06   9
2001-01-13  23
2001-01-20  29

as the end of a W-FRI bucket should be midnight on the following Saturday. This is consistent with the daily resampling behaviour where the end of the 2001-01-05 bucket is midnight on 2001-01-06.
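A possible workaround, not proposed anywhere in this thread and offered only as a sketch: anchoring the weeks on Saturday and closing the bins on the left should give Saturday-to-Friday buckets that are labelled by the following Saturday. The choice of 'W-SAT' with closed='left' is an assumption about what the reporter wants, not a statement of how pandas intends 'W-FRI' to behave.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

# Assumption: 'W-SAT' with closed='left' yields bins [Saturday, next Saturday),
# and label='right' names each bin by the Saturday that ends it, so every
# label falls strictly after the data it summarises.
weekly_sat = d.resample('W-SAT', closed='left', label='right').last()
print(weekly_sat)
```

With the data above, the first two rows should be labelled 2001-01-06 and 2001-01-13 and hold the values 9 and 23, matching the first two rows of the expected result.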

Output of installed versions below.

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux
Release: 2.6.18-308.el5
Processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.13.0
Cython: Not installed
Numpy: 1.7.1
Scipy: 0.9.0
statsmodels: Not installed
    patsy: Not installed
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2011k
bottleneck: Not installed
PyTables: 2.3.1-1
    numexpr: 2.0.1
matplotlib: 1.1.1
openpyxl: Not installed
xlrd: 0.8.0
xlwt: 0.7.4
xlsxwriter: Not installed
sqlalchemy: 0.8.2
lxml: 2.3.6
bs4: Not installed
html5lib: 0.90
bigquery: Not installed
apiclient: Not installed
@jreback
Contributor

jreback commented Jan 14, 2014

Looks the same as, or related to, #4197 and #4076?

@dbew
Contributor Author

dbew commented Jan 15, 2014

It might be related - I don't know the underlying code - but it's not the same.

#4197 is about creating bins where only partial data is available. Their example downsamples to 5-minute data, and the last bin has only 3 minutes of data. The question is whether that last bin should have been created. (There is an example where label='right' should be used, but this matches the example above for daily resampling.)

#4076 is about extra bins when resampling, e.g. when you downsample from daily to weekly data, in some circumstances you get an extra bin at the end which does not correspond to any of the data.

In this issue, the labelling does not correspond to the data included in the weekly bin: the first bin is labelled 2001-01-05 (i.e. midnight between 2001-01-04 and 2001-01-05) but contains data from after this point. So we have forward information in our resampled data.
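The look-ahead described here can be checked directly. The sketch below resamples the row timestamps themselves, so the latest observation in each bin can be compared with the bin's label. The names `stamps`, `daily_lookahead` and `weekly_lookahead` are illustrative and do not come from the original report.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

# Keep each row's own timestamp as a value so it survives the resample.
stamps = d.index.to_series()

# Latest timestamp that falls into each bin, for daily and weekly resampling.
daily_last = stamps.resample('D', label='right').last()
weekly_last = stamps.resample('W-FRI', label='right').last()

# True where the latest observation in a bin is later than the bin's label.
daily_lookahead = daily_last.to_numpy() > daily_last.index.to_numpy()
weekly_lookahead = weekly_last.to_numpy() > weekly_last.index.to_numpy()

print(daily_lookahead.any())
print(weekly_lookahead.any())
```

For the daily case no bin should contain data later than its label, while the weekly bin labelled 2001-01-05 should contain the 22:00 observation from that same day.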

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 9, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@MarcoGorelli
Member

This is correct

As the documentation says:

closed : {‘right’, ‘left’}, default None

Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

So, when you resample daily, the value labelled 2001-01-05 is 7, as the interval is [2001-01-04, 2001-01-05).

When you resample weekly, the value labelled 2001-01-05 is 9, as the interval is (2000-12-29, 2001-01-05].
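To make the role of the `closed` default concrete, here is a minimal sketch that passes `closed` explicitly for both frequencies and compares the results with the calls from the original report. It assumes the defaults quoted from the documentation above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

# Daily: the documented default is closed='left'; spell it out explicitly.
daily = d.resample('D', closed='left', label='right').last()

# Weekly: the documented default is closed='right'; spell it out explicitly.
weekly = d.resample('W-FRI', closed='right', label='right').last()

# Compare with the calls that rely on the defaults.
same_daily = daily.equals(d.resample('D', label='right').last())
same_weekly = weekly.equals(d.resample('W-FRI', label='right').last())
print(same_daily, same_weekly)

# Values carried by the label 2001-01-05 in each result.
label = pd.Timestamp('2001-01-05')
print(daily.loc[label, 'A'], weekly.loc[label, 'A'])
```

If both comparisons hold, the difference between the daily and weekly results comes from the differing `closed` defaults rather than from `label='right'` itself.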

Closing then
