Skip to content

date_range() drops last date for closed monthly ranges #15886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stevesimmons opened this issue Apr 4, 2017 · 4 comments
Closed

date_range() drops last date for closed monthly ranges #15886

stevesimmons opened this issue Apr 4, 2017 · 4 comments
Labels
Duplicate Report Duplicate issue or pull request Frequency DateOffsets

Comments

@stevesimmons
Copy link
Contributor

stevesimmons commented Apr 4, 2017

pd.date_range() handles closed intervals correctly for daily time series but incorrectly for monthly.

In this example, notice how freq='D' gives 4 days while freq='M' gives 3 months, dropping the final 2017-07-31:

pd.date_range(start='2017-01-04', end='2017-01-07', freq='D', closed=None)
# DatetimeIndex(['2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07'], dtype='datetime64[ns]', freq='D')

pd.date_range(start='2017-04', end='2017-07', freq='M', closed=None)
# DatetimeIndex(['2017-04-30', '2017-05-31', '2017-06-30'], dtype='datetime64[ns]', freq='M')

This is with pandas 0.19.2. Here is my output for pd.show_versions():

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-71-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.2.2
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
@jreback
Copy link
Contributor

jreback commented Apr 4, 2017

This is what you input

In [2]: freq = pd.offsets.MonthEnd()

In [3]: freq.rollback(Timestamp('2017-07-01'))
Out[3]: Timestamp('2017-06-30 00:00:00')

So the end point is first snapped to a valid end-point.

In [4]: freq.rollback(Timestamp('2017-07-31'))
Out[4]: Timestamp('2017-07-31 00:00:00')

For Tick offsets (daily and less), this is fine. But for snapped offsets I guess this is not the intent.

@chris-b1 thoughts?

@jreback jreback added the Frequency DateOffsets label Apr 4, 2017
@jreback
Copy link
Contributor

jreback commented Apr 4, 2017

this is a duplicate of #6673 (though that is for a different freq). @stevesimmons if you want to have a look there as well.

@chris-b1
Copy link
Contributor

chris-b1 commented Apr 4, 2017

Not sure on this one, like you said in #6673, it's somewhat arbitrary. It seems part of the issue here is that pd.Timestamp('2017-07') == pd.Timestamp('2017-07-01'), which is of course correct, but in the context of a monthly frequency, I can see why someone would expect 2017-07-31.

@jreback jreback added the Duplicate Report Duplicate issue or pull request label May 15, 2017
@jreback jreback added this to the No action milestone May 15, 2017
@jreback jreback closed this as completed May 15, 2017
@jorisvandenbossche
Copy link
Member

@jreback I think this is actually a different issue. #6673 is about the interplay between closed='left' and the fact that the and gets rolled back to the last valid end date for the 'B' frequency.
This problem however does not occur for monthly frequency:

In [57]: pd.date_range(start='2017-04', end='2017-07', freq='M', closed=None)
Out[57]: DatetimeIndex(['2017-04-30', '2017-05-31', '2017-06-30'], dtype='datetime64[ns]', freq='M')

In [58]: pd.date_range(start='2017-04', end='2017-07', freq='M', closed='left')
Out[58]: DatetimeIndex(['2017-04-30', '2017-05-31', '2017-06-30'], dtype='datetime64[ns]', freq='M')

(both are the same)

As noted above, the tricky/surprising part in this issue is that pd.Timestamp('2017-07') == pd.Timestamp('2017-07-01'), while the M frequency wants the end of the month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request Frequency DateOffsets
Projects
None yet
Development

No branches or pull requests

4 participants