
Handling of end in date_range #16354

Open
jnothman opened this issue May 15, 2017 · 12 comments
Labels: Datetime (Datetime data dtype), Docs, Frequency (DateOffsets)

Comments

@jnothman
Contributor

jnothman commented May 15, 2017

The documentation of date_range's end parameter says:

      If periods is none, generated index will extend to first conforming
      time on or just past end argument

I interpret this to mean that either:

  1. the last date in the index will be on or past the given end date, or that
  2. the last period will include the given end date.

The first interpretation is clearly not true.

>>> pandas.date_range(start='2017-01-01', end='2017-01-16', freq='7d')
DatetimeIndex(['2017-01-01', '2017-01-08', '2017-01-15'], dtype='datetime64[ns]', freq='7D')
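A quick check of the first interpretation, assuming a reasonably recent pandas: the last generated date falls short of end, and one more step of the index's own freq produces the "first conforming time just past end" that the documentation text describes.

```python
import pandas as pd

end = pd.Timestamp('2017-01-16')
idx = pd.date_range(start='2017-01-01', end=end, freq='7D')

# The last generated date is strictly before `end`, so it is not "on or past end".
print(idx[-1] < end)        # True
# One more step of the index's freq gives the first conforming date past `end`.
print(idx[-1] + idx.freq)   # 2017-01-22 00:00:00
```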

But the second does not seem to hold reliably either:

>>> pandas.date_range(start='2017-01-01', end='2017-01-16', freq='M')
DatetimeIndex([], dtype='datetime64[ns]', freq='M')
>>> pandas.date_range(start='2017-01-01', end='2017-02-16', freq='M')
DatetimeIndex(['2017-01-31'], dtype='datetime64[ns]', freq='M')

I am not sure if the behaviour with this unequal freq (like M which should extract the beginning of the month) is correct. Is this a bug?

Otherwise the documentation needs clarification.

More generally, the implementation seems asymmetric between start and end: you can't easily construct a DatetimeIndex from a freq that is guaranteed to include end. One approach would be to swap start and end, but without changing freq this produces an empty DatetimeIndex. Negating freq is possible if it has a numeric multiplier (e.g. -7d), but I don't think there's a way to get freq='M' backwards.

>>> pandas.date_range(end='2017-01-01', start='2017-02-16', freq='M')
DatetimeIndex([], dtype='datetime64[ns]', freq='M')
>>> pandas.date_range(end='2017-01-01', start='2017-02-16', freq='-M')
Traceback (most recent call last):
  File "/Users/joel/repos/pandas/pandas/tseries/frequencies.py", line 549, in to_offset
    stride = int(stride)
ValueError: invalid literal for int() with base 10: '-'
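One possible workaround for stepping backwards, sketched under the assumption that offset objects are acceptable instead of negated frequency strings: roll end back onto the offset with DateOffset.rollback, then subtract the offset until passing start. The helper date_range_from_end is hypothetical and not a pandas API. For fixed steps such as '7D', rollback leaves end unchanged, so the grid is anchored at end itself.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def date_range_from_end(start, end, freq):
    """Hypothetical helper: anchor at `end`, step backwards by `freq`
    while staying on or after `start`, and return the dates in ascending order."""
    offset = to_offset(freq)
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    cur = offset.rollback(end)  # last date on the offset that is on or before `end`
    dates = []
    while cur >= start:
        dates.append(cur)
        cur = cur - offset      # apply the offset in reverse
    return pd.DatetimeIndex(dates[::-1])


print(date_range_from_end('2017-01-01', '2017-01-16', '7D'))
print(date_range_from_end('2016-11-15', '2017-02-16', pd.offsets.MonthEnd()))
```

Passing pd.offsets.MonthEnd() rather than the string 'M' also sidesteps the fact that recent pandas versions deprecate 'M' for date ranges in favour of 'ME'.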

So there are three potential sub-issues:

  1. Clarify the docs for date_range's end (and related end params).
  2. Check that the behaviour of end with DateOffsets is intended.
  3. Provide a way to easily make a date_range that starts at an endpoint and applies a DateOffset in reverse. (Perhaps this is the same as 2.)

@jorisvandenbossche
Member

Some related issues: #15886, #12355

Regarding the explanation of the end kwarg, I think the docs are clearly wrong. The docstring of date_range is more accurate; it says: "Right bound for generating dates".

Apart from that, your question on how to easily create a DatetimeIndex that is guaranteed to include the end date is still open (#12355 made a similar request, but it was not really discussed).
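A minimal sketch of such an end-inclusive range, assuming it is acceptable to overshoot end by at most one step: generate the usual range and, if its last date falls short of end (or it is empty), ask date_range for one more period from the same start. The helper date_range_through is hypothetical, not part of pandas.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def date_range_through(start, end, freq):
    """Hypothetical helper: like pd.date_range, but the last date is on or after `end`."""
    offset = to_offset(freq)
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    idx = pd.date_range(start=start, end=end, freq=offset)
    if len(idx) == 0 or idx[-1] < end:
        # One extra period reaches the first conforming date past `end`.
        idx = pd.date_range(start=start, periods=len(idx) + 1, freq=offset)
    return idx


print(date_range_through('2017-01-01', '2017-01-16', '7D'))
print(date_range_through('2017-01-01', '2017-01-16', pd.offsets.MonthEnd()))
```

The empty case works because, when no conforming date lies in [start, end], the first conforming date on or after start must already be past end.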

@jorisvandenbossche jorisvandenbossche added the Frequency (DateOffsets) and Datetime (Datetime data dtype) labels May 15, 2017
@jreback
Contributor

jreback commented May 15, 2017

Related to #6673 as well.

@jnothman
Contributor Author

jnothman commented May 15, 2017 via email

@jorisvandenbossche
Member

I agree: closed should only determine whether the exact end date is included or not; it is not about whether the last interval should extend up to or beyond the end date (also, #6673 is specific to business days, I think).
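A small illustration of that distinction, assuming pandas 1.4 or later, where date_range's closed parameter was replaced by inclusive ('both', 'neither', 'left', 'right'): the setting only drops a generated point that lands exactly on start or end, and never extends the range past end.

```python
import pandas as pd

# end falls exactly on the 7-day grid: 'left' drops that final point.
both = pd.date_range('2017-01-01', '2017-01-15', freq='7D', inclusive='both')
left = pd.date_range('2017-01-01', '2017-01-15', freq='7D', inclusive='left')

# end falls between grid points: there is nothing to drop, so 'left' changes nothing.
off_grid = pd.date_range('2017-01-01', '2017-01-16', freq='7D', inclusive='left')

print(len(both), len(left), len(off_grid))  # 3 2 3
```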

@jreback
Copy link
Contributor

jreback commented May 15, 2017

See #15886, which was closed as a catch-all in favour of #6673. Or, if you prefer, you can close this one and re-open #15886, since this is a duplicate.

@jorisvandenbossche
Copy link
Member

Yeah, I don't know the internals well enough to assess whether it is the same issue or not. :-)

@jnothman Regarding your second point:

>>> pandas.date_range(start='2017-01-01', end='2017-01-16', freq='M')
DatetimeIndex([], dtype='datetime64[ns]', freq='M')
>>> pandas.date_range(start='2017-01-01', end='2017-02-16', freq='M')
DatetimeIndex(['2017-01-31'], dtype='datetime64[ns]', freq='M')

> I am not sure if the behaviour with this unequal freq (like M which should extract the beginning of the month) is correct. Is this a bug?

I think this is correct, because 'M' is actually shorthand for 'month end' (not month start), so the first month end (January 31) falls outside the start/end range you specified. If you explicitly specify month start as the freq, the result is not empty:

In [56]: pd.date_range(start='2017-01-01', end='2017-01-16', freq='MS')
Out[56]: DatetimeIndex(['2017-01-01'], dtype='datetime64[ns]', freq='MS')
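The same explanation expressed with the offset objects behind the two aliases (a sketch; 'M' corresponds to pd.offsets.MonthEnd and 'MS' to pd.offsets.MonthBegin): rolling start forward onto each offset shows which first date each one can produce.

```python
import pandas as pd

start, end = pd.Timestamp('2017-01-01'), pd.Timestamp('2017-01-16')

# Month-end offset: the first month end on or after `start` is already past `end`.
first_month_end = pd.offsets.MonthEnd().rollforward(start)
# Month-start offset: `start` is itself a month start, so it stays where it is.
first_month_start = pd.offsets.MonthBegin().rollforward(start)

print(first_month_end, first_month_end > end)       # 2017-01-31 00:00:00 True
print(first_month_start, first_month_start <= end)  # 2017-01-01 00:00:00 True
```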

@jnothman
Contributor Author

I suppose I had expected something more akin to groupby(pd.TimeGrouper('M')) being run over the range of moments between start and end.
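That expectation can be approximated, assuming daily resolution is fine for the "moments" between start and end, by grouping a daily series with pd.Grouper(freq=...), which replaced the deprecated pd.TimeGrouper. With a month-end offset, the group labels are the month ends of every month touched by the range.

```python
import pandas as pd

# Sample the moments between start and end at daily resolution (an approximation).
days = pd.date_range('2017-01-01', '2017-02-16', freq='D')
s = pd.Series(1, index=days)

# Group by calendar month; labels are month ends, values are days per month.
counts = s.groupby(pd.Grouper(freq=pd.offsets.MonthEnd())).size()
print(counts)
```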

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented May 15, 2017

@jnothman You might want periods then:

In [62]: pd.period_range(start='2017-01-01', end='2017-02-16', freq='M')
Out[62]: PeriodIndex(['2017-01', '2017-02'], dtype='period[M]', freq='M')

which extends up to and including the month that 'end' belongs to, so 'end' is guaranteed to fall within the range.
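A short check of that property, assuming monthly periods: the last Period is the month containing end, and the PeriodIndex can be converted back to timestamps if a DatetimeIndex is needed afterwards.

```python
import pandas as pd

end = pd.Timestamp('2017-02-16')
months = pd.period_range(start='2017-01-01', end=end, freq='M')

# The last Period is the month containing `end`.
last = months[-1]
print(last.start_time <= end <= last.end_time)  # True

# Convert each month to its first instant if timestamps are needed.
print(months.to_timestamp(how='start'))
```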

@jnothman
Copy link
Contributor Author

jnothman commented May 15, 2017 via email

@jorisvandenbossche
Copy link
Member

> but i feel correct behaviour is not clear here regardless

Can you clarify that?

@skeller88
Copy link

+1 on this issue. For example, based on the documentation, I would expect the first date in the following index to be "start" and the last date to be greater than or equal to "end". That's not what happens; in certain cases, neither parameter is respected:

import pandas as pd

continuous_index = pd.date_range(start='2014-11-26', end='2017-07-26', freq='W')
print(min(continuous_index), max(continuous_index))
# generates
# (Timestamp('2014-11-30 00:00:00', freq='W-SUN'), Timestamp('2017-07-23 00:00:00', freq='W-SUN'))
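Part of the surprise is that 'W' means 'W-SUN', so both ends snap to Sundays, while 2014-11-26 and 2017-07-26 are both Wednesdays. A sketch of two ways to keep start in the index under that reading: anchor the weekly offset on start's own weekday, or use a plain 7-day step. The variable names are illustrative only.

```python
import pandas as pd

start, end = pd.Timestamp('2014-11-26'), pd.Timestamp('2017-07-26')
print(start.day_name(), end.day_name())  # Wednesday Wednesday

# Anchor weekly dates on the start date's weekday instead of the default Sunday.
anchor = 'W-' + start.day_name()[:3].upper()  # 'W-WED'
weekly = pd.date_range(start, end, freq=anchor)

# A plain 7-day step also begins exactly at `start`, with no weekday anchoring.
stepped = pd.date_range(start, end, freq='7D')

print(weekly[0], weekly[-1])
```

Here end is kept too, but only because it happens to be a Wednesday exactly 139 weeks after start.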

@arturomp

arturomp commented May 30, 2018

Another +1. Some more on SO: https://stackoverflow.com/questions/37890391/

It seems to me that the only difference between freq=M and freq=MS should be the day of the month, but the end parameter also affects the output differently:

>>> pd.date_range('2016-01', '2016-05', freq='MS')
DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
               '2016-05-01'],
              dtype='datetime64[ns]', freq='MS')
>>> pd.date_range('2016-01', '2016-05', freq='M')
DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30'], dtype='datetime64[ns]', freq='M')
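The difference comes from how '2016-05' is parsed: it becomes 2016-05-01, which is before the May month end. A sketch, assuming the intent is "every month from January through May": roll end forward to its month end with the MonthEnd(0) idiom before building the range.

```python
import pandas as pd

start = pd.Timestamp('2016-01')  # parsed as 2016-01-01
end = pd.Timestamp('2016-05')    # parsed as 2016-05-01, before the May month end

# Adding MonthEnd(0) rolls a date forward to its month end (no-op if already there).
end_of_month = end + pd.offsets.MonthEnd(0)  # 2016-05-31
idx = pd.date_range(start, end_of_month, freq=pd.offsets.MonthEnd())
print(idx)
```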

I also find the interaction with the default closed=None parameter to be unclear when days are included (especially the last example).

>>> pd.date_range('2016-01-01', '2016-05-01', freq='M')
DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30'], dtype='datetime64[ns]', freq='M')
>>> pd.date_range('2016-01-01', '2016-05-01', freq='MS')
DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
               '2016-05-01'],
              dtype='datetime64[ns]', freq='MS')
>>> pd.date_range('2016-01-31', '2016-05-31', freq='M')
DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
               '2016-05-31'],
              dtype='datetime64[ns]', freq='M')
>>> pd.date_range('2016-01-31', '2016-05-31', freq='MS')
DatetimeIndex(['2016-02-01', '2016-03-01', '2016-04-01', '2016-05-01'], dtype='datetime64[ns]', freq='MS')
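One way to read all four results, offered as an observation checked against these inputs rather than as documented behaviour: the first element is start rolled forward onto the offset, and the last is end rolled back onto it. The helper observed_bounds is hypothetical.

```python
import pandas as pd


def observed_bounds(start, end, offset):
    """Hypothetical helper: start rolled forward and end rolled back onto `offset`."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return offset.rollforward(start), offset.rollback(end)


for start, end in [('2016-01-01', '2016-05-01'), ('2016-01-31', '2016-05-31')]:
    for offset in (pd.offsets.MonthEnd(), pd.offsets.MonthBegin()):
        idx = pd.date_range(start, end, freq=offset)
        first, last = observed_bounds(start, end, offset)
        # The generated index starts and stops exactly at the rolled bounds.
        assert (idx[0], idx[-1]) == (first, last)
        print(start, end, type(offset).__name__, first.date(), last.date())
```

Under this reading, the last example loses 2016-01-31 because a month-start offset rolls that start forward to 2016-02-01.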

@mroeschke mroeschke changed the title Handling of end in DatetimeIndex Handling of end in date_range Mar 31, 2020
@mroeschke mroeschke added the Docs label Mar 31, 2020

6 participants