BUG: Adjust time values with Period objects in Series.dt.end_time #18952

Merged (9 commits) on Jul 31, 2018

37 changes: 37 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -281,6 +281,43 @@ that the dates have been converted to UTC
.. ipython:: python
pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"], utc=True)

.. _whatsnew_0240.api_breaking.period_end_time:

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The time component of the timestamps returned by :attr:`Series.dt.end_time`,
:attr:`Period.end_time`, :attr:`PeriodIndex.end_time`, :func:`Period.to_timestamp()`
with ``how='end'``, and :func:`PeriodIndex.to_timestamp()` with ``how='end'`` is now
the last nanosecond of the period, e.g. '23:59:59.999999999' for a daily period (:issue:`17157`).

Previous Behavior:

.. code-block:: ipython

In [2]: p = pd.Period('2017-01-01', 'D')
In [3]: pi = pd.PeriodIndex([p])

In [4]: pd.Series(pi).dt.end_time[0]
Out[4]: Timestamp('2017-01-01 00:00:00')

In [5]: p.end_time
Out[5]: Timestamp('2017-01-01 23:59:59.999999999')

Current Behavior:

Calling :attr:`Series.dt.end_time` now returns a time of '23:59:59.999999999',
consistent with :attr:`Period.end_time`. For example:

.. ipython:: python

p = pd.Period('2017-01-01', 'D')
pi = pd.PeriodIndex([p])

pd.Series(pi).dt.end_time[0]

p.end_time
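As a rough consistency check, every API listed above should now return the same end-of-day timestamp for a daily period. This is a sketch that assumes pandas 0.24 or later is installed. The `results` dictionary is only an illustration and is not part of the PR.

```python
import pandas as pd

# One daily period, the same period in a PeriodIndex, and the index wrapped in a Series.
p = pd.Period('2017-01-01', freq='D')
pi = pd.PeriodIndex([p])
s = pd.Series(pi)

# Last nanosecond of 2017-01-01.
expected = pd.Timestamp('2017-01-01 23:59:59.999999999')

results = {
    'Period.end_time': p.end_time,
    'Period.to_timestamp(how="end")': p.to_timestamp(how='end'),
    'PeriodIndex.end_time': pi.end_time[0],
    'PeriodIndex.to_timestamp(how="end")': pi.to_timestamp(how='end')[0],
    'Series.dt.end_time': s.dt.end_time[0],
}

for name, value in results.items():
    print(f'{name:<36} {value}')
```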

.. _whatsnew_0240.api.datetimelike.normalize:

Tick DateOffset Normalize Restrictions
5 changes: 5 additions & 0 deletions pandas/_libs/tslibs/period.pyx
@@ -34,6 +34,7 @@ cdef extern from "../src/datetime/np_datetime.h":
cimport util
from util cimport is_period_object, is_string_object, INT32_MIN

from pandas._libs.tslibs.timedeltas import Timedelta
from timestamps import Timestamp
from timezones cimport is_utc, is_tzlocal, get_dst_info
from timedeltas cimport delta_to_nanoseconds
@@ -1221,6 +1222,10 @@ cdef class _Period(object):
freq = self._maybe_convert_freq(freq)
how = _validate_end_alias(how)

Contributor: Do we have an assert on how (e.g. it has to be S/E)? If not, can you add one?

Contributor Author: I think this is done in _validate_end_alias:

def _validate_end_alias(how):
    how_dict = {'S': 'S', 'E': 'E',
                'START': 'S', 'FINISH': 'E',
                'BEGIN': 'S', 'END': 'E'}
    how = how_dict.get(str(how).upper())
    if how not in set(['S', 'E']):
        raise ValueError('How must be one of S or E')
    return how

end = how == 'E'
if end:
return (self + 1).to_timestamp(how='start') - Timedelta(1, 'ns')

if freq is None:
base, mult = get_freq_code(self.freq)
freq = get_to_timestamp_base(base)
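The new `end` branch defines the end of a period as the start of the following period minus one nanosecond. The sketch below checks that identity against `Period.end_time` for a few frequencies, including a leap-year February. It assumes pandas 0.24 or later, where adding an integer to a `Period` shifts it by whole periods.

```python
import pandas as pd

one_ns = pd.Timedelta(1, 'ns')

periods = [
    pd.Period('2016-02', freq='M'),     # leap-year February: 29 days
    pd.Period('2016-03-15', freq='D'),
    pd.Period('2016Q4', freq='Q'),      # quarter ending in December
]

for p in periods:
    # Same arithmetic as the patched to_timestamp(how='E') branch.
    end = (p + 1).to_timestamp(how='start') - one_ns
    print(p, '->', end, '| end_time:', p.end_time)
```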
2 changes: 0 additions & 2 deletions pandas/core/arrays/datetimes.py
@@ -1235,11 +1235,9 @@ def _generate_regular_range(cls, start, end, periods, freq):
tz = None
if isinstance(start, Timestamp):
tz = start.tz
start = start.to_pydatetime()
Contributor Author: This to_pydatetime conversion seemed to be causing several tests in tests/test_resample.py to fail because nanosecond precision was lost. Removing it fixed those failures and didn't appear to break any other tests locally.
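The precision loss described in this comment is easy to reproduce. `datetime.datetime` only resolves microseconds, so an end-of-period timestamp does not survive a round trip through `Timestamp.to_pydatetime()`. This is a sketch assuming a recent pandas. `warn=False` suppresses the warning pandas normally emits when it discards nanoseconds.

```python
import pandas as pd

# An end-of-period timestamp as produced by the new end_time logic.
ts = pd.Timestamp('2016-01-31 23:59:59.999999999')

# datetime.datetime stops at microseconds, so the trailing 999 ns are dropped.
dt = ts.to_pydatetime(warn=False)
round_trip = pd.Timestamp(dt)

print(ts)
print(round_trip)
print(ts - round_trip)
```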


if isinstance(end, Timestamp):
tz = end.tz
end = end.to_pydatetime()

xdr = generate_range(start=start, end=end,
periods=periods, offset=freq)
12 changes: 11 additions & 1 deletion pandas/core/indexes/period.py
@@ -25,7 +25,7 @@
from pandas.core.tools.datetimes import parse_time_string

from pandas._libs.lib import infer_dtype
from pandas._libs import tslib, index as libindex
from pandas._libs import tslib, index as libindex, Timedelta
from pandas._libs.tslibs.period import (Period, IncompatibleFrequency,
DIFFERENT_FREQ_INDEX,
_validate_end_alias)
@@ -501,6 +501,16 @@ def to_timestamp(self, freq=None, how='start'):
"""
how = _validate_end_alias(how)

end = how == 'E'
if end:
if freq == 'B':
Contributor Author: For some reason freq='B' had to be handled separately: handling it the same way as the other frequencies caused many resample tests to fail. I'm not sure whether this is a bug or whether there is a better way to handle it.

# roll forward to ensure we land on B date
adjust = Timedelta(1, 'D') - Timedelta(1, 'ns')
return self.to_timestamp(how='start') + adjust
else:
adjust = Timedelta(1, 'ns')
return (self + 1).to_timestamp(how='start') - adjust

if freq is None:
base, mult = _gfc(self.freq)
freq = frequencies.get_to_timestamp_base(base)
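One plausible reason the `freq == 'B'` case needs its own branch is offered here as an assumption, not as the author's diagnosis. For a Friday, "start of the next business day minus 1 ns" lands on Sunday, outside any business day. The sketch below stands in for a business-day period using a plain `Timestamp` and `pd.offsets.BDay`, so it illustrates the arithmetic rather than the PR's actual code path.

```python
import pandas as pd

friday = pd.Timestamp('2018-07-27')   # start of a business day (a Friday)
one_ns = pd.Timedelta(1, 'ns')

# Generic rule: start of the *next* business day minus 1 ns.
next_bday = friday + pd.offsets.BDay(1)   # Monday 2018-07-30
generic_end = next_bday - one_ns          # Sunday 23:59:59.999999999

# 'B' branch from the diff: same day + 1 day - 1 ns ("roll forward").
rolled_end = friday + pd.Timedelta(1, 'D') - one_ns

print(generic_end, generic_end.day_name())
print(rolled_end, rolled_end.day_name())
```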
10 changes: 9 additions & 1 deletion pandas/tests/frame/test_period.py
@@ -5,7 +5,7 @@
import pandas as pd
import pandas.util.testing as tm
from pandas import (PeriodIndex, period_range, DataFrame, date_range,
Index, to_datetime, DatetimeIndex)
Index, to_datetime, DatetimeIndex, Timedelta)


def _permute(obj):
@@ -51,6 +51,7 @@ def test_frame_to_time_stamp(self):
df['mix'] = 'a'

exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC')
exp_index = exp_index + Timedelta(1, 'D') - Timedelta(1, 'ns')
result = df.to_timestamp('D', 'end')
tm.assert_index_equal(result.index, exp_index)
tm.assert_numpy_array_equal(result.values, df.values)
@@ -66,22 +67,26 @@ def _get_with_delta(delta, freq='A-DEC'):
delta = timedelta(hours=23)
result = df.to_timestamp('H', 'end')
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 'h') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)

delta = timedelta(hours=23, minutes=59)
result = df.to_timestamp('T', 'end')
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 'm') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)

result = df.to_timestamp('S', 'end')
delta = timedelta(hours=23, minutes=59, seconds=59)
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 's') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)

# columns
df = df.T

exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC')
exp_index = exp_index + Timedelta(1, 'D') - Timedelta(1, 'ns')
result = df.to_timestamp('D', 'end', axis=1)
tm.assert_index_equal(result.columns, exp_index)
tm.assert_numpy_array_equal(result.values, df.values)
@@ -93,16 +98,19 @@ def _get_with_delta(delta, freq='A-DEC'):
delta = timedelta(hours=23)
result = df.to_timestamp('H', 'end', axis=1)
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 'h') - Timedelta(1, 'ns')
tm.assert_index_equal(result.columns, exp_index)

delta = timedelta(hours=23, minutes=59)
result = df.to_timestamp('T', 'end', axis=1)
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 'm') - Timedelta(1, 'ns')
tm.assert_index_equal(result.columns, exp_index)

result = df.to_timestamp('S', 'end', axis=1)
delta = timedelta(hours=23, minutes=59, seconds=59)
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 's') - Timedelta(1, 'ns')
tm.assert_index_equal(result.columns, exp_index)

# invalid axis
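The updated expectations above add one hour, minute or second to the old value and then subtract 1 ns. Each of them therefore lands on 23:59:59.999999999. In other words, with `how='end'` the requested `freq` no longer changes the time of day, because the new branch returns before `freq` is used. This is a sketch assuming pandas 0.24 or later.

```python
import pandas as pd

p = pd.Period('2016-01', freq='M')

# With how='end' the result is the last nanosecond of the period,
# whichever conversion frequency is passed.
by_day = p.to_timestamp(freq='D', how='end')
by_minute = p.to_timestamp(freq='min', how='end')

print(by_day)
print(by_minute)
```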
13 changes: 13 additions & 0 deletions pandas/tests/indexes/period/test_period.py
@@ -366,6 +366,19 @@ def test_periods_number_check(self):
with pytest.raises(ValueError):
period_range('2011-1-1', '2012-1-1', 'B')

def test_start_time(self):
Contributor: Can you add the issue number as a reference?

# GH 17157
index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31')
expected_index = date_range('2016-01-01', end='2016-05-31', freq='MS')
tm.assert_index_equal(index.start_time, expected_index)

def test_end_time(self):
# GH 17157
index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31')
expected_index = date_range('2016-01-01', end='2016-05-31', freq='M')
expected_index = expected_index.shift(1, freq='D').shift(-1, freq='ns')
tm.assert_index_equal(index.end_time, expected_index)
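The tests spell "last nanosecond of that day" in two ways: `shift(1, freq='D').shift(-1, freq='ns')` here, and `+ Timedelta(1, 'D') - Timedelta(1, 'ns')` in test_scalar_compat.py. A sketch, assuming pandas 0.24 or later, shows that both agree with `PeriodIndex.end_time`. The month-end dates are written out explicitly rather than generated with `date_range`.

```python
import pandas as pd

# Month-end dates for January to March 2016 (2016 is a leap year),
# stored at nanosecond resolution so 1 ns steps are representable.
month_ends = pd.DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31'],
                              dtype='datetime64[ns]')

via_shift = month_ends.shift(1, freq='D').shift(-1, freq='ns')
via_add = month_ends + pd.Timedelta(1, 'D') - pd.Timedelta(1, 'ns')

end_time = pd.period_range('2016-01', periods=3, freq='M').end_time
print(end_time)
```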

def test_index_duplicate_periods(self):
# monotonic
idx = PeriodIndex([2000, 2007, 2007, 2009, 2009], freq='A-JUN')
3 changes: 2 additions & 1 deletion pandas/tests/indexes/period/test_scalar_compat.py
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
"""Tests for PeriodIndex behaving like a vectorized Period scalar"""

from pandas import PeriodIndex, date_range
from pandas import PeriodIndex, date_range, Timedelta
import pandas.util.testing as tm


@@ -14,4 +14,5 @@ def test_start_time(self):
def test_end_time(self):
index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31')
expected_index = date_range('2016-01-01', end='2016-05-31', freq='M')
expected_index += Timedelta(1, 'D') - Timedelta(1, 'ns')
tm.assert_index_equal(index.end_time, expected_index)
11 changes: 11 additions & 0 deletions pandas/tests/indexes/period/test_tools.py
@@ -3,6 +3,7 @@
import pytest

import pandas as pd
from pandas import Timedelta
import pandas.util.testing as tm
import pandas.core.indexes.period as period
from pandas.compat import lrange
@@ -60,6 +61,7 @@ def test_to_timestamp(self):

exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC')
result = series.to_timestamp(how='end')
exp_index = exp_index + Timedelta(1, 'D') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)
assert result.name == 'foo'

@@ -74,16 +76,19 @@ def _get_with_delta(delta, freq='A-DEC'):
delta = timedelta(hours=23)
result = series.to_timestamp('H', 'end')
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 'h') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)

delta = timedelta(hours=23, minutes=59)
result = series.to_timestamp('T', 'end')
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 'm') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)

result = series.to_timestamp('S', 'end')
delta = timedelta(hours=23, minutes=59, seconds=59)
exp_index = _get_with_delta(delta)
exp_index = exp_index + Timedelta(1, 's') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)

index = PeriodIndex(freq='H', start='1/1/2001', end='1/2/2001')
@@ -92,6 +97,7 @@ def _get_with_delta(delta, freq='A-DEC'):
exp_index = date_range('1/1/2001 00:59:59', end='1/2/2001 00:59:59',
freq='H')
result = series.to_timestamp(how='end')
exp_index = exp_index + Timedelta(1, 's') - Timedelta(1, 'ns')
tm.assert_index_equal(result.index, exp_index)
assert result.name == 'foo'

@@ -284,6 +290,7 @@ def test_to_timestamp_pi_mult(self):
result = idx.to_timestamp(how='E')
expected = DatetimeIndex(['2011-02-28', 'NaT', '2011-03-31'],
name='idx')
expected = expected + Timedelta(1, 'D') - Timedelta(1, 'ns')
tm.assert_index_equal(result, expected)

def test_to_timestamp_pi_combined(self):
@@ -298,11 +305,13 @@ def test_to_timestamp_pi_combined(self):
expected = DatetimeIndex(['2011-01-02 00:59:59',
'2011-01-03 01:59:59'],
name='idx')
expected = expected + Timedelta(1, 's') - Timedelta(1, 'ns')
tm.assert_index_equal(result, expected)

result = idx.to_timestamp(how='E', freq='H')
expected = DatetimeIndex(['2011-01-02 00:00', '2011-01-03 01:00'],
name='idx')
expected = expected + Timedelta(1, 'h') - Timedelta(1, 'ns')
tm.assert_index_equal(result, expected)

def test_period_astype_to_timestamp(self):
@@ -312,6 +321,7 @@ def test_period_astype_to_timestamp(self):
tm.assert_index_equal(pi.astype('datetime64[ns]'), exp)

exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'])
exp = exp + Timedelta(1, 'D') - Timedelta(1, 'ns')
tm.assert_index_equal(pi.astype('datetime64[ns]', how='end'), exp)

exp = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'],
@@ -321,6 +331,7 @@ def test_period_astype_to_timestamp(self):

exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'],
tz='US/Eastern')
exp = exp + Timedelta(1, 'D') - Timedelta(1, 'ns')
res = pi.astype('datetime64[ns, US/Eastern]', how='end')
tm.assert_index_equal(res, exp)

17 changes: 10 additions & 7 deletions pandas/tests/scalar/period/test_period.py
@@ -5,6 +5,7 @@
from datetime import datetime, date, timedelta

import pandas as pd
from pandas import Timedelta
import pandas.util.testing as tm
import pandas.core.indexes.period as period
from pandas.compat import text_type, iteritems
@@ -274,12 +275,14 @@ def test_timestamp_tz_arg_dateutil_from_string(self):

def test_timestamp_mult(self):
p = pd.Period('2011-01', freq='M')
assert p.to_timestamp(how='S') == pd.Timestamp('2011-01-01')
assert p.to_timestamp(how='E') == pd.Timestamp('2011-01-31')
assert p.to_timestamp(how='S') == Timestamp('2011-01-01')
expected = Timestamp('2011-02-01') - Timedelta(1, 'ns')
assert p.to_timestamp(how='E') == expected

p = pd.Period('2011-01', freq='3M')
assert p.to_timestamp(how='S') == pd.Timestamp('2011-01-01')
assert p.to_timestamp(how='E') == pd.Timestamp('2011-03-31')
assert p.to_timestamp(how='S') == Timestamp('2011-01-01')
expected = Timestamp('2011-04-01') - Timedelta(1, 'ns')
assert p.to_timestamp(how='E') == expected
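For a multiplied frequency the same rule applies with the full step. The end of a `3M` period starting in January 2011 is the start of the next 3-month period (2011-04) minus 1 ns. It is not midnight on 2011-03-31, as the old assertion expected. This is a sketch assuming pandas 0.24 or later.

```python
import pandas as pd

p = pd.Period('2011-01', freq='3M')

# End of the 3-month span: 2011-04-01 00:00 minus one nanosecond.
end = p.to_timestamp(how='E')
print(end)
```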

def test_construction(self):
i1 = Period('1/1/2005', freq='M')
@@ -611,19 +614,19 @@ def _ex(p):
p = Period('1985', freq='A')

result = p.to_timestamp('H', how='end')
expected = datetime(1985, 12, 31, 23)
expected = Timestamp(1986, 1, 1) - Timedelta(1, 'ns')
assert result == expected
result = p.to_timestamp('3H', how='end')
assert result == expected

result = p.to_timestamp('T', how='end')
expected = datetime(1985, 12, 31, 23, 59)
expected = Timestamp(1986, 1, 1) - Timedelta(1, 'ns')
assert result == expected
result = p.to_timestamp('2T', how='end')
assert result == expected

result = p.to_timestamp(how='end')
expected = datetime(1985, 12, 31)
expected = Timestamp(1986, 1, 1) - Timedelta(1, 'ns')
assert result == expected

expected = datetime(1985, 1, 1)
23 changes: 22 additions & 1 deletion pandas/tests/series/test_period.py
@@ -3,7 +3,8 @@
import pandas as pd
import pandas.util.testing as tm
import pandas.core.indexes.period as period
from pandas import Series, period_range, DataFrame
from pandas import Series, period_range, DataFrame, Period
import pytest


def _permute(obj):
@@ -167,3 +168,23 @@ def test_truncate(self):
pd.Period('2017-09-02')
])
tm.assert_series_equal(result2, pd.Series([2], index=expected_idx2))

@pytest.mark.parametrize('input_vals', [
[Period('2016-01', freq='M'), Period('2016-02', freq='M')],
[Period('2016-01-01', freq='D'), Period('2016-01-02', freq='D')],
[Period('2016-01-01 00:00:00', freq='H'),
Period('2016-01-01 01:00:00', freq='H')],
[Period('2016-01-01 00:00:00', freq='M'),
Period('2016-01-01 00:01:00', freq='M')],
[Period('2016-01-01 00:00:00', freq='S'),
Period('2016-01-01 00:00:01', freq='S')]
])
def test_end_time_timevalues(self, input_vals):
# GH 17157
# Check that the time part of the Period is adjusted by end_time
# when using the dt accessor on a Series

s = Series(input_vals)
result = s.dt.end_time
expected = s.apply(lambda x: x.end_time)
tm.assert_series_equal(result, expected)
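The parametrized test compares the vectorised `Series.dt.end_time` accessor with element-wise `Period.end_time`. You can run the same comparison interactively. This is a sketch assuming pandas 0.24 or later, and it uses a daily `period_range` in place of the hand-built lists above.

```python
import pandas as pd

s = pd.Series(pd.period_range('2016-01-01', periods=3, freq='D'))

# Vectorised accessor versus calling Period.end_time on each element.
vectorised = s.dt.end_time
elementwise = s.apply(lambda p: p.end_time)

print(vectorised)
```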