
BUG: Adjust time values with Period objects in Series.dt.end_time #18952


Merged (9 commits), Jul 31, 2018

Conversation

@reidy-p (Contributor) commented Dec 27, 2017

@@ -645,7 +645,8 @@ def start_time(self):

@property
def end_time(self):
return self.to_timestamp(how='end')
data = (self + 1).start_time.values.astype(int) - 1
Contributor

this is reaching into the internals too much
is to_timestamp working? (is it tested?)
the fix should be there

@reidy-p (Contributor Author) commented Dec 29, 2017

Ok I've just pushed a new commit where I tried to make the changes in to_timestamp rather than in end_time.

It seems to work as expected but is causing a few tests to fail because the tests expected the time part to be 00:00:00. For example:

In [1]: pi = pd.PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M')
In [2]: exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'])

In [3]: pi.astype('datetime64[ns]', how='end')[0]
Out[3]: 2011-01-31 23:59:59.999999999

In [4]: exp[0]
Out[4]: 2011-01-31 00:00:00

Previously these were equal but now [3] has a time of 23:59:59.999999999 and [4] has a time of 00:00:00.
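The adjustment being discussed can be sketched with the public pandas API (an illustrative snippet, not the PR's internal implementation): the end of a period is the start of the next period minus one nanosecond.

```python
import pandas as pd

pi = pd.PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M')

# end of each period == start of the following period minus one nanosecond
end = (pi + 1).to_timestamp(how='start') - pd.Timedelta(1, 'ns')
```

In pandas versions containing this fix, `pi.end_time` should return the same values directly.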

Period('2016-01-01 00:01:00', freq='M')],
[Period('2016-01-01 00:00:00', freq='S'),
Period('2016-01-01 00:00:01', freq='S')]
])
Contributor

test should be with series tests

@codecov (bot) commented Dec 29, 2017

Codecov Report

Merging #18952 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #18952      +/-   ##
==========================================
+ Coverage   92.07%   92.07%   +<.01%     
==========================================
  Files         170      170              
  Lines       50688    50693       +5     
==========================================
+ Hits        46671    46676       +5     
  Misses       4017     4017
Flag Coverage Δ
#multiple 90.48% <100%> (ø) ⬆️
#single 42.3% <11.11%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 95.44% <ø> (-0.03%) ⬇️
pandas/tseries/offsets.py 97.15% <100%> (ø) ⬆️
pandas/core/indexes/period.py 93.5% <100%> (+0.1%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c272c52...a6716ca.

@@ -294,7 +294,7 @@ Conversion
- Bug in :meth:`DatetimeIndex.astype` when converting between timezone aware dtypes, and converting from timezone aware to naive (:issue:`18951`)
- Bug in :class:`FY5253` where ``datetime`` addition and subtraction incremented incorrectly for dates on the year-end but not normalized to midnight (:issue:`18854`)
- Bug in :class:`DatetimeIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18849`)

- Bug in :func:`Series.dt.end_time` where time values in ``Period`` objects were not adjusted (:issue:`17157`)
Contributor

and PeriodIndex.end_time

Contributor

we will need a sub-section showing this I think.

Contributor Author

And if we're making the change in to_timestamp(how='end') instead of in dt.end_time (which then calls to_timestamp(how='end')), then I will also include to_timestamp(how='end') in the sub-section.

new_data._values[indexer] += 1
new_data = period.periodarr_to_dt64arr(new_data._values, base)
# subtract one nanosecond
new_data[indexer] -= 1
Contributor

I think you can just construct the DTI, then subtract pd.Timedelta(1, 'ns')

note that the display is not correct here

In [23]: p = pd.Period('20170101', 'D')

In [24]: p
Out[24]: Period('2017-01-01', 'D')

In [25]: pi = pd.Index([p])

In [26]: pi
Out[26]: PeriodIndex(['2017-01-01'], dtype='period[D]', freq='D')

In [27]: (pi + 1).to_timestamp(how='start') - pd.Timedelta(1, 'ns')
Out[27]: DatetimeIndex(['2017-01-01'], dtype='datetime64[ns]', freq=None)

In [28]: ((pi + 1).to_timestamp(how='start') - pd.Timedelta(1, 'ns')).values
Out[28]: array(['2017-01-01T23:59:59.999999999'], dtype='datetime64[ns]')

Contributor Author

So should the DatetimeIndex show the time value like below when to_timestamp(how='end') is called?

In [3]: pi.to_timestamp(how='end')
Out[3]: DatetimeIndex(['2017-01-01 23:59:59.999999999'], dtype='datetime64[ns]', freq=None)

Contributor

yes, this is a separate bug I think.


@reidy-p (Contributor Author) commented Jan 6, 2018

Ok I made some updates.

There are still a couple of tests failing that I need to spend some more time on. Some tests in tests/test_resample.py are failing because of problems with asfreq('B', 'ffill') or asfreq('M', 'ffill') but other offset aliases seem to be ok. For example, result and expected are no longer equal:

In [2]: ts = pd.Series([1.1, 1.2, 1.3, 1.4, 1.5], index=pd.period_range('1/1/1990', freq='W-MON', periods=5))
In [3]: result = ts.resample('B', convention='end').ffill()
In [4]: result
Out[4]:
1990-01-01    1.1
1990-01-02    1.1
1990-01-03    1.1
1990-01-04    1.1
1990-01-05    1.1
1990-01-08    1.2
1990-01-09    1.2
1990-01-10    1.2
1990-01-11    1.2
1990-01-12    1.2
1990-01-15    1.3
1990-01-16    1.3
1990-01-17    1.3
1990-01-18    1.3
1990-01-19    1.3
1990-01-22    1.4
1990-01-23    1.4
1990-01-24    1.4
1990-01-25    1.4
1990-01-26    1.4
1990-01-29    1.5
Freq: B, dtype: float64

In [5]: expected = result.to_timestamp('B', how='end').asfreq('B', 'ffill').to_period()
In [6]: expected
Out[6]:
/Users/paul/Desktop/pandas-reidy-p/pandas/core/indexes/datetimes.py:581: 
UserWarning: Discarding nonzero nanoseconds in conversion
  index = _generate_regular_range(start, end, periods, offset)

1990-01-01    NaN
1990-01-02    1.1
1990-01-03    1.1
1990-01-04    1.1
1990-01-05    1.1
1990-01-08    1.1
1990-01-09    1.2
1990-01-10    1.2
1990-01-11    1.2
1990-01-12    1.2
1990-01-15    1.2
1990-01-16    1.3
1990-01-17    1.3
1990-01-18    1.3
1990-01-19    1.3
1990-01-22    1.3
1990-01-23    1.4
1990-01-24    1.4
1990-01-25    1.4
1990-01-26    1.4
1990-01-29    1.4
Freq: B, dtype: float64

I think the reason the two results above are not equal is that asfreq creates a new DatetimeIndex which then later calls to_pydatetime() to convert the Timestamps to native Python datetime objects. This leads to a loss of nanosecond precision as shown in the warning. I don't think this loss of precision occurs for resample.
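The precision loss described here can be reproduced directly on a single Timestamp (a minimal sketch; the value is arbitrary):

```python
import pandas as pd

ts = pd.Timestamp('1990-01-08 23:59:59.999999999')

# datetime.datetime only has microsecond resolution, so the trailing
# 999 nanoseconds are dropped; pandas emits a UserWarning here
dt = ts.to_pydatetime()
```

Converting the resulting datetime back into a Timestamp no longer compares equal to the original, which is exactly the kind of off-by-one-period drift seen in the asfreq output above.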

@reidy-p reidy-p force-pushed the period_end_time branch 2 times, most recently from b5d9981 to ee4d1a9 Compare January 6, 2018 14:52
@pep8speaks (bot) commented Mar 12, 2018

Hello @reidy-p! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 30, 2018 at 20:08 Hours UTC

@reidy-p reidy-p force-pushed the period_end_time branch 2 times, most recently from d33c48d to c399eae Compare March 12, 2018 23:25
@@ -674,6 +674,15 @@ def to_timestamp(self, freq=None, how='start'):
"""
how = _validate_end_alias(how)

end = how == 'E'
if end:
if freq == 'B':
Contributor Author

For some reason freq='B' had to be handled differently because it was causing a lot of resample tests to fail when handled in the same way as all the other freq values. Not sure if this is a bug or if there is a better way of handling this.

@reidy-p reidy-p force-pushed the period_end_time branch 2 times, most recently from e5836b3 to 0ddcac0 Compare March 13, 2018 22:14
.. _whatsnew_0230.api_breaking.end_time:

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Contributor

needs to be the same length as the text

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The time values in ``Period`` and ``PeriodIndex`` objects are now adjusted
Contributor

add :class: for these

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The time values in ``Period`` and ``PeriodIndex`` objects are now adjusted
appropriately when calling :attr:`Series.dt.end_time`, :attr:`Period.end_time`,
Contributor

can you say what was adjusted? IOW in words what a user would want to know about this.


In [2]: p = pd.Period('2017-01-01', 'D')
In [3]: pi = pd.PeriodIndex([p])

Contributor

I don't think you need to show all of these, maybe [6] and [5] are enough (e.g. a scalar and array one), same for below

Contributor

can you address this

@@ -1195,6 +1196,10 @@ cdef class _Period(object):
freq = self._maybe_convert_freq(freq)
how = _validate_end_alias(how)

Contributor

do we have an assert on how? (e.g. it has to be S/E)? if not can you add one

Contributor Author

This is done in _validate_end_alias I think:

def _validate_end_alias(how):
    how_dict = {'S': 'S', 'E': 'E',
                'START': 'S', 'FINISH': 'E',
                'BEGIN': 'S', 'END': 'E'}
    how = how_dict.get(str(how).upper())
    if how not in set(['S', 'E']):
        raise ValueError('How must be one of S or E')
    return how
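As a standalone check of that alias mapping (a copy of the helper quoted above, renamed to mark it as illustrative rather than the pandas-internal function):

```python
def validate_end_alias(how):
    # map the long-form aliases onto the two canonical codes
    how_dict = {'S': 'S', 'E': 'E',
                'START': 'S', 'FINISH': 'E',
                'BEGIN': 'S', 'END': 'E'}
    how = how_dict.get(str(how).upper())
    if how not in {'S', 'E'}:
        raise ValueError('How must be one of S or E')
    return how

validate_end_alias('end')    # returns 'E'
validate_end_alias('Begin')  # returns 'S'
```

Anything not in the mapping (including None) falls through to the ValueError, so the how argument is effectively asserted on every call.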

@@ -366,6 +366,17 @@ def test_periods_number_check(self):
with pytest.raises(ValueError):
period_range('2011-1-1', '2012-1-1', 'B')

def test_start_time(self):
Contributor

can you add the issue number reference

periods=periods, offset=offset)

dates = list(xdr)
# utc = len(dates) > 0 and dates[0].tzinfo is not None
data = tools.to_datetime(dates)

# Add back in the lost nanoseconds
if isinstance(start, Timestamp) and isinstance(end, Timestamp):
Contributor

seems suspect that you need to do this here, what is the reason?

@reidy-p (Contributor Author) commented Mar 17, 2018

The reason for this was that changing to_timestamp(how='end') to return a time of 23:59:59.999999999 caused problems with asfreq, because asfreq creates a new DatetimeIndex which then later calls to_pydatetime() to convert the Timestamps to native Python datetime objects. This leads to a loss of nanosecond precision.

I think there might be an easier way to fix this and I have pushed a new commit to see if it works.

if freq == 'B':
adjust = Timedelta(1, 'D') - Timedelta(1, 'ns')
return self.to_timestamp(how='start') + adjust
else:
Contributor

hmm, the B branch looks suspect; I would think that the bottom expression is correct: ``(self + 1)``....

Member

Just speculating: is the issue with e.g. Period('2018-03-16', freq='B').end_time coming back as Timestamp('2018-03-18 23:59:59.999999999')? That's going to need special handling, yah.

Contributor Author

The problem is that there are a couple of tests in tests/test_resample.py that use freq='B' that fail. For example, result and expected are different here:

In [2]: rng = pd.period_range('1/1/1990', '12/31/1995', freq='W-FRI')
In [3]: ts = pd.Series(np.random.randn(len(rng)), index=rng) 
In [4]: result = ts.resample('B', convention='end').ffill()
In [5]: result.head()
1990-01-05    0.248627                                                     
1990-01-08    0.248627 
1990-01-09    0.248627                                                                         
1990-01-10    0.248627  
1990-01-11    0.248627            
Freq: B, dtype: float64    

In [6]: expected = result.to_timestamp('B', how='end')
In [7]: expected = expected.asfreq('B', 'ffill').to_period()
In [8]: expected.head()
Out[8]:
1990-01-08    0.248627 
1990-01-09    0.248627  
1990-01-10    0.248627                       
1990-01-11    0.248627                          
1990-01-12    0.248627                                  
Freq: B, dtype: float64   

But when I use:

if freq == 'B':
    adjust = Timedelta(1, 'D') - Timedelta(1, 'ns')
    return self.to_timestamp(how='start') + adjust
else:                                                    
    adjust = Timedelta(1, 'ns')                               
    return (self + 1).to_timestamp(how='start') - adjust   

it fixes the problem but I'm not sure why.

Contributor

ok, this works because we need to rollforward (the + adjust) to make sure we land on a B date.

can you add a nice comment here (about what is going on / why)

Contributor

actually see my comment below. this might work if you add an offset (a Day), which handles the freq move, rather than a Timedelta.

Contributor Author

Do you mean that I should try replacing Timedelta(1, 'D') with Day() to get:

if freq == 'B':
    adjust = Day() - Timedelta(1, 'ns')
    return self.to_timestamp(how='start') + adjust
else:                                                    
    adjust = Timedelta(1, 'ns')                               
    return (self + 1).to_timestamp(how='start') - adjust

If so, it doesn't seem to affect anything (all tests pass using either Timedelta(1, 'D') or Day()).

Another option I considered was to try to condense the if/else in the above code into one case:

adjust = frequencies.to_offset(freq) - Timedelta(1, 'ns')                               
return self.to_timestamp(how='start') - adjust

but it didn't work because some offsets don't support subtraction of a Timedelta, I think.
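The reason B needs its own branch can be seen by comparing the two adjustments on a single business-day period (a sketch using the public API; the date is chosen for illustration):

```python
import pandas as pd

p = pd.Period('2018-03-16', freq='B')  # a Friday

# general rule: start of the next period minus one nanosecond.
# the next B period starts Monday 2018-03-19, so this lands on a Sunday
general = (p + 1).to_timestamp(how='start') - pd.Timedelta(1, 'ns')

# B-specific rule: start of the same period plus (one day minus one nanosecond),
# which stays on the Friday itself
b_rule = p.to_timestamp(how='start') + (pd.Timedelta(1, 'D') - pd.Timedelta(1, 'ns'))
```

general falls on 2018-03-18, a non-business day, while b_rule keeps the end timestamp inside the Friday itself, which is consistent with the rollforward explanation in the thread.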

new_data = period.periodarr_to_dt64arr(new_data._ndarray_values, base)
end = how == 'E'
if end:
indexer = np.where(new_data.notnull())
Contributor

this is just ~self._isnan

indexer = np.where(new_data.notnull())
# move forward one period
new_data._values[indexer] += 1
ndarray_vals = new_data._ndarray_values
Contributor

I think this should be pushed to periodarr_to_dt64arr, IOW it should take a how=S/E arg

@jreback (Contributor) commented Mar 13, 2018

cc @jbrockmendel if you'd have a look esp at the B arithmetic ops

@reidy-p reidy-p force-pushed the period_end_time branch 10 times, most recently from 59aaccd to f9e9fbd Compare March 17, 2018 22:53
@@ -535,6 +535,44 @@ Returning a ``Series`` allows one to control the exact return structure and colu

.. _whatsnew_0230.api_breaking.build_changes:

Contributor

the ref above needs to move




@@ -1385,7 +1385,7 @@ def _end_apply_index(self, dtindex):
roll = self.n

base = (base_period + roll).to_timestamp(how='end')
return base + off
return base + off + Timedelta(1, 'ns') - Timedelta(1, 'D')
Contributor

hmm, might be better to actually add an offset here (e.g. a Day) which handles freq moves.

Member

Day() and Timedelta(days=1) should be equivalent here, no? I'd rather use Timedelta, since Day is a less well-known beast.

Are there scenarios where a DST transition is traversed?

Contributor

I am thinking this will correctly handle the B freqs. It IS possible DST is traversed, but we prob don't have tests for this (though maybe)


@reidy-p reidy-p force-pushed the period_end_time branch 2 times, most recently from 9899a46 to 7e491a7 Compare July 14, 2018 21:33
@@ -1235,11 +1235,9 @@ def _generate_regular_range(cls, start, end, periods, freq):
tz = None
if isinstance(start, Timestamp):
tz = start.tz
start = start.to_pydatetime()
Contributor Author

This conversion via to_pydatetime seemed to be causing several tests in tests/test_resample.py to fail because nanosecond precision was lost. When I removed it, the problem was fixed and no other tests seemed to break locally.
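With the conversion removed, nanosecond components survive range generation; a quick check (a sketch that assumes the post-fix behavior of date_range):

```python
import pandas as pd

# a Monday start carrying a nanosecond component
start = pd.Timestamp('1990-01-01 23:59:59.999999999')
idx = pd.date_range(start, periods=3, freq='B')

# every generated business-day timestamp keeps the .999999999 fraction
```

Before the fix, the round trip through to_pydatetime inside _generate_regular_range silently truncated the trailing 999 nanoseconds.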

@jreback (Contributor) left a comment

lgtm. minor doc comment. @jbrockmendel hows this look?


.. ipython:: python

p = pd.Period('2017-01-01', 'D')
Contributor

maybe add a comment or two on the current behavior and what is changing (above the line where you are executing the code)

@jreback jreback added this to the 0.24.0 milestone Jul 29, 2018
@jbrockmendel (Member)

LGTM

@jreback jreback merged commit f76a3f3 into pandas-dev:master Jul 31, 2018
@jreback (Contributor) commented Jul 31, 2018

thanks @reidy-p nice patch!


Successfully merging this pull request may close these issues.

BUG: pd.Series.dt.end_time when values are pd.Period objects are producing different results
5 participants