-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Adjust time values with Period objects in Series.dt.end_time #18952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/period.py
Outdated
@@ -645,7 +645,8 @@ def start_time(self): | |||
|
|||
@property | |||
def end_time(self): | |||
return self.to_timestamp(how='end') | |||
data = (self + 1).start_time.values.astype(int) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is reaching into the internals too much
is to_timestakp working? (is it tested)?
fix should be there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I've just pushed a new commit where I tried to make the changes in to_timestamp
rather than in end_time
.
It seems to work as expected but is causing a few tests to fail because the tests expected the time part to be 00:00:00. For example:
In [1]: pi = pd.PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M')
In [2]: exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'])
In [3]: pi.astype('datetime64[ns]', how='end')[0]
Out[3]: 2011-01-31 23:59:59.999999999
In [4]: exp[0]
Out[4]: 2011-01-31 00:00:00
Previously these were equal but now [3] has a time of 23:59:59.999999999 and [4] has a time of 00:00:00.
Period('2016-01-01 00:01:00', freq='M')], | ||
[Period('2016-01-01 00:00:00', freq='S'), | ||
Period('2016-01-01 00:00:01', freq='S')] | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test should be with series tests
abe63d7
to
3c5ea75
Compare
Codecov Report
@@ Coverage Diff @@
## master #18952 +/- ##
==========================================
+ Coverage 92.07% 92.07% +<.01%
==========================================
Files 170 170
Lines 50688 50693 +5
==========================================
+ Hits 46671 46676 +5
Misses 4017 4017
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -294,7 +294,7 @@ Conversion | |||
- Bug in :meth:`DatetimeIndex.astype` when converting between timezone aware dtypes, and converting from timezone aware to naive (:issue:`18951`) | |||
- Bug in :class:`FY5253` where ``datetime`` addition and subtraction incremented incorrectly for dates on the year-end but not normalized to midnight (:issue:`18854`) | |||
- Bug in :class:`DatetimeIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18849`) | |||
|
|||
- Bug in :func:`Series.dt.end_time` where time values in ``Period`` objects were not adjusted (:issue:`17157`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and PeriodIndex.end_time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will need a sub-section showing this I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we're making the change in to_timestamp(how='end')
instead of in dt.end_time
(which then calls to_timestamp(how='end')
) then I will also include to_timestamp(how='end')
in the sub-section
pandas/core/indexes/period.py
Outdated
new_data._values[indexer] += 1 | ||
new_data = period.periodarr_to_dt64arr(new_data._values, base) | ||
# subtract one nanosecond | ||
new_data[indexer] -= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just construct the DTI, then - pd.Timedelta(1, 'ns')
note that the display is not correct here
In [23]: p = pd.Period('20170101', 'D')
In [24]: p
Out[24]: Period('2017-01-01', 'D')
In [25]: pi = pd.Index([p])
In [26]: pi
Out[26]: PeriodIndex(['2017-01-01'], dtype='period[D]', freq='D')
In [27]: (pi + 1).to_timestamp(how='start') - pd.Timedelta(1, 'ns')
Out[27]: DatetimeIndex(['2017-01-01'], dtype='datetime64[ns]', freq=None)
In [28]: ((pi + 1).to_timestamp(how='start') - pd.Timedelta(1, 'ns')).values
Out[28]: array(['2017-01-01T23:59:59.999999999'], dtype='datetime64[ns]')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should the DatetimeIndex
show the time value like below when to_timestamp(how='end')
is called?
In [3]: pi.to_timestamp(how='end')
Out[3]: DatetimeIndex(['2017-01-01 23:59:59.999999999'], dtype='datetime64[ns]', freq=None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is a separate bug I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I made some updates. There are still a couple of tests failing that I need to spend some more time on. Some tests in
I think the reason the two results above are not equal is that |
b5d9981
to
ee4d1a9
Compare
ee4d1a9
to
009a38c
Compare
Hello @reidy-p! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 30, 2018 at 20:08 Hours UTC |
d33c48d
to
c399eae
Compare
@@ -674,6 +674,15 @@ def to_timestamp(self, freq=None, how='start'): | |||
""" | |||
how = _validate_end_alias(how) | |||
|
|||
end = how == 'E' | |||
if end: | |||
if freq == 'B': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason freq='B'
had to be handled differently because it was causing a lot of resample tests to fail when handled in the same way as all the other freq values. Not sure if this is a bug or if there is a better way of handling this.
e5836b3
to
0ddcac0
Compare
doc/source/whatsnew/v0.23.0.txt
Outdated
.. _whatsnew_0230.api_breaking.end_time: | ||
|
||
Time values in ``dt.end_time`` and ``to_timestamp(how='end')`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be the same length as the text
doc/source/whatsnew/v0.23.0.txt
Outdated
Time values in ``dt.end_time`` and ``to_timestamp(how='end')`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The time values in ``Period`` and ``PeriodIndex`` objects are now adjusted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add :class:
for these
doc/source/whatsnew/v0.23.0.txt
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The time values in ``Period`` and ``PeriodIndex`` objects are now adjusted | ||
appropriately when calling :attr:`Series.dt.end_time`, :attr:`Period.end_time`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you say what was adjusted? IOW in words what a user would want to know about this.
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
In [2]: p = pd.Period('2017-01-01', 'D') | ||
In [3]: pi = pd.PeriodIndex([p]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to show all of these, maybe [6] and [5] are enough (e.g. a scalar and array one), same for below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you address this
@@ -1195,6 +1196,10 @@ cdef class _Period(object): | |||
freq = self._maybe_convert_freq(freq) | |||
how = _validate_end_alias(how) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have an assert on how? (e.g. it has to be S/E)? if not can you add one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in _validate_end_alias
I think:
pandas/pandas/_libs/tslibs/period.pyx
Lines 1843 to 1850 in ad50b1d
def _validate_end_alias(how): | |
how_dict = {'S': 'S', 'E': 'E', | |
'START': 'S', 'FINISH': 'E', | |
'BEGIN': 'S', 'END': 'E'} | |
how = how_dict.get(str(how).upper()) | |
if how not in set(['S', 'E']): | |
raise ValueError('How must be one of S or E') | |
return how |
@@ -366,6 +366,17 @@ def test_periods_number_check(self): | |||
with pytest.raises(ValueError): | |||
period_range('2011-1-1', '2012-1-1', 'B') | |||
|
|||
def test_start_time(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number reference
pandas/core/indexes/datetimes.py
Outdated
periods=periods, offset=offset) | ||
|
||
dates = list(xdr) | ||
# utc = len(dates) > 0 and dates[0].tzinfo is not None | ||
data = tools.to_datetime(dates) | ||
|
||
# Add back in the lost nanoseconds | ||
if isinstance(start, Timestamp) and isinstance(end, Timestamp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems suspect that you need to do this here, what is the reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reason for this was that changing to_timestamp(how='end')
to return a time of 23:59:59.99999999 caused problems with asfreq
because asfreq
creates a new DatetimeIndex
which then later calls to_pydatetime()
to convert the Timestamps to native Python datetime objects. This leads to a loss of nanosecond precision.
I think there might be an easier way to fix this and I have pushed a new commit to see if it works.
pandas/core/indexes/period.py
Outdated
if freq == 'B': | ||
adjust = Timedelta(1, 'D') - Timedelta(1, 'ns') | ||
return self.to_timestamp(how='start') + adjust | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm the B looks suspect I would think that the bottom expression is correct ``(self + 1)`....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just speculating: is the issue with e.g. Period('2018-03-16', freq='B').end_time
coming back as Timestamp('2018-03-18 23:59:59.99999999')
? That's going to need special handling, yah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that there are a couple of tests in tests/test_resample.py
that use freq='B'
that fail. For example, result and expected are different here:
In [2]: rng = pd.period_range('1/1/1990' , '12/31/1995', freq='W-FRI')
In [3]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [4]: result = ts.resample('B', convention='end').ffill()
In [5]: result.head()
1990-01-05 0.248627
1990-01-08 0.248627
1990-01-09 0.248627
1990-01-10 0.248627
1990-01-11 0.248627
Freq: B, dtype: float64
In [6]: expected = result.to_timestamp('B', how='end')
In [7]: expected = expected.asfreq('B', 'ffill').to_period()
In [8]: expected.head()
Out[8]:
1990-01-08 0.248627
1990-01-09 0.248627
1990-01-10 0.248627
1990-01-11 0.248627
1990-01-12 0.248627
Freq: B, dtype: float64
But when I use:
if freq == 'B':
adjust = Timedelta(1, 'D') - Timedelta(1, 'ns')
return self.to_timestamp(how='start') + adjust
else:
adjust = Timedelta(1, 'ns')
return (self + 1).to_timestamp(how='start') - adjust
it fixes the problem but I'm not sure why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this works because we need to rollforward (the + adjust) to make sure we land on a B date.
can you add a nice comment here (about what is going on / why)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually see my comment below. this might work if you add an offset (a Day), which handles the freq move, rather than a Timedelta.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that I should try replacing Timedelta(1, 'D')
with Day()
to get:
if freq == 'B':
adjust = Day() - Timedelta(1, 'ns')
return self.to_timestamp(how='start') + adjust
else:
adjust = Timedelta(1, 'ns')
return (self + 1).to_timestamp(how='start') - adjust
If so, it doesn't seem to affect anything (all tests pass using either Timedelta(1, 'D')
or Day()
).
Another option I considered was to try to condense the if/else in the above code into one case:
adjust = frequencies.to_offset(freq) - Timedelta(1, 'ns')
return self.to_timestamp(how='start') - adjust
but it didn't work because some offsets don't support subtraction of a Timedelta, I think.
pandas/core/indexes/period.py
Outdated
new_data = period.periodarr_to_dt64arr(new_data._ndarray_values, base) | ||
end = how == 'E' | ||
if end: | ||
indexer = np.where(new_data.notnull()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just ~self._isnan
pandas/core/indexes/period.py
Outdated
indexer = np.where(new_data.notnull()) | ||
# move forward one period | ||
new_data._values[indexer] += 1 | ||
ndarray_vals = new_data._ndarray_values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be pushed to periodarr_to_dt64arr, IOW it should take a how=S/E arg
cc @jbrockmendel if you'd have a look esp at the B arithmetic ops |
59aaccd
to
f9e9fbd
Compare
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -535,6 +535,44 @@ Returning a ``Series`` allows one to control the exact return structure and colu | |||
|
|||
.. _whatsnew_0230.api_breaking.build_changes: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ref above needs to move
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
In [2]: p = pd.Period('2017-01-01', 'D') | ||
In [3]: pi = pd.PeriodIndex([p]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you address this
pandas/core/indexes/period.py
Outdated
if freq == 'B': | ||
adjust = Timedelta(1, 'D') - Timedelta(1, 'ns') | ||
return self.to_timestamp(how='start') + adjust | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this works because we need to rollforward (the + adjust) to make sure we land on a B date.
can you add a nice comment here (about what is going on / why)
@@ -1385,7 +1385,7 @@ def _end_apply_index(self, dtindex): | |||
roll = self.n | |||
|
|||
base = (base_period + roll).to_timestamp(how='end') | |||
return base + off | |||
return base + off + Timedelta(1, 'ns') - Timedelta(1, 'D') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, might be better to actually add an offset here (e.g. a Day) which handles freq moves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Day() and Timedelta(days=1) should be equivalent here, no? I'd rather use Timedelta, since Day is a less well-known beast.
Are there scenarios where a DST transition is traversed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking this will correctly handle the B freqs. its IS possible DST is traversed, but we prob don't have tests for this (though maybe)
pandas/core/indexes/period.py
Outdated
if freq == 'B': | ||
adjust = Timedelta(1, 'D') - Timedelta(1, 'ns') | ||
return self.to_timestamp(how='start') + adjust | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually see my comment below. this might work if you add an offset (a Day), which handles the freq move, rather than a Timedelta.
9899a46
to
7e491a7
Compare
@@ -1235,11 +1235,9 @@ def _generate_regular_range(cls, start, end, periods, freq): | |||
tz = None | |||
if isinstance(start, Timestamp): | |||
tz = start.tz | |||
start = start.to_pydatetime() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion to_pydatetime
seemed to be causing several tests in tests/test_resample.py
to fail because the nanosecond precision was lost. When I removed this it fixed the problem and didn't seem to break any other tests locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. minor doc comment. @jbrockmendel hows this look?
doc/source/whatsnew/v0.24.0.txt
Outdated
|
||
.. ipython:: python | ||
|
||
p = pd.Period('2017-01-01', 'D') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a comment or 2 on the Current on what is changings (above the line where you are executing the code)
LGTM |
thanks @reidy-p nice patch! |
git diff upstream/master -u -- "*.py" | flake8 --diff