handle DST appropriately in Timestamp.replace #18618
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -248,4 +248,4 @@ Other
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
-
- :func:`Timestamp.replace` will now handle Daylight Savings transitions gracefully (:issue:`18319`)
How about "will no longer crash when handling DST transitions" ?
crash is not an appropriate term for user notes.
Codecov Report
@@ Coverage Diff @@
## master #18618 +/- ##
==========================================
- Coverage 91.46% 91.45% -0.02%
==========================================
Files 157 157
Lines 51449 51449
==========================================
- Hits 47060 47051 -9
- Misses 4389 4398 +9
Codecov Report
@@ Coverage Diff @@
## master #18618 +/- ##
==========================================
- Coverage 91.57% 91.51% -0.06%
==========================================
Files 150 148 -2
Lines 48937 48804 -133
==========================================
- Hits 44815 44664 -151
- Misses 4122 4140 +18
@@ -1136,6 +1136,26 @@ def test_timestamp(self):
        dt = ts.to_pydatetime()
        assert dt.timestamp() == ts.timestamp()

    def test_replace(self):
this can't be the right place, we have lots of replace tests
not in test_timestamp... I'll take another look.
Looks like it's in test_timezones.
pandas/_libs/tslibs/timestamps.pyx
Outdated
ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
                    dts.sec, dts.us, tzinfo=_tzinfo)
if _tzinfo is not None and treat_tz_as_pytz(_tzinfo):
    # be careful about DST transition, #18319
make an informative comment.
why wouldn't you always localize?
tzinfo.localize changing the tzinfo object is specific to pytz, isn't it?
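For readers unfamiliar with why localize matters here, the following is a minimal stdlib-only sketch of the failure mode. It assumes pytz's behavior of attaching a fixed-offset tzinfo instance to each localized datetime; the `EST` and `EDT` objects below are hypothetical stand-ins built with `datetime.timezone`, not pytz objects, and the dates are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset stand-ins for the per-offset tzinfo
# instances that a pytz-localized datetime carries.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

winter = datetime(2017, 12, 3, 16, 3, 30, tzinfo=EST)

# replace() only swaps fields; it never revisits the UTC offset,
# so the June result still carries the winter offset.
summer = winter.replace(month=6)
print(summer.isoformat())  # 2017-06-03T16:03:30-05:00

# Re-deriving the offset from the wall time gives the intended value.
# With pytz that lookup is what localize() does; here it is hard-coded.
fixed = summer.replace(tzinfo=None).replace(tzinfo=EDT)
print(fixed.isoformat())  # 2017-06-03T16:03:30-04:00
```

A zone object that computes its offset on demand from the wall time would not show the stale offset in the first place, which is consistent with the problem being specific to pytz.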
lots of tests in #7825, can you add tests / narrow down what is and is not fixed.
@gfyoung as jreback mentioned, that issue covered a lot of ground. The crash there can be ignored, not a pandas issue. The OP there gave an example of Timestamp.replace that was incorrect, but did not give the expected output.
@ischwabacher can you have a look
@@ -1229,6 +1229,26 @@ def f():
        dt = Timestamp('2013-11-03 01:59:59.999999-0400', tz='US/Eastern')
        assert dt.tz_localize(None) == dt.replace(tzinfo=None)

    def test_replace_across_dst(self):
can you test with dateutil as well
The test is pretty specific to pytz API
The test is specific to the pytz API because the problem is specific to the pytz API, so you can't write tests that currently fail with dateutil, but you can, in fact, write this test to be agnostic as to whether the test takes a pytz or dateutil zone.
Rather than using localize and normalize, express the "expected" datetimes as UTC datetimes and use .astimezone. pytz and dateutil support astimezone (and, in fact, pytz.normalize essentially just uses .astimezone under the hood). Since pandas provides its own API layer on top of datetime, it is also agnostic to the timezone provider.
Rather than using localize and normalize, express the "expected" datetimes as UTC datetimes and use .astimezone
Is this a suggestion for this test or more generally?
Is this a suggestion for this test or more generally?
In this test at least, since with tests you can always engineer your datetime literals such that you never have to use anything except astimezone.
It's also just a useful piece of information to have, that astimezone is one of the few timezone functions where pytz behaves more or less nicely. You only need normalize semantics with pytz for things like replace or "calendar" arithmetic, where you want to change the naive portion of the datetime to a specific value and then change the time zone offset to match. If you want "absolute" arithmetic (where you want to go forward a certain number of hours or seconds or something), then the_operation(dt.astimezone(UTC)).astimezone(dt.tzinfo) will always give the right answer.
In this case you know what the correct answer should be, in absolute time, so you can just declare your initial variable as the correct answer in UTC and convert that to the time zone you care about. If you do it generically like that, you are insulated from any weird quirks of pytz's interface, and you get dateutil support for free (so you can parametrize this test using pytz and dateutil zones).
Of course, this method will fail if you try to construct an imaginary time, since there is no mapping between UTC and imaginary times.
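A rough stdlib-only sketch of the two ideas in this comment: declaring the expected value in UTC and converting with astimezone, and doing "absolute" arithmetic by round-tripping through UTC. `Eastern2017` is a hypothetical stand-in for US/Eastern that hard-codes only the 2017 transitions; it is an assumption made so the example runs without pytz or dateutil. Since only astimezone is used, a pytz or dateutil zone should be usable in its place.

```python
from datetime import datetime, timedelta, timezone, tzinfo

UTC = timezone.utc
HOUR = timedelta(hours=1)


class Eastern2017(tzinfo):
    """Hypothetical stand-in for US/Eastern, valid for 2017 only."""

    # In 2017, DST ran from 2017-03-12 07:00 UTC to 2017-11-05 06:00 UTC.
    DST_START_UTC = datetime(2017, 3, 12, 7)
    DST_END_UTC = datetime(2017, 11, 5, 6)

    def fromutc(self, dt):
        # dt holds UTC wall-clock values with tzinfo=self.
        utc = dt.replace(tzinfo=None)
        in_dst = self.DST_START_UTC <= utc < self.DST_END_UTC
        local = dt + timedelta(hours=-4 if in_dst else -5)
        # The first hour after DST ends repeats a wall-clock hour.
        repeated = self.DST_END_UTC <= utc < self.DST_END_UTC + HOUR
        return local.replace(fold=1 if repeated else 0)

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if datetime(2017, 3, 12, 2) <= wall < datetime(2017, 11, 5, 1):
            return HOUR
        if datetime(2017, 11, 5, 1) <= wall < datetime(2017, 11, 5, 2):
            # Ambiguous hour: fold=0 is the first (DST) occurrence.
            return HOUR if dt.fold == 0 else timedelta(0)
        return timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


tz = Eastern2017()

# Expected value declared in UTC, then converted to the zone under test.
before = datetime(2017, 11, 5, 5, 30, tzinfo=UTC).astimezone(tz)

# "Absolute" arithmetic: add one hour in UTC, then convert back.
after = (before.astimezone(UTC) + HOUR).astimezone(before.tzinfo)

print(before.isoformat())  # 2017-11-05T01:30:00-04:00
print(after.isoformat())   # 2017-11-05T01:30:00-05:00
```

The two results share the same wall-clock time but differ by one hour in absolute time, which is the case that wall-clock arithmetic alone cannot express.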
@@ -1229,6 +1229,26 @@ def f():
        dt = Timestamp('2013-11-03 01:59:59.999999-0400', tz='US/Eastern')
        assert dt.tz_localize(None) == dt.replace(tzinfo=None)

    def test_replace_across_dst(self):
        # GH#18319
can you provide a 1-liner about what this is testing (I know the title is informative, but a little color)
Sure
needs a rebase. also pls add all of the examples from #7825; if something doesn't work, then you can open a new issue for that part.
Will do.
#7825 is all over the place (I probably shouldn't have referenced it above). The example in the OP I think is handled, but the poster didn't specify the expected behavior. I think the thing to do is keep that open and ask the OP there to verify if the new behavior closes that issue.
I would like to include some tests on the original issue #7825; if they are not working, xfail them.
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -320,5 +320,10 @@ Categorical
Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
you have rebase issues here
Will fix. Not a big deal, but for changes limited to whatsnew, is skipping the CI a viable option? The CI turnaround time is close to a full day and that file gets touched a lot.
I'll take another look, but based on the last two looks I concluded that this is not actionable until the author there provides the expected output for the example cases. Simpler to treat #18319 as the "original issue".
There's a follow-up to this waiting in the wings that fixes a couple of related bugs.
I think this needs at least two additional tests - 1. a test for when I think that the way it's done right now,
Good idea, will do.
I bet this will fix #18785 as well.
I added an example for #18785
I've added these tests and you're right, these do different things. 1AM on 2014-11-05 is ambiguous in US/Eastern:
So the pytz version resolves to the later of the two possibilities while the dateutil version resolves to the earlier. It isn't clear to me that there is an unambiguous "right" thing to do in this case, so as far as pandas is concerned just deferring to pytz/dateutil may be the way to go. Thoughts? As for imaginary times, both pytz and dateutil versions (both in the PR and in master) fail to raise when
There is indeed an unambiguous right way to do things. The Python documentation before 3.6 suggested that ambiguous times should always resolve to the standard time (the later time). As of Python 3.6, unfortunately this is inverted because the default value of
No time zone hooks are called on To me, it seems like this PR is mostly about making
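As background on the "earlier vs. later" disagreement, here is a small sketch of how the standard library's `fold` attribute (PEP 495, Python 3.6+) selects between the two occurrences of an ambiguous wall time: `fold` defaults to 0, which denotes the earlier occurrence. `FallBackHour` is a hypothetical tzinfo that models only the 2017-11-05 fall-back in US/Eastern; the date is chosen for illustration and the class is not part of pytz, dateutil, or pandas.

```python
from datetime import datetime, timedelta, tzinfo


class FallBackHour(tzinfo):
    """Hypothetical zone modelling only the 2017-11-05 fall-back hour."""

    AMBIGUOUS_START = datetime(2017, 11, 5, 1)
    AMBIGUOUS_END = datetime(2017, 11, 5, 2)

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.AMBIGUOUS_START:
            return timedelta(hours=1)
        if wall < self.AMBIGUOUS_END:
            # Ambiguous wall time: fold picks which occurrence is meant.
            return timedelta(hours=1) if dt.fold == 0 else timedelta(0)
        return timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


tz = FallBackHour()
first = datetime(2017, 11, 5, 1, 0, tzinfo=tz)           # fold defaults to 0
second = datetime(2017, 11, 5, 1, 0, tzinfo=tz, fold=1)  # later occurrence

print(first.isoformat())   # 2017-11-05T01:00:00-04:00
print(second.isoformat())  # 2017-11-05T01:00:00-05:00
```

Under this convention the default resolves to the daylight (earlier) reading, while `fold=1` resolves to the standard (later) reading.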
# Preliminary sanity-check
assert ts_aware == ts_aware.tzinfo.normalize(ts_aware)
need to parametrize this across dateutil & pytz, so you need to move this to TestTimeZoneSupportPytz and use self.tz(...)
did you cover @pganssle requests for tests?
@jreback While #18595 and this issue share the same root cause - namely that To put it another way, if you think of pytz as having a "parent" zone (the one you get by calling
This is a sort of "out there" idea, but there is one (possibly not particularly performant) thing you could do that would unify the interface between

    from datetime import tzinfo
    from datetime import datetime, timedelta
    import functools
    import pytz

    def _localize(f):
        @functools.wraps(f)
        def inner_func(self, dt):
            return f(self, self._zone.localize(dt.replace(tzinfo=None)))
        return inner_func

    class pytz_zonewrapper(tzinfo):
        def __init__(self, pytz_zone):
            self._zone = pytz.timezone(str(pytz_zone))

        @_localize
        def utcoffset(self, dt):
            return dt.utcoffset()

        @_localize
        def tzname(self, dt):
            return dt.tzname()

        @_localize
        def dst(self, dt):
            return dt.dst()

I would say for now go with @jbrockmendel's approach, but I would think that it might be nice to remove all the pytz-specific code cluttering up time zone handling. Another option might be to just internally start using
thanks, I left #7825 open until we resolve whether this closes it.
Both assertions fail under master, pass under this PR.
The whatsnew entry is not super-informative. Any ideas for a one-sentence description of this fix?
xref #7825 (I think)
git diff upstream/master -u -- "*.py" | flake8 --diff