BUG: Timestamp.replace() behaves naively at DST boundaries (plus bonus segfault!) #7825
Comments
cc @rockg hmm, not sure anything can be done about this ATM; no checking is done for replace if it has a tz. I think that is reasonable though. Want to submit a pull request? |
Yes, but I still have a bunch of other open stuff that I haven't gotten to yet, and my preferred solution is to rip |
why would you want to touch |
In this case, I need "the window from 1:00 to 2:00", which is zero, one or two hours long, depending on the day. Also:
In [1]: import pandas as pd
In [2]: list(pd.date_range('2013-11-1', periods=10, freq='24H', tz='America/Chicago'))
Out[2]:
[Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-02 00:00:00-0500', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-03 00:00:00-0500', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-04 00:00:00-0600', tz='America/Chicago', offset='24H'),
# This is (arguably) wrong but should certainly agree with the below
Timestamp('2013-11-05 00:00:00-0600', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-06 00:00:00-0600', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-07 00:00:00-0600', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-08 00:00:00-0600', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-09 00:00:00-0600', tz='America/Chicago', offset='24H'),
Timestamp('2013-11-10 00:00:00-0600', tz='America/Chicago', offset='24H')]
In [3]: out = [pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')]
In [4]: for i in range(9):
   ...:     out.append(out[-1] + pd.offsets.Hour(24))
   ...:
In [5]: out
Out[5]:
[Timestamp('2013-10-31 23:09:00-0551', tz='America/Chicago'),
# this brokenness comes from datetime.datetime
Timestamp('2013-11-02 00:00:00-0500', tz='America/Chicago'),
Timestamp('2013-11-03 00:00:00-0500', tz='America/Chicago'),
Timestamp('2013-11-03 23:00:00-0600', tz='America/Chicago'),
# this is (arguably) correct but should certainly agree with the above
Timestamp('2013-11-04 23:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-05 23:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-06 23:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-07 23:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-08 23:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-09 23:00:00-0600', tz='America/Chicago')] |
What you are doing is not comparable. I am still confused about the
What exactly is horribly broken? Yes, datetime doesn't handle DST properly. So what? You are not using it directly anyhow (or are you, somewhere that is not apparent)? pandas is very careful internally to deal with DST transitions correctly (of course there are bugs, which get squashed). So what is the bug here? (aside from |
Erm... my hyperbole engine? Which is overheating? It's true that I'm being a little over the top. I think there are two issues here. One is that
In [1]: import pandas as pd
In [2]: pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')
Out[2]: Timestamp('2013-10-31 23:09:00-0551', tz='America/Chicago')
This is also why
So there are three clear bugs here: first, that constructing a
The second issue is that in general, 1D != 24H. I know I'm going to have a hard time selling this one, especially since the BDFL has explicitly weighed in with the opposite opinion, but I really think that this is a case of "explicit is better than implicit". Some days just aren't 24 hours long, and users who assume that they are will get burned one way or another. Yes, we can intercept 24H and make it into 1D, but what if that 24 is the result of a computation that could have come out to 20, or 35? What if the user wants 1D1H, as in
There are really two notions of datetime arithmetic, one of them well represented by
If there's enough support for this I can split it off into a separate issue; otherwise it can just die here. |
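To make the "1D != 24H" point concrete, here is a minimal stdlib-only sketch of the two notions of arithmetic around the 2013 fall-back in Chicago. It does not use pandas or pytz; the names `localize`, `add_absolute` and `add_calendar_day` are hypothetical helpers, and the hard-coded transition instant (07:00 UTC on 2013-11-03) is an assumption standing in for a real time zone database.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")
# Assumption: US Central fell back from CDT to CST at 07:00 UTC on 2013-11-03.
FALL_BACK = datetime(2013, 11, 3, 7, tzinfo=UTC)


def localize(instant):
    """Express an aware instant in Chicago wall time (toy rule, autumn 2013 only)."""
    return instant.astimezone(CDT if instant < FALL_BACK else CST)


def add_absolute(t, hours):
    """Add elapsed hours on the UTC timeline, then re-derive the local offset."""
    return localize(t.astimezone(UTC) + timedelta(hours=hours))


def add_calendar_day(t):
    """Move to the same wall-clock time on the next calendar date."""
    naive = t.replace(tzinfo=None) + timedelta(days=1)
    # Pick the fixed offset that is actually in force at that wall time.
    for tz in (CDT, CST):
        candidate = naive.replace(tzinfo=tz)
        if localize(candidate).utcoffset() == candidate.utcoffset():
            return candidate
    raise ValueError("wall time does not exist in this zone")


start = datetime(2013, 11, 3, 0, 0, tzinfo=CDT)
print(add_absolute(start, 24))   # 2013-11-03 23:00:00-06:00
print(add_calendar_day(start))   # 2013-11-04 00:00:00-06:00
```

Under these assumptions the two results differ by one hour: the calendar day starting at midnight on 2013-11-03 is 25 elapsed hours long, which is the discrepancy visible between the two listings earlier in the thread.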
I definitely agree that we need to support different notions of day when it concerns tz definitions and addition. This came up on a prior issue...see #5175 for what is being discussed there. |
Ahh, it's using the current offset rather than normalizing. OK, pls create a new issue for that one. |
Just to note: One of the reasons I want to get this right is that absolutely everybody else gets it wrong, in some small way or another. Until I discovered |
@ischwabacher make that last a separate issue as well (they might be connected, but easier to deal with this way) |
My current workaround is this:
In [1]: import pandas as pd
In [2]: import pytz
In [3]: tz = pytz.timezone('America/Chicago')
In [4]: t = pd.Timestamp('2013-11-3', tz=tz)
In [5]: [tz.normalize(t.replace(hour=n)).replace(hour=n) for n in range(24)]
Out[5]:
[Timestamp('2013-11-03 00:00:00-0500', tz='America/Chicago'),
Timestamp('2013-11-03 01:00:00-0500', tz='America/Chicago'),
Timestamp('2013-11-03 02:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 03:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 04:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 05:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 06:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 07:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 08:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 09:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 10:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 11:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 12:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 13:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 14:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 15:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 16:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 17:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 18:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 19:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 20:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 21:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 22:00:00-0600', tz='America/Chicago'),
Timestamp('2013-11-03 23:00:00-0600', tz='America/Chicago')] |
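The replace/normalize/replace pattern in the workaround above can be sketched without pandas or pytz. This is an illustration under stated assumptions, not the pandas implementation: `normalize` is a hypothetical stand-in for what `tz.normalize` does, and the transition instant is hard-coded for autumn 2013 only.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")
# Assumption: US Central fell back from CDT to CST at 07:00 UTC on 2013-11-03.
FALL_BACK = datetime(2013, 11, 3, 7, tzinfo=UTC)


def normalize(t):
    """Re-express the same instant with the offset in force at that instant."""
    return t.astimezone(CDT if t < FALL_BACK else CST)


def replace_hour(t, hour):
    """Set the wall-clock hour, then repair the UTC offset."""
    guess = t.replace(hour=hour)       # naive replace keeps the old offset
    fixed = normalize(guess)           # correct offset, wall time may shift
    return fixed.replace(hour=hour)    # restore the requested wall hour


t = datetime(2013, 11, 3, tzinfo=CDT)
hours = [replace_hour(t, n) for n in range(24)]
for stamp in hours[:4]:
    print(stamp)
```

With these toy rules, hours 0 and 1 keep the -05:00 offset and hours 2 through 23 come out at -06:00, matching the listing above. The sketch always resolves 01:00 to its first (CDT) occurrence and does nothing special if normalization moves the date, so it inherits the limits of the workaround it imitates.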
@ischwabacher what needs to be fixed here again? (as I think some of this was moved to another issue) |
Well, none of the misbehavior in the OP has changed (as of 0.14.1-376-g0b5fa07), so the title issue is still outstanding. #7833 is fixed, so we're not getting crazy offsets when constructing I am finally steeling myself to take on |
@ischwabacher when you have a chance if you'd put up a ToDo list at the top of the PR and we'll make this into a master issue (for the sub-issues) you mentioned |
Yeah, I've been pretty disorganized. I will try to put this together tonight, but I would bet it won't actually come together until some time tomorrow night at the earliest. |
no worries |
if someone wants to evaluate this issue for closure, xref #18618 which we think closes. just want to be sure bases are covered. |
In trying to replicate the definition of Timestamp.replace in tslib.pyx, I encountered this further issue:
Trying another tack, I found this:
In summary: datetime is horribly broken, and the current workaround doesn't actually get all the way around. I know breaking backward compatibility is painful, and throwing the stdlib out the window is even more so, but this situation reminds me of nothing so forcefully as Microsoft Excel's 1900/1904 issue.
Version information: