to_datetime() throws ValueError: Cannot pass a tz argument when parsing strings with timezone information. #32792
Comments
Could you provide a minimal, copy-pastable example in this issue? See https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
What do you mean? There's a repl link in the issue.
Please copy-paste the code to the top of the issue, as instructed in the template.
There you go.
Distilled from the link / larger example:

```python
import pandas as pd

# I know the format, and I want to pass it so that pandas' to_datetime() runs faster.
DATETIME_FORMAT = '%m/%d/%Y %H:%M:%S.%f%z'
pd.to_datetime(["10/11/2018 00:00:00.045-07:00", "10/11/2018 01:00:00.045-07:00"],
               format=DATETIME_FORMAT, utc=True)
```

@eparizzi although the repl site looks really nice (certainly for a more complex example that might require an additional dependency or data), we still prefer a short copy-pastable example here when possible, like the example I put above. That makes it easier to deal with a large number of issue reports.
Understood. I wanted to give more detail because, judging by the issue I linked, this seems to be expected behavior rather than a bug. But I still think we should have a way to parse strings with timezone offsets into UTC while passing the format, which is much faster than letting pandas infer it.
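The two steps being asked for (parse with a known format, then normalize to UTC) can be sketched with the standard library alone. This is a minimal illustration assuming Python 3.7+, where `strptime`'s `%z` accepts offsets written with a colon such as `-07:00`. `parse_as_utc` is a hypothetical helper, not a pandas API, and the `+02:00` offset in the second sample string is invented to show mixed offsets.

```python
from datetime import datetime, timezone

# Same format string as the distilled example; %z accepts "-07:00" on Python 3.7+.
DATETIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f%z"

def parse_as_utc(text, fmt=DATETIME_FORMAT):
    """Hypothetical helper: parse with an explicit format, then convert to UTC."""
    parsed = datetime.strptime(text, fmt)   # step 1: the format says how to read the string
    return parsed.astimezone(timezone.utc)  # step 2: a separate conversion to UTC

# Invented sample data with two different offsets.
stamps = ["10/11/2018 00:00:00.045-07:00", "10/11/2018 01:00:00.045+02:00"]
utc_stamps = [parse_as_utc(s) for s in stamps]
for ts in utc_stamps:
    print(ts.isoformat())
```

Keeping the two steps separate is what lets strings with different offsets end up on a single, comparable UTC timeline.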
I guess that's a fair case to allow. The documentation will need to state clearly that the returned timestamps will be localized to the timezone parsed from
I thought that was the case already. At least, that's what I understand from the current docs for the `utc` argument.
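For comparison, here is a minimal sketch of the documented `utc=True` behavior on its own, without an explicit `format`: pandas works out how to read the strings itself (the slower path described in the issue) and returns tz-aware timestamps converted to UTC. The mixed offsets in the sample data are invented for illustration, and the exact inference path may differ between pandas versions.

```python
import pandas as pd

# Invented sample: two different UTC offsets in the same column.
mixed = ["10/11/2018 00:00:00.045-07:00", "10/11/2018 01:00:00.045+02:00"]

# No format given: pandas infers how to parse the strings, and utc=True converts
# every parsed value to UTC, so mixed offsets still yield one UTC-aware index.
inferred = pd.to_datetime(mixed, utc=True)
print(inferred)
```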
I think pandas should support passing `%z` in the format together with `utc=True`. In my opinion, these are two separate things: the format tells pandas how to parse the datetime string, while `utc=True` just tells it to return the dates in UTC, no matter which timezone they were in originally.

Here's a repl that shows the issue: https://repl.it/@eparizzi/Pandas-todatetime-in-UTC-with-format
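When every string in the column shares the same offset (as in the distilled example), the separation described above can be expressed as two pandas calls: parse with the explicit format alone, then convert to UTC with `DatetimeIndex.tz_convert`. This is a sketch that assumes a single offset per column; it does not cover the mixed-offset case.

```python
import pandas as pd

DATETIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f%z"
strings = ["10/11/2018 00:00:00.045-07:00", "10/11/2018 01:00:00.045-07:00"]

# Step 1: parse with the explicit format only. Assumes all strings share one
# offset, so the result is a tz-aware DatetimeIndex in that fixed offset.
local = pd.to_datetime(strings, format=DATETIME_FORMAT)

# Step 2: converting to UTC is a separate operation on the parsed result.
as_utc = local.tz_convert("UTC")
print(as_utc)
```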
If you replace that simple CSV with a big 50K-row time-series CSV, the call to `to_datetime` without the format takes more than 20 seconds. By contrast, passing the format without `utc=True` takes less than 2 seconds. Unfortunately, this doesn't work properly when there are multiple timezones in the column: pandas can't assign a single datetime dtype to the result in that case.

So why can't we have a way to specify a format that includes the timezone, and also specify that we want everything as `datetime64[ns, UTC]`?
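For a column with mixed offsets, one possible workaround that keeps the fast explicit-format parse is to split off the offset and apply it by hand. This is a sketch under strong assumptions: every string ends in a fixed-width `+HH:MM` or `-HH:MM` offset, there are no missing values, and `to_utc_fixed_width` and `NAIVE_FORMAT` are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

# Hypothetical: the format from the issue minus the trailing %z.
NAIVE_FORMAT = "%m/%d/%Y %H:%M:%S.%f"

def to_utc_fixed_width(strings):
    """Hypothetical helper: assumes each string ends in a '+HH:MM'/'-HH:MM' offset."""
    s = pd.Series(strings)
    # Parse the wall-clock part with the explicit (fast) format; result is tz-naive.
    naive = pd.to_datetime(s.str[:-6], format=NAIVE_FORMAT)
    # Turn the last six characters, e.g. "-07:00", into a signed timedelta.
    offset = s.str[-6:]
    sign = offset.str[0].map({"+": 1, "-": -1})
    minutes = offset.str[1:3].astype(int) * 60 + offset.str[4:6].astype(int)
    delta = pd.to_timedelta(sign * minutes, unit="min")
    # local time = UTC + offset, so UTC = local time - offset; then mark as UTC.
    return (naive - delta).dt.tz_localize("UTC")

mixed = ["10/11/2018 00:00:00.045-07:00", "10/11/2018 01:00:00.045+02:00"]
result = to_utc_fixed_width(mixed)
print(result)
```

The offset arithmetic is vectorized, so it should scale like the format-based parse itself, but it will silently misparse any string that does not match the assumed fixed-width layout.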
I've already gone over issue #25571, but I still think this deserves a discussion.