tricky timestamp conversion #25571
Comments
FYI, it doesn't look like your first usage should need to pass both format and infer_datetime_format; once an explicit format is given, the inference flag is redundant.
@WillAyd thanks! Do you actually mean I should NOT use infer_datetime_format here?
The documentation points to this as well. As mentioned, in any case a minimally reproducible example would be very helpful.
@WillAyd yes, but the issue is that obviously I cannot share the large data. Is there any pandas function I can try in order to understand what's going on? Can we identify the possibly bogus timestamps? Thanks!!
Tough to say. Maybe don't coerce errors and see where it raises; that could indicate where the issue arises.
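One rough sketch of how to act on that suggestion (not from the thread; the column name mydatetime is taken from the report further down, and this variant keeps errors='coerce' and inspects the failures instead of waiting for a raise):

```python
import pandas as pd

# Assumed setup: mylog is the DataFrame from the report, with mydatetime still a string column.
raw = mylog['mydatetime']
parsed = pd.to_datetime(raw, format='%Y-%m-%dT%H:%M:%S.%f%z', errors='coerce')

# Strings that were present but failed to parse come back as NaT; list them.
bad = raw[parsed.isna() & raw.notna()]
print(len(bad))
print(bad.head(20))
```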
Thanks guys, trying that shortly and reporting back ASAP :)
@mroeschke the problem is that I don't see an easy fix to #25143. Maybe just getting rid of it?
I imagine the error handling around this argument is not entirely robust.
OK @WillAyd @mroeschke this is getting even weirder. I reloaded my data, this time cleaning the (character) timestamps (I noticed some extra spaces in them), and this time I was able to run the conversion without coercing errors. This is very annoying... is there a workaround to make sure the processing is correct? At the end of the day I think this specific ISO format seems to be causing a few errors at the moment...
The puzzling part is that it is not raising any errors, yet the output is not datetime.
So I think object dtype might actually be the expected behavior here if your data contains more than one timezone offset. How many distinct tz offsets are there in your column?
@mroeschke HA! That is an interesting idea. Normally I should only have one tz offset, but with 200m rows who knows. How can I tabulate the offsets to check that your theory is correct?
Might be a little slow, but you can tabulate the unique offsets and count them.
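A sketch of one way to do that tabulation (my own construction, not the exact snippet from the thread; it assumes the column already went through to_datetime and now holds Timestamp objects in an object-dtype Series):

```python
import pandas as pd

# Ask every parsed value for its UTC offset and count the distinct answers.
# NaT or unparsed leftovers are mapped to None so they show up as a separate bucket.
offset_counts = (
    mylog['mydatetime']
    .apply(lambda ts: ts.utcoffset() if isinstance(ts, pd.Timestamp) else None)
    .value_counts(dropna=False)
)
print(offset_counts)

# More than one distinct offset here is exactly the situation where pandas
# cannot use a single datetime64[ns, tz] dtype and falls back to object.
```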
Hum... I ran it and got some output I am not sure how to read. Does that mean there is just one offset (as expected)?
This means that the object dtype is expected. Since your string data contained more than one timezone offset, it's not possible to cast this data to a single datetime64[ns, tz] dtype.
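A small constructed illustration of that point (not data from this issue; 0.24-era pandas silently returns object dtype here, while recent versions refuse mixed offsets and ask for utc=True):

```python
import pandas as pd

# Two valid ISO 8601 strings whose UTC offsets differ.
s = pd.Series(['2019-01-01T00:00:00.000+00:00',
               '2019-01-01T00:00:00.000+05:00'])

try:
    mixed = pd.to_datetime(s, format='%Y-%m-%dT%H:%M:%S.%f%z')
    print(mixed.dtype)   # object on older pandas: no single tz fits both offsets
except ValueError as exc:
    print(exc)           # newer pandas raises and suggests passing utc=True

# Converting everything to UTC restores a proper datetime64 dtype.
print(pd.to_datetime(s, utc=True).dtype)   # datetime64[ns, UTC]
```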
@mroeschke we're close, but it's not over yet... Looking at the link you sent I was hopeful this would be solved. Below I reloaded the data so they are strings again. Thanks!
When I addressed this timezone parsing, my rationale was that if %z appears in the format, the user wants timezone-aware timestamps. For your use case, you could leave out the format argument and keep infer_datetime_format=True.
I think the behavior in this issue is expected and is more of a usage question. Closing.
Even though the issue has been closed, I just want to point out that if the tz-aware result bothers you so much and gives you the error above, you can simply convert it to a naive datetime first.
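One way to do that (a sketch using the usual pandas calls; the commenter's exact snippet is not preserved in this thread): parse to a single UTC-aware dtype, then drop the timezone.

```python
import pandas as pd

# Assumed column name from the report. utc=True yields one datetime64[ns, UTC] column
# even when the raw strings mix offsets; tz_convert(None) then strips the timezone.
aware = pd.to_datetime(mylog['mydatetime'], utc=True, errors='coerce')
naive = aware.dt.tz_convert(None)        # tz-naive, expressed in UTC wall time
print(naive.dtype)                       # datetime64[ns]
```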
Hello there, it's me the bug hunter again :)
I have this massive 200-million-row dataset, and I encountered some very annoying behavior. I wonder if this is a bug.
I load my csv with read_csv, and the datetime column really looks like regular timestamps (tz-aware). Now, I take extra care in converting these strings into proper timestamps:
mylog['mydatetime'] = pd.to_datetime(mylog['mydatetime'], errors='coerce', format='%Y-%m-%dT%H:%M:%S.%f%z', infer_datetime_format=True, cache=True)
That takes a looong time to process, but seems OK. What is puzzling is that so far I thought I had full control of my dtypes; however, running a simple follow-up operation on this column gives me a tz error. The only way I was able to go past this error was by running
mylog['myday'] = pd.to_datetime(mylog['mydatetime'].apply(lambda x: x.date()))
Is this a bug? Before upgrading to 0.24.1 I was not getting the tz error above. What do you think? I can't share the data, but I am happy to try some things to help you out! Thanks!
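For reference, a minimal two-row stand-in that reproduces the reported symptom (constructed data, not the reporter's file; the key assumption is that the real 200M rows hide more than one UTC offset, and the exact behavior depends on the pandas version):

```python
import pandas as pd

mylog = pd.DataFrame({'mydatetime': ['2019-01-01T10:00:00.000+01:00',
                                     '2019-01-01T10:00:00.000+02:00']})

try:
    parsed = pd.to_datetime(mylog['mydatetime'],
                            format='%Y-%m-%dT%H:%M:%S.%f%z', errors='coerce')
    print(parsed.dtype)   # object on 0.24-era pandas, not datetime64
    parsed.dt.date        # the .dt accessor then fails, as described above
except (ValueError, AttributeError) as exc:
    print(exc)            # newer pandas instead rejects mixed offsets and asks for utc=True
```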