BUG: generated timestamps deviate from expected values #48255
Comments
Hi, thanks for your report. Could you run a git bisect to narrow it down further? Small note: I am not sure that this is on our side. We are passing So iiuc this is already wrong?
I ran I haven't been able to dig deeply enough into the code to identify exactly where things are going wrong, so any help would be appreciated.
Sorry, I did not see initially that the new commit is in June too. Could you please answer my other question? It looks to me as if the issue is not on our side.
@phofl @chalmerlowe I'm able to reproduce this without the
I believe the problem might lie here: pandas/pandas/_libs/tslibs/conversion.pyx Line 391 in e94faa2
As the call to Edit: I see that
Thank you very much. cc @jbrockmendel any idea here?
From a quick look, it appears that the nanoseconds is derived from
@mroeschke where are we doing this multiplication?
IIUC here pandas/pandas/_libs/tslibs/conversion.pyx Line 384 in c08c925
Some other possibly problematic places where we might be overflowing: pandas/pandas/_libs/tslibs/timestamps.pyx Line 2291 in e94faa2
pandas/pandas/_libs/tslibs/timestamps.pyx Line 2317 in e94faa2
With this diff, I get the above example to work correctly again, but it feels like a band-aid fix:
Would be more ideal if we use a
would it be easier to do
Tried this locally, but it didn't fix the issue. I think this is because
Playing around with the fix in #48255 (comment), I don't think this is a great solution in the long run either, as spilling some of the nanoseconds into the microseconds could introduce further bugs, such as adding too many additional microseconds beyond the allowed cap (1000000). I think to avoid this altogether, no more than 999 nanoseconds should be allowed in the Timedelta constructor to mirror the behavior of other components in the datetime constructor. #48538
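To illustrate the cap proposed above, here is a minimal sketch of the kind of range check being discussed. The helper name check_nanosecond is hypothetical and is not pandas API; the only library behavior relied on is that the standard-library datetime.time constructor rejects a microsecond value outside 0..999999.

```python
import datetime


def check_nanosecond(nanosecond: int) -> int:
    """Hypothetical validator: accept only 0..999, as proposed in the thread."""
    if not 0 <= nanosecond <= 999:
        raise ValueError(f"nanosecond must be in 0..999, got {nanosecond}")
    return nanosecond


# The standard library already enforces a comparable cap on microseconds.
try:
    datetime.time(23, 59, 59, 1_000_000)
    stdlib_rejects = False
except ValueError:
    stdlib_rejects = True

# With the proposed cap, a whole sub-second fraction expressed in nanoseconds
# is rejected loudly instead of being carried into a wider computation.
try:
    check_nanosecond(999_999_000)
    cap_rejects = False
except ValueError:
    cap_rejects = True
```

Raising early like this trades convenience for predictability: a caller who passes an oversized value gets an immediate error rather than a shifted timestamp.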
A little unfortunate, but easy enough to work around in googleapis/python-db-dtypes-pandas#148. If we can make this an error in 1.5 instead of an overflow, that would save a lot of debugging time for other folks who might hit this issue.
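One way a caller could work around such a limit is to split a sub-second fraction into microsecond and nanosecond components before building a timestamp. The sketch below uses a hypothetical helper, split_subsecond_nanos, and is not necessarily what the linked pull request does.

```python
from typing import Tuple


def split_subsecond_nanos(total_nanos: int) -> Tuple[int, int]:
    """Hypothetical helper: split a sub-second nanosecond count into
    (microsecond, nanosecond), with nanosecond limited to 0..999."""
    if not 0 <= total_nanos < 1_000_000_000:
        raise ValueError(f"expected 0 <= total_nanos < 1e9, got {total_nanos}")
    # 1 microsecond == 1000 nanoseconds, so the quotient is whole
    # microseconds and the remainder is the leftover nanoseconds.
    microsecond, nanosecond = divmod(total_nanos, 1_000)
    return microsecond, nanosecond


# A fraction of .999999 seconds expressed in nanoseconds.
microsecond, nanosecond = split_subsecond_nanos(999_999_000)
```

The two values could then be passed as separate microsecond and nanosecond arguments, so that neither component exceeds its own range.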
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug DOES NOT exist on the main branch of pandas (1.4.3 at the time of creating this Issue).
Reproducible Example
Issue Description
Formatted time strings are incorrectly converted to datetime.time objects.
When running the prerelease tests for python-db-dtypes-pandas and checking for prerelease issues that might be associated with the pandas dev branches, we identified that several python-db-dtypes-pandas tests failed depending on which nightly pandas branch we use. In particular between these two commits:
Not sure yet if this is due to something that derives from the python-db-dtypes-pandas lib OR from pandas OR from one of pandas' dependencies.
NOTE: the reproducible example provided is just one of many tests in the test suite that fail with a wide variety of different times, etc.
Expected Behavior
The expected result of the reproducible example should be:
assert datetime.time(23, 59, 59, 999999) == datetime.time(23, 59, 59, 999999)
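As a pandas-independent reference point, the expected value can be produced with the standard library alone. This sketch assumes the input string is "23:59:59.999999", which is inferred from the expected result above; the original reproducible example is not shown here.

```python
import datetime

# Assumed input string, inferred from the expected value; it is not taken
# from the original reproducible example.
text = "23:59:59.999999"

# datetime.time.fromisoformat parses HH:MM:SS.ffffff (Python 3.7+).
parsed = datetime.time.fromisoformat(text)

assert parsed == datetime.time(23, 59, 59, 999999)
```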
Installed Versions
For reference here are the installed libraries when we ran these tests. When running the failing tests, the only package that changed was the version of pandas. All other packages were the same: