-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API / BUG: How do we differentiate between -9223372036854775808 and iNaT? #16674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
so we already do this for transforms. https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby.py#L2100. should be easy to extend generally. |
I don't follow you here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby.py#L2102 seems to suggest otherwise. |
@gfyoung the output type of datetime/timedelta ops must be integers (though I suppose and this seems like the case here), that we have a non-datetimelike that returns an integer as well. So may need to sort thru these checks to avoid false positives. |
I think the example in the OP still fails, but only because of floating point error, not iNaT ambiguity. IIRC there was a discussion about supporting int64_t directly in libgroupby.group_add and i think that would solve that particular example. |
I now get the expected result on main; not sure if this needs tests or not. |
From #3707 (at 028188):
The assertion error occurs because during the aggregation,
pandas
checks incython_operation
incore/groupby.py
via_is_cython_func
fromcore/base.py
whether there are any "missing" integer values (assuming the data isinteger
) before and after the aggregation, which are defined asiNaT = -9223372036854775808
. If there are any such values, we automatically cast the data tofloat
.This logic is quite prevalent in the codebase, but it does seem quite fraught with pitfalls. For example, what if the output of a computation got the value
-9223372036854775808
? Also, what if the user intended to use-9223372036854775808
as a legitimate data point?Unlikely, sure. But reasonable, absolutely.
The text was updated successfully, but these errors were encountered: