-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/DEPR: int downcasting in DataFrame.where #44597
Comments
I am +1 on deprecating the "try casting back to original dtype" for this specific case. We actually had a keyword for that before ( One note though: if we don't "cast back" for for columns that were not affected, we should IMO also not preserve the original dtype if none of the columns were affected (so if the mask is fully True). |
if none of the columns were affected, then i'd expect this to be a no-op. You're suggesting we would do some casting instead? An example might be helpful. |
Yes, eg your example but with the mask completely True:
So this is indeed a no-op currently, preserving the dtype. But if we want behaviour that doesn't depend on the exact content of If we don't want to do that because of "let's not cast if we don't have to (i.e. in case of a no-op)", then I think we should keep the "downcast" of the original example here, as that is not actually a downcast, but undoing the upcast, so a preservation of the original dtype for a no-op (when considering just that column). |
For reference, numpy always gives the same dtype as result regardless of the mask being fully True/False or not:
|
makes sense, thanks. will need to look into how this would affect putmask; id like to keep the behaviors symmetric where possible. |
Block.where
has special downcasting logic that splits blocks differently from any other Block methods. I would like to deprecate and eventually remove this bespoke logic.The relevant logic is only reached AFAICT when we have integer dtype (non-int64) and an integer
other
too big for this dtype, AND the passedcond
has all-True
columns.(Identifying the affected behavior is difficult in part because it relies on
can_hold_element
incorrectly returningTrue
in these cases)The simplest thing to do would be to not do any downcasting in these cases, in which case we would end up with all-int32. The next simplest would be to downcast column-wise, which would give the same end result but with less consolidation.
We do not have any test cases that fail if I disable this downcasting (after I fix a problem with an expressions.where call that the downcasting somehow makes irrelevant). This makes me think the current behavior is not intentional, or at least not a priority.
Any objection to deprecating the integer downcasting entirely?
The text was updated successfully, but these errors were encountered: