
BUG: isin casting to float64 for unsigned int and list #46693

Merged: 5 commits merged on Jun 24, 2022

phofl (Member) commented Apr 8, 2022

The argument is cast to int64, which leads to an upcast to float64 later on. To avoid this, we can either use object dtype for integers or implement some casting logic that casts the values to unsigned if possible.
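The second option from the description above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the helper name `coerce_list_values` is hypothetical, it only treats plain Python/NumPy integers as castable, and it falls back to object dtype (the first option) whenever a value does not fit in uint64.

```python
import numpy as np

UINT64_MAX = int(np.iinfo(np.uint64).max)


def _fits_uint64(v) -> bool:
    # bool is a subclass of int; do not treat it as an integer here.
    if isinstance(v, bool) or not isinstance(v, (int, np.integer)):
        return False
    return 0 <= int(v) <= UINT64_MAX


def coerce_list_values(comps: np.ndarray, values: list) -> np.ndarray:
    """Hypothetical helper: build `values` so that comparing them against
    unsigned `comps` never goes through float64."""
    if comps.dtype.kind != "u":
        # Not the unsigned case: keep NumPy's default dtype inference.
        return np.asarray(values)
    if all(_fits_uint64(v) for v in values):
        # Every value is representable as uint64, so the comparison
        # is uint64 vs uint64 and stays exact.
        return np.asarray(values, dtype=np.uint64)
    # Negative or non-integer values: use object dtype to avoid the upcast.
    return np.asarray(values, dtype=object)


comps = np.array([2**62 + 1, 5], dtype=np.uint64)
vals = coerce_list_values(comps, [2**62, 5])
print(vals.dtype)            # uint64
print(np.isin(comps, vals))  # [False  True]
```

Casting to uint64 keeps the fast integer path when it is safe; the object fallback trades speed for exactness only when some value cannot be represented as uint64.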

phofl added the Bug, Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), and Regression (Functionality that used to work in a prior pandas version) labels on Apr 8, 2022
jreback added this to the 1.5 milestone on Apr 8, 2022
jreback (Contributor) commented Apr 8, 2022

Looks fine. cc @jbrockmendel in case you have any comments.

@@ -446,7 +447,11 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> npt.NDArray[np.bool_]:
        )

    if not isinstance(values, (ABCIndex, ABCSeries, ABCExtensionArray, np.ndarray)):
        values = _ensure_arraylike(list(values))
        if not is_signed_integer_dtype(comps):
            # GH#46485 Use object to avoid upcast to float64 later
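A simplified model of what the `# GH#46485` branch in this hunk is aiming for: when `comps` is not a signed integer dtype and `values` arrived as a plain list, keep the values as Python objects so that matching is exact instead of going through float64. This is a hypothetical stand-in (`isin_sketch` is not a pandas function) for numeric list-likes only, and it uses a Python `set` for the object path rather than pandas' hashtable-based implementation.

```python
import numpy as np


def isin_sketch(comps: np.ndarray, values: list) -> np.ndarray:
    """Hypothetical, simplified isin for a NumPy array and a list of values."""
    if comps.dtype.kind != "i":
        # Mirrors `not is_signed_integer_dtype(comps)`: build an object
        # array so the original Python ints are never coerced to int64
        # and then upcast to float64 alongside uint64 `comps`.
        values_obj = np.array(list(values), dtype=object)
        lookup = set(values_obj.tolist())
        # comps.tolist() yields exact Python ints for integer dtypes.
        return np.array([x in lookup for x in comps.tolist()], dtype=bool)
    # Signed integer comps: int64 vs int64 comparison is already exact.
    return np.isin(comps, np.asarray(values))


comps = np.array([2**62 + 1, 5], dtype=np.uint64)
print(isin_sketch(comps, [2**62, 5]))  # [False  True]
```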
Member

IIUC the problem is that this comes back as int64, and then np.find_common_type for int64 + uint64 is float64, which is lossy for big integers.
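To make the mechanism concrete, the snippet below reproduces the lossy promotion with plain NumPy. It uses `np.promote_types` instead of `np.find_common_type`, which newer NumPy releases deprecate; for int64 and uint64 both give float64. The specific values are illustrative, chosen so that two distinct integers collapse to the same float64.

```python
import numpy as np

# No integer dtype can hold both the full int64 and uint64 ranges,
# so NumPy promotes the pair to float64.
common = np.promote_types(np.int64, np.uint64)
print(common)  # float64

comps = np.array([2**62 + 1], dtype=np.uint64)  # the column being searched
values = np.array([2**62], dtype=np.int64)      # list(values) inferred as int64

# float64 has a 53-bit significand, so 2**62 + 1 rounds to 2**62.
upcast_equal = comps.astype(common)[0] == values.astype(common)[0]
exact_equal = int(comps[0]) == int(values[0])
print(upcast_equal, exact_equal)  # True False
```

The float64 comparison reports a match that the exact integer comparison does not, which is how `isin` can return True for a value that is not actually present.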

We face a similar problem elsewhere, and ideally I'd like to reuse some of the logic we use there. The place that comes to mind is Index._find_common_type_compat.
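One way the sharing discussed here might look is a small dtype helper that both call sites use. The sketch below is hypothetical: the function name is made up, it operates on NumPy dtypes only, and it assumes the special case in `Index._find_common_type_compat` that matters here is "uint64 combined with a signed integer becomes object"; check the pandas source before relying on that.

```python
import numpy as np

_UINT64 = np.dtype(np.uint64)
_OBJECT = np.dtype(object)


def find_common_type_compat(left, right) -> np.dtype:
    """Hypothetical shared helper in the spirit of Index._find_common_type_compat.

    Assumption: uint64 combined with a signed integer dtype maps to object,
    because the only numeric common type (float64) loses precision.
    """
    left, right = np.dtype(left), np.dtype(right)
    if (left == _UINT64 and right.kind == "i") or (right == _UINT64 and left.kind == "i"):
        return _OBJECT
    # Every other pair uses NumPy's regular promotion rules.
    return np.result_type(left, right)


print(find_common_type_compat(np.uint64, np.int64))  # object
print(find_common_type_compat(np.uint32, np.int64))  # int64
```

With a helper like this, `isin` could ask for the common type of `comps` and the inferred `values` dtype and take the object path whenever the answer is object.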

Member Author

So we should move the logic from _find_common_type_compat into a helper function we can call in both places?

Member

If it can be done gracefully, that'd be nice. Otherwise, I guess a TODO to look into sharing it sooner or later.

Member Author

Sharing this is not straightforward right now, so I would add a TODO and try to refactor later.

github-actions bot (Contributor) commented Jun 4, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label on Jun 4, 2022
phofl removed the Stale label on Jun 15, 2022
jreback merged commit e5c7543 into pandas-dev:main on Jun 24, 2022
jreback (Contributor) commented Jun 24, 2022

thanks @phofl

phofl deleted the 46485 branch on June 25, 2022 14:24
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Development

Successfully merging this pull request may close these issues.

BUG: isin() give incorrect results for uint64 columns
3 participants