isin() returns different results than eq() when mixing dtypes of comparators #16938

Open
mansenfranzen opened this issue Jul 15, 2017 · 1 comment
Labels: Bug; Dtype Conversions (Unexpected or buggy dtype conversions); isin (isin method)

Comments

@mansenfranzen

Expected, correct behavior for matching comparator dtypes

import numpy as np
import pandas as pd

s = pd.Series([1.2, 2.3])
s.eq(1.2) == s.isin([1.2]) # True, True

s32 = pd.Series([1.2, 2.3], dtype="float32")
s32.eq(np.float32(1.2)) == s32.isin([np.float32(1.2)]) # True, True

Unexpected behavior for mixed comparator dtypes

s32.eq(1.2) == s32.isin([1.2]) # False, True

# in detail
s32.eq(1.2) # True, False
s32.isin([1.2]) # False, False

In summary, eq() and isin() return different results when mixing comparator dtypes.
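
A possible workaround, not proposed anywhere in this thread, is to cast the comparison values to the Series dtype before calling isin(), so that both sides carry the same float32 rounding. This is a minimal sketch under that assumption; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s32 = pd.Series([1.2, 2.3], dtype="float32")

# Assumption: casting the comparators to the Series dtype up front
# avoids the mixed float32/float64 comparison altogether.
values = np.asarray([1.2], dtype=s32.dtype)

isin_result = s32.isin(values)   # values are float32, like s32
eq_result = s32.eq(values[0])    # values[0] is a np.float32 scalar

print(isin_result.tolist())
print(eq_result.tolist())
```

With matching dtypes the two methods should agree, which mirrors the float32 example at the top of the report.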

Problem description

Both eq() and isin() should return the same result. See the related SO article.

This issue might originate in numpy and may not be directly pandas related (see here for more). In numpy, too, scalar comparison (equivalent to eq()) and array comparison (equivalent to isin()) yield different results for mixed comparator dtypes.
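
To make the numpy observation concrete, here is a small sketch that compares a float32 array against a Python float scalar and against a float64 array. It only illustrates numpy's own comparison behavior and makes no claim about the code path pandas takes.

```python
import numpy as np

a32 = np.array([1.2, 2.3], dtype=np.float32)

# Python float scalar: the comparison is carried out in float32,
# so 1.2 is rounded the same way as the array element.
scalar_result = a32 == 1.2

# float64 array: a32 is upcast to float64, and float32(1.2) widened
# to float64 is no longer equal to the float64 value 1.2.
array_result = a32 == np.array([1.2])

print(scalar_result)  # [ True False]
print(array_result)   # [False False]
```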

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger TomAugspurger added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Dtype Conversions Unexpected or buggy dtype conversions labels Jul 15, 2017
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 15, 2017
@jreback
Contributor

jreback commented Jul 15, 2017

No, this has to do with the upcasting rules; numpy is not used here except in a small evaluation case.
Upcasting of the mixed operands is actually somewhat non-trivial; see the code: https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L372

You're welcome to add your test case and debug. There are quite a few tests around this, so it might be tricky to get right.
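
For readers following along, the effect of upcasting on this particular value can be sketched with plain numpy scalars. This is not the pandas upcasting code linked above, only an illustration of why widening float32 to float64 cannot recover the float64 literal 1.2.

```python
import numpy as np

x32 = np.float32(1.2)

# Promoting float32 together with float64 gives float64.
common = np.result_type(np.float32, np.float64)

# The upcast is exact: it keeps float32's rounding error rather than
# re-rounding to the nearest float64 to 1.2.
x64 = x32.astype(common)
upcast_matches = bool(x64 == 1.2)

# Casting the literal down to float32 instead makes the two agree.
downcast_matches = bool(np.float32(1.2) == x32)

print(common, float(x64), upcast_matches, downcast_matches)
```

So whether a mixed comparison matches depends on which side gets cast, which is presumably why the upcasting rules matter here.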

@jbrockmendel jbrockmendel added the isin isin method label Oct 30, 2020
@mroeschke mroeschke removed the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Jun 12, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022