Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
ONEMIL = 1_000_000
# 1,000,001 elements: one NaN followed by ONEMIL ones
big_ser = pd.Series([np.nan] + [1] * (ONEMIL))
big_ser.isin([np.nan]).head()
0 False
1 False
2 False
3 False
4 False
dtype: bool
big_ser.isin([np.nan]).value_counts()
False 1000001
dtype: int64
# 1,000,000 elements: one NaN followed by ONEMIL - 1 ones
small_ser = pd.Series([np.nan] + [1] * (ONEMIL - 1))
small_ser.isin([np.nan]).head()
0 True
1 False
2 False
3 False
4 False
dtype: bool
small_ser.isin([np.nan]).value_counts()
False 999999
True 1
dtype: int64
Problem description
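The result of pd.Series.isin([np.nan]) depends on the length of the series. With 1,000,001 elements the NaN in row 0 is not matched (every row is False); with 1,000,000 elements the same NaN is matched (row 0 is True).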
Expected Output
The result of pd.Series.isin should not depend on the series length.
Both cases should report exactly 1 row containing np.nan.
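The expectation above can be written as a small length-independent check. This is only a sketch: the helper names count_nan_matches and count_nan_matches_workaround are hypothetical and not part of pandas. The second helper assumes that combining isin with Series.isna() is an acceptable way to flag NaN regardless of series length on an affected version.

```python
import numpy as np
import pandas as pd


def count_nan_matches(length):
    """Count rows matched by isin([np.nan]) in a Series with one leading NaN."""
    ser = pd.Series([np.nan] + [1.0] * (length - 1))
    return int(ser.isin([np.nan]).sum())


def count_nan_matches_workaround(length):
    """Same count, but OR in isna() so the NaN row is flagged at any length.

    Assumption: this is a possible workaround, not a fix for isin itself.
    """
    ser = pd.Series([np.nan] + [1.0] * (length - 1))
    mask = ser.isin([np.nan]) | ser.isna()
    return int(mask.sum())


if __name__ == "__main__":
    for n in (1_000_000, 1_000_001):
        print(n, count_nan_matches(n), count_nan_matches_workaround(n))
```

On the version reported here, the first helper would be expected to disagree between the two lengths, while the second should count one NaN row in both cases.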
Output of pd.show_versions()
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.20.8-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.utf-8
LANG: en_US.utf-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: 1.6.2
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: 0.7.0