
BUG: pd.isnull treats list and tuple input differently #52283


Open
2 of 3 tasks
jrbourbeau opened this issue Mar 29, 2023 · 6 comments
Labels
Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@jrbourbeau
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
print(f"{pd.isnull([1, 2, 3]) = }")
print(f"{pd.isnull((1, 2, 3)) = }")

Issue Description

It looks like pd.isnull treats a list as array-like but a tuple as a scalar.

Expected Behavior

I'd expect pd.isnull to treat lists and tuples the same way, as other parts of the API such as pd.Series do.
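For comparison, a minimal sketch of the pd.Series behaviour referred to above: the constructor accepts a tuple as list-like input, so a tuple and a list holding the same values produce equal Series. The variable names are illustrative only.

```python
import pandas as pd

# The Series constructor treats a tuple as list-like, just like a list
from_list = pd.Series([1, 2, 3])
from_tuple = pd.Series((1, 2, 3))

print(from_list.equals(from_tuple))  # True
print(len(from_tuple))               # 3
```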

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.9.15.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 22.3.0
Version          : Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.3
numpy            : 1.24.0
pytz             : 2022.6
dateutil         : 2.8.2
setuptools       : 59.8.0
pip              : 22.3.1
Cython           : None
pytest           : 7.2.0
hypothesis       : None
sphinx           : 4.5.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.7.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           :
fastparquet      : 2023.2.0
fsspec           : 2022.11.0
gcsfs            : None
matplotlib       : 3.6.2
numba            : None
numexpr          : 2.8.3
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : 2022.11.0
scipy            : 1.9.3
snappy           :
sqlalchemy       : 1.4.46
tables           : 3.7.0
tabulate         : None
xarray           : 2022.9.0
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None
jrbourbeau added the Bug and Needs Triage labels Mar 29, 2023
DeaMariaLeon added the Usage Question, Closing Candidate, and Bug labels and removed the Bug and Needs Triage labels Mar 30, 2023
@DeaMariaLeon
Member

DeaMariaLeon commented Mar 30, 2023

This function returns a boolean or array-like of bool. I'll keep the "bug" label just in case, but I don't think it is one.

https://pandas.pydata.org/docs/reference/api/pandas.isnull.html

@jrbourbeau
Contributor Author

Thanks @DeaMariaLeon. The thing that seems off to me is that pd.isnull treats lists as array-like (returning an array of bools) and tuples as scalars (returning a single bool).

In [1]: import pandas as pd

In [2]: pd.isnull([1, pd.NA, 3])
Out[2]: array([False,  True, False])

In [3]: pd.isnull((1, pd.NA, 3))
Out[3]: False

My expectation is that both lists and tuples should be treated as array-like, though feel free to let me know if that expectation is incorrect.
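A possible workaround in the meantime, sketched under the assumption that a top-level tuple should always be read as a sequence of values rather than as a label. isnull_elementwise is a hypothetical helper, not part of pandas; it converts the tuple to a list so that pd.isnull takes the same array-like path it already uses for lists.

```python
import pandas as pd


def isnull_elementwise(obj):
    """Hypothetical helper: make pd.isnull treat a top-level tuple like a list."""
    # Convert only a top-level tuple; any other input is passed through unchanged
    if isinstance(obj, tuple):
        obj = list(obj)
    return pd.isnull(obj)


print(isnull_elementwise((1, pd.NA, 3)))  # [False  True False]
print(isnull_elementwise([1, pd.NA, 3]))  # [False  True False]
```

The trade-off is that the helper gives up the "tuple as a single label" reading entirely, so it only fits code where tuples are never used as labels.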

@DeaMariaLeon
Member

Oh, I see! Thank you for opening an issue. :)

DeaMariaLeon added the Missing-data label and removed the Usage Question and Closing Candidate labels Mar 30, 2023
@phofl
Member

phofl commented Mar 30, 2023

This is an edge case, I think.

You can end up with tuples from a MultiIndex, for example. In that scenario we want to treat the tuple as a single element, e.g.

df.drop(columns=(1, 2))

treats the tuple as a single element. I think the situation here is similar, although it does not look very intuitive to me either.
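A small sketch of the MultiIndex case, with made-up column labels: when the columns form a two-level MultiIndex, the tuple (1, 2) passed to drop is matched as one complete column label, so exactly one column is removed.

```python
import pandas as pd

# Hypothetical frame whose columns form a two-level MultiIndex
columns = pd.MultiIndex.from_tuples([(1, 2), (1, 3), (2, 2)])
df = pd.DataFrame([[10, 20, 30]], columns=columns)

# The tuple (1, 2) names a single column, so only that column is dropped
result = df.drop(columns=(1, 2))
print(result.columns.tolist())  # [(1, 3), (2, 2)]
```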

@rhshadrach
Member

Interestingly, tuples inside a list are treated as array-like:

import numpy as np
import pandas as pd

obj = [(1.0, 2.0), (1.0, np.nan), (np.nan, 2.0), (np.nan, np.nan)]
print(pd.isnull(obj))
# [[False False]
#  [False  True]
#  [ True False]
#  [ True  True]]
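One likely reason for the 2-D result, stated as an assumption rather than taken from the pandas source: a list argument ends up as a NumPy object array, and NumPy turns a list of equal-length tuples into a 2-D array with one row per tuple. The sketch below only checks that the shapes line up; it does not inspect pandas internals.

```python
import numpy as np
import pandas as pd

obj = [(1.0, 2.0), (1.0, np.nan), (np.nan, 2.0), (np.nan, np.nan)]

# NumPy builds a 2-D object array from equal-length tuples, one row per tuple
arr = np.asarray(obj, dtype=object)
print(arr.shape)  # (4, 2)

# pd.isnull on the original list returns a mask with the same 2-D shape
mask = pd.isnull(obj)
print(mask.shape)  # (4, 2)

# The mask matches what pd.isnull gives for the explicit object array
print((mask == pd.isnull(arr)).all())  # True
```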

@jorisvandenbossche
Member

> treating lists as array-like (returning a array-like of bools) and tuples as scalar (returning a bool)

As far as I remember, in the past we made this distinction (in certain places) because tuples can be labels, as Patrick mentioned.

But it's indeed a tricky situation, prone to confusion and corner cases (a quick search for "tuple list label" turns up quite a few related issues), for example #43978 for the drop case.

Another example in indexing where the two are distinguished and have different behaviour:

>>> s = pd.Series(range(6), index=pd.MultiIndex.from_product([[1, 2, 3], [1, 2]]))
>>> s.loc[(1, 2)]  # tuple is a single label
1
>>> s.loc[[1, 2]]  # list is an indexer (in this case for the first level of the MultiIndex)
1  1    0
   2    1
2  1    2
   2    3
dtype: int64
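Extending the example above with a sketch: wrapping the tuple in a list produces a list containing one label, so .loc returns a one-element Series instead of the scalar.

```python
import pandas as pd

s = pd.Series(range(6), index=pd.MultiIndex.from_product([[1, 2, 3], [1, 2]]))

# A bare tuple is one full label, so .loc returns the scalar stored there
print(s.loc[(1, 2)])  # 1

# A list holding that tuple is a list of labels, so .loc returns a Series
subset = s.loc[[(1, 2)]]
print(subset.tolist())        # [1]
print(subset.index.tolist())  # [(1, 2)]
```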

Projects
None yet
Development

No branches or pull requests

5 participants