BUG: Groupby any/all raises TypeError for pd.NA #37501

Closed
3 tasks done
phofl opened this issue Oct 29, 2020 · 4 comments · Fixed by #42085
Labels
Bug, Groupby, Missing-data

Comments

@phofl
Member

phofl commented Oct 29, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 3, 3, 4], "b": [1, pd.NA, 2, 3, 4, 5, 6]})
grouped = df.groupby("a")
print(grouped.all())
print(grouped.any())

Problem description

Both calls raise a TypeError.

Traceback:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 10, in <module>
    print(grouped.all())
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 1424, in all
    return self._bool_agg("all", skipna)
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 1375, in _bool_agg
    return self._get_cythonized_result(
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 2631, in _get_cythonized_result
    raise TypeError(error_msg)
TypeError: boolean value of NA is ambiguous

Maybe this could use a mask, as print(df.all(axis=1)) does?
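A mask-based group reduction along those lines might look like the sketch below. It assumes the column has already been split into a plain values array plus a boolean missing-value mask (the layout pandas' nullable arrays use internally) and integer group codes. The helper name `masked_group_bool_agg` is hypothetical and this is not pandas' actual implementation; the point is only that missing entries are skipped before anything is truth-tested.

```python
import numpy as np


def masked_group_bool_agg(values, mask, codes, ngroups, how="all"):
    """Hypothetical helper (not pandas' implementation): group-wise all/any
    that skips missing entries, i.e. skipna=True semantics.

    values:  1-D array of payloads; entries where mask is True are ignored
    mask:    1-D bool array, True where the original value was missing (pd.NA)
    codes:   1-D int array mapping each row to a group in range(ngroups)
    """
    values = np.asarray(values)
    mask = np.asarray(mask, dtype=bool)
    codes = np.asarray(codes, dtype=np.intp)

    truthy = values.astype(bool)  # placeholders under the mask are ignored below
    valid = ~mask
    if how == "all":
        # A group fails "all" only if it holds a non-missing falsy entry.
        failing = np.bincount(codes[valid & ~truthy], minlength=ngroups)
        return failing == 0
    if how == "any":
        # A group passes "any" if it holds at least one non-missing truthy entry.
        passing = np.bincount(codes[valid & truthy], minlength=ngroups)
        return passing > 0
    raise ValueError(f"how must be 'all' or 'any', got {how!r}")


# Data from the report: groups a = 1, 2, 3, 4 become codes 0..3, and the
# pd.NA in group 1 is replaced by a placeholder 0 and flagged in the mask.
codes = np.array([0, 0, 1, 1, 2, 2, 3])
values = np.array([1, 0, 2, 3, 4, 5, 6])
mask = np.array([False, True, False, False, False, False, False])
print(masked_group_bool_agg(values, mask, codes, 4, how="all"))  # all groups True
print(masked_group_bool_agg(values, mask, codes, 4, how="any"))  # all groups True
```

Because missing rows are dropped before counting, a group consisting only of missing values comes out True for "all" and False for "any", the usual vacuous-truth convention when missing values are skipped.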

Expected Output

This should work as it does with np.nan and return:

      b
a      
1  True
2  True
3  True
4  True
      b
a      
1  True
2  True
3  True
4  True

Output of pd.show_versions()

master
@phofl added the Bug, Needs Triage, and Groupby labels on Oct 29, 2020
@phofl
Member Author

phofl commented Oct 29, 2020

A similar problem occurs with mean():

df = pd.DataFrame({"a": [1, 1, 2, 2, 3, 3, 4], "b": [1, pd.NA, 2, 3, 4, 5, 6]})
grouped = df.groupby("a")
print(grouped.mean())

raises a DataError

Traceback:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 10, in <module>
    print(grouped.mean())
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 1490, in mean
    return self._cython_agg_general(
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/generic.py", line 1021, in _cython_agg_general
    agg_mgr = self._cython_agg_blocks(
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/generic.py", line 1123, in _cython_agg_blocks
    raise DataError("No numeric types to aggregate")
pandas.core.base.DataError: No numeric types to aggregate

whereas print(grouped.sum()) works.
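Presumably the pd.NA leaves column b with object dtype, which the cythonized mean path rejects as non-numeric. With the mask idea suggested in the first comment, the mean could instead skip the missing entry. A minimal sketch, assuming the column is split into a values array, a boolean missing-value mask, and integer group codes; `masked_group_mean` is a hypothetical helper, not a pandas API. It sums and counts only the unmasked entries per group and leaves NaN for groups with no valid values.

```python
import numpy as np


def masked_group_mean(values, mask, codes, ngroups):
    """Hypothetical helper (not a pandas API): group-wise mean that skips
    entries flagged in mask, returning NaN for groups with no valid entries."""
    values = np.asarray(values, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    codes = np.asarray(codes, dtype=np.intp)

    valid = ~mask
    # Per-group sum and count over the non-missing entries only.
    sums = np.bincount(codes[valid], weights=values[valid], minlength=ngroups)
    counts = np.bincount(codes[valid], minlength=ngroups)
    # Divide only where a group has data; other groups keep the NaN fill.
    means = np.full(ngroups, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means


# Data from the report: groups a = 1, 2, 3, 4 become codes 0..3.
codes = np.array([0, 0, 1, 1, 2, 2, 3])
values = np.array([1, 0, 2, 3, 4, 5, 6])  # 0 is a placeholder for the pd.NA
mask = np.array([False, True, False, False, False, False, False])
print(masked_group_mean(values, mask, codes, 4))  # group means 1.0, 2.5, 4.5, 6.0
```

Using `np.bincount` with `weights` keeps the reduction vectorized; the `where=` argument to `np.divide` avoids a division-by-zero warning for all-missing groups.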

@arw2019
Member

arw2019 commented Oct 29, 2020

I think this falls under the umbrella of #37494

@arw2019 added the Missing-data label and removed the Needs Triage label on Oct 29, 2020
@arw2019
Member

arw2019 commented Oct 30, 2020

Regarding my earlier comment ("I think this falls under the umbrella of #37494"): on a second look, I think this is an independent issue.

@GYHHAHA
Contributor

GYHHAHA commented Nov 4, 2020

The groupby.all/groupby.any problem comes from bool(pd.NA) being ambiguous, in this conversion:

vals = np.array([bool(x) for x in vals])
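One way to avoid truth-testing a missing value is to build the mask first and convert only the remaining entries. The sketch below is illustrative only: `NASentinel` is a hypothetical stand-in that mimics pd.NA raising on `bool()` (so the example runs without pandas), and `to_bool_with_mask` is a hypothetical helper, not the code that eventually fixed this issue.

```python
import numpy as np


class NASentinel:
    """Hypothetical stand-in for pd.NA: truth-testing it raises TypeError."""

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


NA = NASentinel()


def to_bool_with_mask(vals, is_missing):
    """Hypothetical helper: convert vals to (bools, mask) without ever calling
    bool() on a missing entry. Masked positions hold a placeholder False."""
    mask = np.array([is_missing(x) for x in vals], dtype=bool)
    bools = np.array(
        [False if missing else bool(x) for x, missing in zip(vals, mask)],
        dtype=bool,
    )
    return bools, mask


vals = [1, NA, 0, 3]
# The quoted pattern, np.array([bool(x) for x in vals]), raises TypeError here.
bools, mask = to_bool_with_mask(vals, lambda x: x is NA)
print(bools, mask)
```

With pandas installed, `pd.isna` could serve as the `is_missing` predicate; the returned mask would then feed a skipna-aware reduction instead of the missing entries being passed to `bool()`.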

4 participants