BUG: groupby with CategoricalIndex doesn't include unobserved categories #49373
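A minimal sketch of the behavior the title describes may help, assuming pandas is installed. The frame, the index name `key`, and the column name `value` are made up for illustration and do not come from the PR. Category `"c"` is declared but never appears in the data, so with `observed=False` it is expected to show up in the result on versions that include this fix (milestone 2.0), and to be missing on earlier versions.

```python
import pandas as pd

# Hypothetical data: category "c" is declared but never observed.
idx = pd.CategoricalIndex(
    ["a", "a", "b"], categories=["a", "b", "c"], name="key"
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Group by the index level and ask for unobserved categories to be kept.
result = df.groupby(level="key", observed=False).sum()

# Same grouping, restricted to categories that actually occur in the data.
observed_only = df.groupby(level="key", observed=True).sum()

print(result)
print(observed_only)
```

With `observed=True`, only `"a"` and `"b"` are returned regardless of version, which makes the two calls a convenient way to check which behavior a given installation has.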

Merged: 13 commits, Nov 7, 2022

@rhshadrach (Member) commented Oct 28, 2022

@gt-on-1234 - my apologies here; I was meaning to just tackle #49354 but found that my changes leaked over into #49223. Take a look and let me know what you think.

@gt-on-1234 commented

> @gt-on-1234 - my apologies here; I was meaning to just tackle #49354 but found that my changes leaked over into #49223. Take a look and let me know what you think.

No problem! I think this should pretty much cover the changes in #49318 so will close my PR.

@rhshadrach rhshadrach marked this pull request as draft October 31, 2022 21:52
@rhshadrach rhshadrach marked this pull request as ready for review November 1, 2022 02:10
@@ -542,6 +537,14 @@ def __init__(
# TODO 2022-10-08 we only have one test that gets here and
# values are already in nanoseconds in that case.
self.grouping_vector = Series(self.grouping_vector).to_numpy()
elif is_categorical_dtype(self.grouping_vector):
Member

any particular reason this was moved from above?

@rhshadrach (Member Author) commented Nov 1, 2022

Previously, this block was in an if...elif...elif... chain where it would be skipped over when the first if was true (namely, when the grouping is specified by a level in the index). Now it's moved out of that chain, so it's always hit when appropriate.
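The control-flow change described above can be illustrated with a toy sketch. This is not the pandas `Grouping.__init__` code; the function names, flags, and step labels are hypothetical and only model the difference between keeping the categorical branch inside an `if`/`elif` chain and moving it out of that chain.

```python
def steps_before(from_index_level, is_categorical):
    """Toy model of the old flow: the categorical branch is an elif."""
    steps = []
    if from_index_level:
        steps.append("resolve level")
    elif is_categorical:
        # Skipped whenever the first branch is taken.
        steps.append("categorical handling")
    return steps


def steps_after(from_index_level, is_categorical):
    """Toy model of the new flow: the categorical check stands alone."""
    steps = []
    if from_index_level:
        steps.append("resolve level")
    if is_categorical:
        # Reached whether or not the grouping came from an index level.
        steps.append("categorical handling")
    return steps


# Grouping by a categorical index level: both conditions hold.
print(steps_before(True, True))  # ['resolve level']
print(steps_after(True, True))   # ['resolve level', 'categorical handling']
```

In this model, the only case whose outcome changes is the one where both conditions hold, which matches the case this PR targets: a grouping given by an index level that is also categorical.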

@@ -1235,10 +1289,10 @@ def df_cat(df):
@pytest.mark.parametrize("operation", ["agg", "apply"])
def test_seriesgroupby_observed_true(df_cat, operation):
# GH 24880
Member

Nit: Could you add the GH reference related to why this test changed?

Member Author

Done.

@mroeschke (Member) left a comment

One small comment; otherwise LGTM.

@mroeschke mroeschke added this to the 2.0 milestone Nov 7, 2022
@mroeschke mroeschke merged commit eea9e75 into pandas-dev:main Nov 7, 2022
@mroeschke (Member) commented

Thanks @rhshadrach

@rhshadrach rhshadrach deleted the groupby_cat_unobserved branch November 8, 2022 02:32
phofl pushed a commit to phofl/pandas that referenced this pull request Nov 9, 2022
…ies (pandas-dev#49373)

* BUG: groupby with CategoricalIndex doesn't include unobserved categories

* Test fixup

* cleanup

* Remove TODO

* Add GH reference

* Add GH reference
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
…ies (pandas-dev#49373)

* BUG: groupby with CategoricalIndex doesn't include unobserved categories

* Test fixup

* cleanup

* Remove TODO

* Add GH reference

* Add GH reference
Labels: Categorical (Categorical Data Type), Groupby