Skip to content

BUG: Inconsistency with grouping columns in agg #50383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rhshadrach opened this issue Dec 21, 2022 · 3 comments · Fixed by #55193
Closed

BUG: Inconsistency with grouping columns in agg #50383

rhshadrach opened this issue Dec 21, 2022 · 3 comments · Fixed by #55193
Labels
Bug good first issue Groupby Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@rhshadrach
Copy link
Member

Related to #46944

Users can currently request the grouping column be part of the computation for various ops by including them as part of a __getitem__. But agg will still exclude these columns.

df = pd.DataFrame({'a': [1, 1, 2], 'b': 3, 'c': 4, 'd': 5})
gb = df.groupby(['a', 'b'])[['a', 'c']]

result = gb.sum()
print(result)
#      a  c
# a b      
# 1 3  2  8
# 2 3  2  4

result2 = gb.agg(lambda x: x.sum())
print(result2)
#      c
# a b   
# 1 3  8
# 2 3  4

I would expect __getitem__ to only subset columns for groupby rather than being able to add additional (grouping) columns.

@ihsansecer
Copy link
Contributor

Working fine on main now

>>> result2 = gb.agg(lambda x: x.sum())
>>> print(result2)
     a  c
a b      
1 3  2  8
2 3  2  4

@rhshadrach
Copy link
Member Author

Thanks @ihsansecer! Would you have any interest in adding a test?

I would expect __getitem__ to only subset columns for groupby rather than being able to add additional (grouping) columns.

I've come around to thinking of this as a feature. While groupby typically excludes the grouping column, if there is a use case for including them I suppose it can be good to have.

@ihsansecer
Copy link
Contributor

I've come around to thinking of this as a feature. While groupby typically excludes the grouping column, if there is a use case for including them I suppose it can be good to have.

Yeah I would expect the same and I wasn't aware of this behavior before. Added a test for this

@rhshadrach rhshadrach added this to the 2.2 milestone Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug good first issue Groupby Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants