Skip to content

BUG: Fix to GH34422 SeriesGroupBy works only with 'func' now #34435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jun 3, 2020
Merged

BUG: Fix to GH34422 SeriesGroupBy works only with 'func' now #34435

merged 24 commits into from
Jun 3, 2020

Conversation

gurukiran07
Copy link
Contributor

@gurukiran07 gurukiran07 commented May 28, 2020

This PR tries to fix the bug pointed in #34422 where SeriesGroupBy.agg works with any given column name in NamedAgg.

After discussing with @TomAugspurger and @MarcoGorelli in another issue #34380 regarding this issue in the comments, We came to a solution that

So for SeriesGroupBy.agg, if there are any tuples or NamedAgg present in kwargs then I think we should raise.
Disallowing NamedAgg and tuples with SeriesGroupBy.

Before fix:

s = pd.Series([1,1,2,2,3,3,4,5])
s.groupby(s.values).agg(one = pd.NamedAgg(column='anything',aggfunc='sum'))
  one
1    2
2    4
3    6
4    4
5    5

s.groupby(s.values).agg(one=('something','sum'))
   one
1    2
2    4
3    6
4    4
5    5

After fix:

s = pd.Series([1,1,2,2,3,3,4,5])
s.groupby(s.values).agg(one = pd.NamedAgg(column='anything',aggfunc='sum'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\#gh34422\pandas\pandas\core\groupby\generic.py", line 243, in aggregate
    raise TypeError(tuple_given_message.format(type(kwargs[col]).__name__))
TypeError: 'func' is expected but recieved NamedAgg in **kwargs.

s.groupby(s.values).agg(one=('something','sum'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\#gh34422\pandas\pandas\core\groupby\generic.py", line 243, in aggregate
    raise TypeError(tuple_given_message.format(type(kwargs[col]).__name__))
TypeError: 'func' is expected but recieved tuple in **kwargs.

@pep8speaks
Copy link

pep8speaks commented May 28, 2020

Hello @gurukiran07! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-03 19:37:24 UTC

@MarcoGorelli
Copy link
Member

Thanks for the PR

Can you add a test as well to show how it would close the linked issue?

@gurukiran07
Copy link
Contributor Author

@MarcoGorelli I'm new to open source development. I'm going through writing-tests mentioned in contributing.rst . I'll add tests by tomorrow.

It's currently showing failed in 9 checks and 2 successful checks. How and where can I fix those 9 failing checks?

@TomAugspurger
Copy link
Contributor

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always add tests first

@gurukiran07 gurukiran07 marked this pull request as draft May 29, 2020 04:09
@gurukiran07 gurukiran07 marked this pull request as ready for review May 29, 2020 07:23
@gurukiran07
Copy link
Contributor Author

@jreback and @MarcoGorelli Added tests in test_aggregate.py.

@gurukiran07 gurukiran07 marked this pull request as draft May 29, 2020 07:57
@gurukiran07 gurukiran07 marked this pull request as ready for review May 29, 2020 10:05
@gurukiran07 gurukiran07 requested a review from jreback May 29, 2020 10:06
@TomAugspurger
Copy link
Contributor

TomAugspurger commented May 29, 2020 via email

@gurukiran07
Copy link
Contributor Author

@TomAugspurger Thank you. Now, all checks have been passed. Should have gone through "Coding style" part of contribution guidelines before.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gurukiran07 !

def test_agg_namedtuple(self):
# GH34422
s = pd.Series([1, 1, 2, 2, 3, 3, 4, 5])
msg = "func is"
Copy link
Member

@MarcoGorelli MarcoGorelli May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can msg be more specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Can be extend msg = "func is expected but recieved NamedAgg" in this test, while using tuple msg = "func is expected but recieved tuple" and while using list msg = "func is expected but recieved list". Is this fine?

@gurukiran07 gurukiran07 marked this pull request as draft May 29, 2020 13:26
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 1, 2020 via email

@gurukiran07
Copy link
Contributor Author

gurukiran07 commented Jun 1, 2020

@TomAugspurger

No, strings aren't callable. So it'd need to be a callable or str? Perhaps we already have something to check that.

str func is handled using getattr in the below lines may be we can use hasattr(self,col_func)?

@gurukiran07
Copy link
Contributor Author

gurukiran07 commented Jun 1, 2020

@TomAugspurger

if isinstance(func,str):
    return getattr(self,func)(*args,**kwargs)

These lines are hit when str is passed. I guess we can use hasattr or callable.

@gurukiran07 gurukiran07 marked this pull request as draft June 2, 2020 06:24
@gurukiran07 gurukiran07 marked this pull request as ready for review June 2, 2020 08:16
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. can you add a whatsnew note, groupby section of bug fixes for 1.1. ping on green.

@jreback jreback added this to the 1.1 milestone Jun 2, 2020
@gurukiran07 gurukiran07 requested a review from jreback June 2, 2020 15:56
@gurukiran07
Copy link
Contributor Author

@jreback all 14 checks passed. And updated whatsnew v1.1.0.rst under groupby bug fixes.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, ping on green.

@gurukiran07
Copy link
Contributor Author

@jreback All checks passed. Updated with mentioned changes.

@gurukiran07 gurukiran07 requested a review from jreback June 3, 2020 20:18
@jreback jreback merged commit 73fe6b8 into pandas-dev:master Jun 3, 2020
@jreback
Copy link
Contributor

jreback commented Jun 3, 2020

thanks @gurukiran07 very nice!

@gurukiran07 gurukiran07 deleted the GH34422 branch June 4, 2020 05:22
@gurukiran07
Copy link
Contributor Author

@jreback Thank you. It was a huge learning experience, got to learn lot of new stuff. Hope to contribute more in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: SeriesGroupBy works with any column name specified in NamedAgg
5 participants