Skip to content

BUG: groupby.transform(name) validates name is an aggregation #27597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed

BUG: groupby.transform(name) validates name is an aggregation #27597

wants to merge 15 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jul 25, 2019

closes #14274
closes #19354
closes #22509

  • tests added / passed
  • whatsnew entry

This is a long-standing bug. Effectively, it makes transform('rank') synonymous with g.rank() and same for other transformations.

We could also resolve this by restricting the g.transform(name) whitelist to only callables and aggs-to-be-broadcasted, telling users to use the named method directly, i.e g.rank() instead. Not a problem keeping this path working though, it just needs to return correct results.

Also, continuing some cleanups after #27467

@WillAyd
Copy link
Member

WillAyd commented Jul 26, 2019

Might have gotten lost in a conversation but should these just raise? I think these muddle the API a bit for transform.

I thought you suggested raising for non-reducing items going through agg seems like that could apply here as well

@ghost
Copy link
Author

ghost commented Jul 26, 2019

I thought you suggested raising for non-reducing items going through agg seems like that could apply here as well

I suggested agg should raise for transformations (will submit it next). That's not the same as transform raising for transformations, and it's main utility is turning aggs into transformations by broadcasting. So not the same case. I thought this route would be less disruptive.

I'd be perfectly ok with restricting g.transform to callables and agg-to-be-broadcasted only, that's reasonable, Just as long as there's agreement on this and it doesn't trigger a tug-of-war that stalls this indefinitely. g.transform('rank') has been returning nonsense for years and I want to finally fix it right now.

@WillAyd
Copy link
Member

WillAyd commented Jul 26, 2019

Right I'd be +1 for that restriction. In my head I would imagine that we could use the agg whitelist to validate what gets sent to both .agg and .transform

@ghost
Copy link
Author

ghost commented Jul 26, 2019

Thank you for reviewing

@ghost
Copy link
Author

ghost commented Jul 26, 2019

I would imagine that we could use the agg whitelist to validate what gets sent to both .agg and
.transform

Did you see the lists added in #27467? I did that so we could do both.

@WillAyd
Copy link
Member

WillAyd commented Jul 26, 2019

Yea that's what I was loosely referring to. Let's see what @jreback thinks

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that a non-transformation function (either directly, e.g. rank or an aggregation which broadcasts) should raise.

@pep8speaks
Copy link

pep8speaks commented Jul 26, 2019

Hello @pilkibun! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-26 20:09:14 UTC

@ghost
Copy link
Author

ghost commented Jul 26, 2019

Ok. Passing the name of anything not on the list of reductions now raises an exception. Further, if the method exists on Grouper, the user is also advised to call the method directly.

I also created a section under breaking changes with Previous/New behavior examples.

@ghost ghost changed the title BUG: do not broadcast result of transformations in groupby.transform CLN: groupby.transform(name) validates name is an aggregation Jul 26, 2019
@ghost ghost changed the title CLN: groupby.transform(name) validates name is an aggregation BUG: groupby.transform(name) validates name is an aggregation Jul 26, 2019
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a huge deal, but I think that this could reasonably be called a bug fix. Is there any case where .transform(name) was doing the "expected" thing that we'll now raise on?

@ghost
Copy link
Author

ghost commented Jul 26, 2019

was doing the "expected" thing that we'll now raise on?

all the cum* functions, having been cythonized, took a direct path which was not affected by broadcasting. They are all transformations, not aggs, and will now raise.

@TomAugspurger
Copy link
Contributor

all the cythonized cum* functions took a direct path which was not affected. They are all transformations, not aggs, and will now raise.

Can you give an example?

@ghost
Copy link
Author

ghost commented Jul 26, 2019

of what?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 26, 2019 via email

@ghost
Copy link
Author

ghost commented Jul 26, 2019

all the cum* functions are transformations, not aggs. Those are now meant to be called only directly.

@TomAugspurger
Copy link
Contributor

-1 on outright breaking that then.

And I would need to reread the issue to understand the motivation for changing it.

@ghost
Copy link
Author

ghost commented Jul 26, 2019

oh my god.

#27597 (comment)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add tests for the ffill issue that you are closing


In previous releases, :meth:`DataFrameGroupBy.transform` and
:meth:`SeriesGroupBy.transform` did not validate that the function name
passed was actually the name of an aggregation. As a result, users might get a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the: As a result.... sentence.

rised -> raise


.. code-block:: ipython

In [1]: g.transform('ers >= Decepticons')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just pass it a name like 'foo'

.. ipython:: python
:okexcept:

g.transform('ers >= Decepticons')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use foo, & make this a code-block (so we don't have the long traceback)

put the 'rank' in its own ipython block; I would also show .rank() or at least indicate that they are now the same.

@@ -241,8 +241,9 @@ class providing the base-class of operations.

Parameters
----------
f : function
Function to apply to each group
func : callable or str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this as f, otherwise this is an api change

@@ -1052,6 +1051,20 @@ def test_transform_agg_by_name(reduction_func, obj):
assert len(set(DataFrame(result).iloc[-3:, -1])) == 1


def test_transform_transformation_by_name(transformation_func):
"""Make sure g.transform('name') raises a helpful error for non-agg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue refences numbers as a comment

@TomAugspurger
Copy link
Contributor

Is #27597 (comment) answering why “cumsum” should raise?

There’s value in accepting non-agg transform names here. We have an open issue about accepting lists of functions in transform. That change would prohibit .transform([“cummin”, “cummax”]). And using .transform to signal to your reader that the operation is indeed a transform is valuable as well.

@ghost
Copy link
Author

ghost commented Jul 29, 2019

Is #27597 (comment) answering why “cumsum” should raise?

Tom, I asked you about this exact issue in #27389 two weeks ago. In response you said you found the bug useful and then then proceeded to ignore my followup. Then you showed up here, after I've written the patch, and the tests, and the docs, skipped over the preceding discussion with your peers, ignored the fact that two of them asked me to make the change you're objecting to, and casually expect me to rewrite this a third time.

I think that's awful behavior, and it's very effectively undermining my desire to help improve pandas.

If this is what I have to put up with in order to volunteer my time to fix serious, long-standing bugs, it's a bad deal and my answer is no.

@ghost ghost closed this Jul 29, 2019
@ghost ghost deleted the groupby_transform_cleanups2 branch July 29, 2019 18:12
@TomAugspurger
Copy link
Contributor

In response you said you found the bug useful

What comment is that? In #27389 (comment) I said that I find "broadcasting a transformation result useful in some other cases?". I'm not sure anymore what that's referring to since the post may have been edited.

and then then proceeded to ignore my followup.

Don't attribute to malice what can be attributed to busyness.

Then you showed up here, after I've written the patch, and the tests, and the docs, skipped over the preceding discussion with your peers, ignored the fact that two of them asked me to make the change you're objecting to, and casually expect me to rewrite this a third time.

If I find a change that I think is detrimental to pandas then yes, I'm going to speak up about it. I do apologize if that means lost effort, but I care more about the project as a whole.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants