BUG: groupby.transform(name) validates name is an aggregation #27597
Might have gotten lost in a conversation, but should these just raise? I think these muddle the API a bit for transform. I thought you suggested raising for non-reducing items going through agg; seems like that could apply here as well.
I suggested agg should raise for transformations (will submit it next). That's not the same as transform raising for transformations, and its main utility is turning aggs into transformations by broadcasting. So not the same case. I thought this route would be less disruptive. I'd be perfectly ok with restricting |
Right I'd be +1 for that restriction. In my head I would imagine that we could use the agg whitelist to validate what gets sent to both |
Thank you for reviewing |
Did you see the lists added in #27467? I did that so we could do both. |
Yea that's what I was loosely referring to. Let's see what @jreback thinks |
I agree that a non-transformation function (either directly, e.g. rank or an aggregation which broadcasts) should raise.
Ok. Passing the name of anything not on the list of reductions now raises an exception. Further, if the method exists on Grouper, the user is also advised to call the method directly. I also created a section under breaking changes with Previous/New behavior examples. |
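For readers following along, the check described above can be sketched in plain Python. This is only an illustration: the function name `validate_transform_name` and both name sets are hypothetical stand-ins, not the lists added in #27467 or the code in this patch.

```python
# Hypothetical stand-ins for the reduction / transformation name lists.
REDUCTIONS = frozenset({"sum", "mean", "min", "max", "count"})
TRANSFORMATIONS = frozenset({"rank", "cumsum", "cummin", "cummax", "ffill"})


def validate_transform_name(name, grouper_methods=TRANSFORMATIONS):
    """Return ``name`` if it is a known reduction, otherwise raise ValueError.

    When the rejected name is a method available on the groupby object,
    the error message advises calling that method directly.
    """
    if name in REDUCTIONS:
        return name
    msg = f"{name!r} is not a valid aggregation name for transform"
    if name in grouper_methods:
        msg += f"; call .{name}() on the groupby object directly"
    raise ValueError(msg)
```

With these assumed lists, `validate_transform_name("sum")` passes, `"rank"` raises with the "call it directly" hint, and an unknown name such as `"foo"` raises without the hint.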
It's not a huge deal, but I think that this could reasonably be called a bug fix. Is there any case where .transform(name) was doing the "expected" thing that we'll now raise on?
all the |
Can you give an example? |
of what? |
Does transform(‘cumsum’) raise?
If so, why? That seems OK to me.
… On Jul 26, 2019, at 15:40, pilkibun ***@***.***> wrote:
of what?
all the cum* functions are transformations, not aggs. Those are now meant to be called only directly. |
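To make the distinction concrete, here is a small sketch using public pandas calls on made-up data. A reduction such as `sum` returns one row per group, a transformation such as `cumsum` keeps the original shape, and `transform('sum')` broadcasts the reduced value back to every row.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
g = df.groupby("key")["val"]

reduced = g.sum()               # aggregation: one value per group
cumulative = g.cumsum()         # transformation: same length as the input
broadcast = g.transform("sum")  # aggregation broadcast back to input shape
```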
-1 on outright breaking that then. And I would need to reread the issue to understand the motivation for changing it. |
oh my god. |
can you add tests for the ffill issue that you are closing
In previous releases, :meth:`DataFrameGroupBy.transform` and
:meth:`SeriesGroupBy.transform` did not validate that the function name
passed was actually the name of an aggregation. As a result, users might get a
remove the: As a result.... sentence.
rised -> raise
.. code-block:: ipython

   In [1]: g.transform('ers >= Decepticons')
just pass it a name like 'foo'
.. ipython:: python
   :okexcept:

   g.transform('ers >= Decepticons')
use foo, & make this a code-block (so we don't have the long traceback)
put the 'rank' in its own ipython block; I would also show .rank() or at least indicate that they are now the same.
@@ -241,8 +241,9 @@ class providing the base-class of operations.

Parameters
----------
f : function
    Function to apply to each group
func : callable or str
leave this as f, otherwise this is an api change
@@ -1052,6 +1051,20 @@ def test_transform_agg_by_name(reduction_func, obj):
    assert len(set(DataFrame(result).iloc[-3:, -1])) == 1


def test_transform_transformation_by_name(transformation_func):
    """Make sure g.transform('name') raises a helpful error for non-agg
can you add the issue reference numbers as a comment
Is #27597 (comment) answering why “cumsum” should raise? There’s value in accepting non-agg transform names here. We have an open issue about accepting lists of functions in transform. That change would prohibit .transform([“cummin”, “cummax”]). And using .transform to signal to your reader that the operation is indeed a transform is valuable as well. |
Tom, I asked you about this exact issue in #27389 two weeks ago. In response you said you found the bug useful and then proceeded to ignore my followup. Then you showed up here, after I've written the patch, and the tests, and the docs, skipped over the preceding discussion with your peers, ignored the fact that two of them asked me to make the change you're objecting to, and casually expect me to rewrite this a third time. I think that's awful behavior, and it's very effectively undermining my desire to help improve pandas. If this is what I have to put up with in order to volunteer my time to fix serious, long-standing bugs, it's a bad deal and my answer is no.
What comment is that? In #27389 (comment) I said that I find "broadcasting a transformation result useful in some other cases?". I'm not sure anymore what that's referring to since the post may have been edited.
Don't attribute to malice what can be attributed to busyness.
If I find a change that I think is detrimental to pandas then yes, I'm going to speak up about it. I do apologize if that means lost effort, but I care more about the project as a whole. |
closes #14274
closes #19354
closes #22509
This is a long-standing bug. Effectively, it makes transform('rank') synonymous with g.rank(), and the same for other transformations.

We could also resolve this by restricting the g.transform(name) whitelist to only callables and aggs-to-be-broadcasted, telling users to use the named method directly, i.e. g.rank() instead. Not a problem keeping this path working though, it just needs to return correct results.

Also, continuing some cleanups after #27467
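
A minimal sketch of the intended equivalence, on made-up data and assuming a pandas release in which the by-name path returns correct results (the releases discussed in this thread may behave differently):

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [10, 5, 7, 9]})
g = df.groupby("key")["val"]

direct = g.rank()                   # rank within each group
via_transform = g.transform("rank")  # same operation requested by name

# The two spellings are expected to agree element for element.
same = via_transform.equals(direct)
```

Here `direct` ranks each value within its own group, so the larger value in each group gets rank 2.0 and the smaller gets 1.0.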