
fixed transform docs #43058

Closed
wants to merge 3 commits into from
21 changes: 10 additions & 11 deletions pandas/core/groupby/groupby.py
@@ -318,14 +318,13 @@ class providing the base-class of operations.
 """

 _transform_template = """
-Call function producing a like-indexed %(klass)s on each group and
-return a %(klass)s having the same indexes as the original object
-filled with the transformed values
+Apply function ``func`` column-by-column to the GroupBy object and return a %(klass)s
Member:

This is sometimes true, but not always. For example:

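import pandas as pd

def foo(x):
    # Print whatever transform passes in, then return it unchanged.
    print(x)
    return x

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3], 'c': [3, 4]})
df.groupby('a').transform(foo)

prints the following (after clipping the output from the first group, which is used to choose between the slow path and the fast path):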

   b  c
0  2  3
   b  c
1  3  4

In this case, the transform is evaluated on the entire group, not column-by-column.

@rhshadrach (Member), Aug 17, 2021:

My recommendation here would be to combine the previous version (calling on the group) along with what you have (column-by-column). If calling on the first group is successful, then transform will operate group-by-group. Otherwise, it falls back to column-by-column (within each group).
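
The fallback described above can be sketched as follows. This is only an illustration under stated assumptions: `series_only` and `calls` are hypothetical names, and the function deliberately raises when handed a whole group so that transform has to fall back to applying it column-by-column within each group.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2, 3], "c": [3, 4]})

calls = []  # type name of every argument the function receives


def series_only(x):
    # Hypothetical function that only supports a single column (Series).
    calls.append(type(x).__name__)
    if isinstance(x, pd.DataFrame):
        raise TypeError("whole-group input is not supported")
    return x * 2


# The attempt on the whole first group fails, so transform falls back
# to calling series_only on each column within each group.
result = df.groupby("a").transform(series_only)
print(result)
print(calls)
```

A function that accepts the whole group, such as `foo` in the earlier comment, would instead be called group-by-group after the first group.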

Contributor Author:

Thanks for the input! However, I think combining the grouping + columns is a bit confusing. How would a user then differentiate between the .apply() and .transform() functions?

@rhshadrach (Member), Aug 22, 2021:

@willie3838 - Agreed this is a part of the API that needs cleaning up, but I'd recommend leaving that aspect out of scope for this PR and documenting the behavior as it currently exists. If you agree with my assessment that the current documentation in this PR is not correct, then it needs to be fixed. On the other hand, if you think my assessment is not right, then let me know how!

+with the same number of indices as the group.
Member:

This part of the phrase refers to the entire return value of the transform method. Thus, saying "as the group" here doesn't make sense: there can be multiple groups, so there is no "the group". Also, @jreback was commenting that the return must have the same index (not just the same number of elements!) as the input.
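
As a small illustration of the "same index as the input" point, here is a minimal sketch. The frame, its string index labels, and the demeaning lambda are assumptions chosen for the example, not taken from the PR.

```python
import pandas as pd

# Rows of the two groups are interleaved, so a per-group result has to be
# put back in the original row order to line up with the input.
df = pd.DataFrame(
    {"a": [1, 2, 1, 2], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

# Subtract each group's mean from its members.
result = df.groupby("a").transform(lambda x: x - x.mean())

print(result)
print(result.index.equals(df.index))
```

The result carries the labels `w, x, y, z` in the original order, not merely four rows.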


 Parameters
 ----------
-f : function
-    Function to apply to each group.
+func : function
+    Function to apply to each column within each group.

     Can also accept a Numba JIT function with
     ``engine='numba'`` specified.
@@ -375,16 +374,16 @@ class providing the base-class of operations.
 Each group is endowed the attribute 'name' in case you need to know
 which group you are working on.

-The current implementation imposes three requirements on f:
+The current implementation imposes three requirements on func:

-* f must return a value that either has the same shape as the input
+* func must return a value that either has the same shape as the input
   subframe or can be broadcast to the shape of the input subframe.
-  For example, if `f` returns a scalar it will be broadcast to have the
+  For example, if `func` returns a scalar it will be broadcast to have the
   same shape as the input subframe.
-* if this is a DataFrame, f must support application column-by-column
-  in the subframe. If f also supports application to the entire subframe,
+* func must support application column-by-column
+  in the subframe. If func also supports application to the entire subframe,
   then a fast path is used starting from the second chunk.
-* f must not mutate groups. Mutation is not supported and may
+* func must not mutate groups. Mutation is not supported and may
   produce unexpected results. See :ref:`gotchas.udf-mutation` for more details.

 When using ``engine='numba'``, there will be no "fall back" behavior internally.
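
The broadcasting requirement in the docstring above can be illustrated with a short sketch. It assumes a single selected column (so the function receives a Series per group) and uses a made-up frame; the function returns one scalar per group, which transform broadcasts back to every row of that group.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": [10, 20, 30, 40]})

# The lambda returns a scalar (the group maximum); transform repeats that
# scalar for each row of the group so the output matches the input shape.
group_max = df.groupby("a")["b"].transform(lambda s: s.max())

print(group_max)
```

Each row ends up holding the maximum of its own group, aligned with the original rows.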