fixed transform docs #43058
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -318,14 +318,13 @@ class providing the base-class of operations. | |
""" | ||
|
||
_transform_template = """ | ||
Call function producing a like-indexed %(klass)s on each group and | ||
return a %(klass)s having the same indexes as the original object | ||
filled with the transformed values | ||
Apply function ``func`` column-by-column to the GroupBy object and return a %(klass)s | ||
with the same number of indices as the group. | ||
This part of the phrase refers to the entire return value of the transform method, so saying "as the group" here doesn't make sense: there can be multiple groups, so there is no "the group". Also, @jreback was commenting that the return must have the same index (not just the same number of elements!) as the input. |
||
|
||
Parameters | ||
---------- | ||
f : function | ||
Function to apply to each group. | ||
func : function | ||
Function to apply to each column within each group. | ||
|
||
Can also accept a Numba JIT function with | ||
``engine='numba'`` specified. | ||
|
@@ -375,16 +374,16 @@ class providing the base-class of operations. | |
Each group is endowed the attribute 'name' in case you need to know | ||
which group you are working on. | ||
|
||
The current implementation imposes three requirements on f: | ||
The current implementation imposes three requirements on func: | ||
|
||
* f must return a value that either has the same shape as the input | ||
* func must return a value that either has the same shape as the input | ||
subframe or can be broadcast to the shape of the input subframe. | ||
For example, if `f` returns a scalar it will be broadcast to have the | ||
For example, if `func` returns a scalar it will be broadcast to have the | ||
same shape as the input subframe. | ||
* if this is a DataFrame, f must support application column-by-column | ||
in the subframe. If f also supports application to the entire subframe, | ||
* func must support application column-by-column | ||
in the subframe. If func also supports application to the entire subframe, | ||
then a fast path is used starting from the second chunk. | ||
* f must not mutate groups. Mutation is not supported and may | ||
* func must not mutate groups. Mutation is not supported and may | ||
produce unexpected results. See :ref:`gotchas.udf-mutation` for more details. | ||
|
||
When using ``engine='numba'``, there will be no "fall back" behavior internally. | ||
|
This is sometimes true, but not always. For example:
gives (after clipping of the first group which is used to determine slowpath vs fastpath)
In this case, the transform is evaluated on the entire group, not column-by-column.
My recommendation here would be to combine the previous version (calling on the group) along with what you have (column-by-column). If calling on the first group is successful, then transform will operate group-by-group. Otherwise, it falls back to column-by-column (within each group).
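One way to see which path is taken is to record what kind of object the function receives. The following is only a minimal sketch: the frame, the column names, and the `demean` helper are illustrative assumptions, and which types show up depends on the pandas version and on the data, so no particular output is claimed.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, 3.0, 10.0, 30.0],
        "y": [2.0, 4.0, 20.0, 40.0],
    }
)

seen_types = set()  # names of the object types passed to the function


def demean(obj):
    # Valid for a whole group (DataFrame) and for a single column (Series),
    # so it works whichever path transform ends up using.
    seen_types.add(type(obj).__name__)
    return obj - obj.mean()


result = df.groupby("key").transform(demean)

print(sorted(seen_types))  # may contain "DataFrame", "Series", or both
print(result)
```

Because `demean` gives the same answer on either path, the result itself does not reveal which path was used; only the recorded types do.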
Thanks for the input! However, I think combining the grouping + columns is a bit confusing. How would a user differentiate between using the .apply() vs the .transform() function then?
@willie3838 - Agreed this is a part of the API that needs cleaning up, but I'd recommend leaving that aspect out of scope for this PR and documenting the behavior as it currently exists. If you agree with my assessment that the current documentation in this PR is not correct, then it needs to be fixed. On the other hand, if you think my assessment is not right, then let me know how!
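For reference, the earlier point that the result must carry the same index as the input (not merely the same number of elements), together with the docstring's note on scalar broadcasting, can be sketched as below. The frame, the row labels, and the lambda are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical frame with a non-default index and interleaved groups.
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b"], "x": [1, 10, 3, 30]},
    index=["r1", "r2", "r3", "r4"],
)

# The function returns one scalar per group; transform broadcasts it
# back to every row of that group.
out = df.groupby("key")["x"].transform(lambda s: s.max())

print(out)
```

The output keeps the input's row labels in their original order, which is what distinguishes it from a per-group reduction such as `df.groupby("key")["x"].max()`.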