fixed transform docs #43058
willie3838 commented on Aug 15, 2021
- closes DOC: DataFrameGroupBy.transform #42907
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Hello @willie3838! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-08-17 16:34:43 UTC
pandas/core/groupby/groupby.py
Outdated
return a %(klass)s having the same indexes as the original object
filled with the transformed values
Apply function ``func`` column-by-column to the GroupBy object and return a %(klass)s
with the same length as the group.
can you say index (can say length too)
done!
Thanks for the PR! The change to the argument name (f -> func) is great, but the summary is not quite accurate. Some thoughts below.
Call function producing a like-indexed %(klass)s on each group and
return a %(klass)s having the same indexes as the original object
filled with the transformed values
Apply function ``func`` column-by-column to the GroupBy object and return a %(klass)s
This is sometimes true, but not always. For example:
import pandas as pd

def foo(x):
    # Print whatever object transform passes in, then return it unchanged
    print(x)
    return x

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3], 'c': [3, 4]})
df.groupby('a').transform(foo)
gives (after clipping the output for the first group, which is used to choose between the slow path and the fast path)
   b  c
0  2  3
   b  c
1  3  4
In this case, the transform is evaluated on the entire group, not column-by-column.
My recommendation here would be to combine the previous version (calling on the group) along with what you have (column-by-column). If calling on the first group is successful, then transform will operate group-by-group. Otherwise, it falls back to column-by-column (within each group).
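To make the two modes concrete, here is a minimal sketch that records the type of every object `transform` passes to the function. The names `record`, `series_only`, and `seen` are hypothetical, and exactly which calls are made on the first group is an internal detail that may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 3, 5, 9], "c": [2, 2, 4, 8]})

seen = []  # type names of the objects handed to the function


def record(x):
    # Works for both a whole group (DataFrame) and a single column (Series)
    seen.append(type(x).__name__)
    return x


def series_only(x):
    # Hypothetical function that refuses whole-group input, so only the
    # column-by-column calls can succeed
    seen.append(type(x).__name__)
    if not isinstance(x, pd.Series):
        raise TypeError("expected a Series")
    return x - x.mean()


out_record = df.groupby("a").transform(record)
types_for_record = set(seen)

seen.clear()
out_series_only = df.groupby("a").transform(series_only)
types_for_series_only = set(seen)

print(types_for_record)
print(types_for_series_only)
print(out_series_only)
```

With `series_only`, the whole-group call raises, so the values in the result can only have come from the column-by-column calls. The grouping column `a` is not part of either result.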
Thanks for the input! However, I think combining the grouping + columns is a bit confusing. How would a user differentiate between using the .apply() vs the .transform() function then?
@willie3838 - Agreed this is a part of the API that needs cleaning up, but I'd recommend leaving that aspect out of scope for this PR and documenting the behavior as it currently exists. If you agree with my assessment that the current documentation in this PR is not correct, then it needs to be fixed. On the other hand, if you think my assessment is not right, then let me know how!
return a %(klass)s having the same indexes as the original object
filled with the transformed values
Apply function ``func`` column-by-column to the GroupBy object and return a %(klass)s
with the same number of indices as the group.
This part of the phrase refers to the entire return value of the transform method. Thus, saying "as the group" here doesn't make sense: there can be multiple groups, so there is no "the group". Also, @jreback was commenting that the return must have the same index (not just the same number of elements!) as the input.
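The "same index as the input" point can be checked with a small sketch. The frame, the column names `key` and `val`, and the non-default index labels are made up for illustration; the comparison with `agg` is only there to show the contrast with a reduction.

```python
import pandas as pd

# Non-default, interleaved index labels make the alignment visible
df = pd.DataFrame(
    {"key": ["x", "y", "x", "y"], "val": [1.0, 2.0, 3.0, 4.0]},
    index=[10, 20, 30, 40],
)

# transform: one output row per input row, labelled like the input
centered = df.groupby("key").transform(lambda s: s - s.mean())

# agg: one output row per group, labelled by the group keys
reduced = df.groupby("key").agg("mean")

print(centered)
print(reduced)
```

Here `centered` keeps the labels 10, 20, 30, 40 in their original order even though the rows of each group are not adjacent, while `reduced` is indexed by the group keys.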
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
Thanks for the PR, but it appears to have gone stale. Closing for now, but if you're interested in continuing, let us know and we can reopen.