DOC: DataFrameGroupBy.transform #42907
Comments
take |
Sorry I'm new to this, but where can I edit the docs within the repository? |
Never mind, figured it out :) |
removing milestone |
For future reference, see also the comments on one of the PRs that tried to address this: #43058 (comment). In some cases the transform function is actually applied to the group DataFrame (and not column-by-column). But it is certainly a bit confusing when exactly this happens and when it does not. |
I think we shouldn't infer here. A few options:
|
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html
Documentation problem
In my eyes, the documentation makes it very unclear how `func` (which is also mislabeled `f` in the parameter descriptions) is actually applied to each group.
General description:
Description of parameter `func`:
I'm not sure if I'm just confused by this or if the documentation is actually misleading, because to me this implies that `transform` takes a function which is called for each group and thus accepts a `DataFrame` as an argument.
This however is not the case, as `transform` actually applies `func` to each column within each group.
Only this phrase in the 'Notes' section (and the examples) somewhat hints at this functionality, in my opinion:
(which I find kind of confusing as well, to be honest, as I expected that the "this" in this case would always be a `DataFrame`, because it is a method of `groupby.DataFrameGroupBy`)
On the other hand, the shorthand explanation of `transform` in the 'See also' section on other pages like GroupBy.apply is much more concise in my opinion:
Which to me, as an inexperienced pandas user, makes it crystal-clear what the function is supposed to do, as opposed to "Call function producing a like-indexed DataFrame on each group", which is rather opaque to me. It took me a while to figure out that GroupBy.apply is what I actually needed.
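To make the difference concrete, here is a minimal sketch. The frame, the column names, and the `demean` and `weighted_total` helpers are made up for illustration and are not taken from the pandas docs. It records the type of object that `transform` hands to the function, and contrasts that with `apply`, which passes each group as a `DataFrame`. Which types `transform` passes may depend on the pandas version and on internal path selection, so the sketch only prints them rather than assuming a fixed answer.

```python
import pandas as pd

# Hypothetical example data, not taken from the pandas documentation.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, 3.0, 10.0, 20.0],
        "y": [2.0, 6.0, 5.0, 15.0],
    }
)

seen_types = set()


def demean(values):
    # Record what kind of object pandas hands to the function.
    seen_types.add(type(values).__name__)
    # Works for a Series (one column of one group) and for a DataFrame
    # (a whole group), so the result does not depend on which one is passed.
    return values - values.mean()


# transform: the result keeps the original index and the non-key columns.
transformed = df.groupby("key").transform(demean)
print(transformed)
print(sorted(seen_types))


def weighted_total(group):
    # Needs two columns at once, so it must receive the whole group.
    return (group["x"] * group["y"]).sum()


# apply: the function is called once per group with a DataFrame.
totals = df.groupby("key")[["x", "y"]].apply(weighted_total)
print(totals)
```

A function like `weighted_total`, which combines several columns, only works when it receives the whole group, which is why `apply` was the right tool in my case.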
Suggested fix for documentation
Change the general description and `func` parameter description to include something along the lines of "Apply function column-by-column to the GroupBy object", which is already used as a short description as mentioned before, and fix the mislabeled parameter. And maybe the "if this is a DataFrame" phrase in the notes should be changed as well.