DOC: DataFrameGroupBy.transform #42907

Open
panda-byte opened this issue Aug 5, 2021 · 6 comments
Labels: Apply (Apply, Aggregate, Transform, Map), Docs, Groupby

Comments

@panda-byte

panda-byte commented Aug 5, 2021

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html

Documentation problem

In my eyes, the documentation makes it very unclear how func (which is also mislabeled f in the parameter descriptions) is actually applied to each group.

General description:

Call function producing a like-indexed DataFrame on each group and return a DataFrame having the same indexes as the original object filled with the transformed values

Description of parameter func:

Function to apply to each group. [...]

I'm not sure if I'm just confused by this or if the documentation is actually misleading, because to me this implies that transform takes a function that is called once per group and thus accepts a DataFrame as an argument.
This, however, is not the case: transform actually applies func to each column within each group.
In my opinion, only this phrase in the 'Notes' section (and the examples) hints at this behavior:

if this is a DataFrame, f must support application column-by-column in the subframe

(which I find kind of confusing as well, to be honest, as I expected that the "this" would always be a DataFrame here, because transform is a method of DataFrameGroupBy)
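
For readers who want to check this on their own installation, here is a minimal sketch that records what kind of object func receives. The frame, the column names, and the helper `demean` are made up for illustration. Which types show up (only Series, or DataFrame as well) may depend on the pandas version and on the function itself, so the sketch prints them rather than assuming them.

```python
import pandas as pd

# Hypothetical example data: one grouping column and two value columns.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, 2.0, 3.0, 5.0],
        "y": [10.0, 20.0, 30.0, 50.0],
    }
)

seen = []  # type names of every object passed to the function


def demean(obj):
    """Subtract the mean; works for both a Series and a DataFrame."""
    seen.append(type(obj).__name__)
    return obj - obj.mean()


result = df.groupby("key").transform(demean)

# Inspect which kinds of objects pandas actually passed in.
print(sorted(set(seen)))

# The result keeps the original row index and only the value columns.
print(result)
```

Because `demean` behaves the same on a column and on a whole sub-frame, the result is identical either way; a function that needs several columns at once is where the distinction matters.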

On the other hand, the shorthand explanation of transform in the 'See also' section on other pages, like GroupBy.apply, is much clearer in my opinion:

transform: Apply function column-by-column to the GroupBy object.

To me, as an inexperienced pandas user, this makes it crystal-clear what the function is supposed to do, as opposed to "Call function producing a like-indexed DataFrame on each group", which is rather opaque to me. It took me a while to figure out that GroupBy.apply is what I actually needed.
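
As an illustration of a case where apply is the better fit, here is a sketch of a per-group computation that needs two columns at once, which a column-by-column function cannot express. The data, the column names, and the helper `weighted_mean` are hypothetical, and this is not necessarily the use case the original poster had in mind. The value columns are selected explicitly so that the function receives only those columns.

```python
import pandas as pd

# Hypothetical example data: values "x" with weights "w", grouped by "key".
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, 2.0, 3.0, 5.0],
        "w": [1.0, 3.0, 1.0, 1.0],
    }
)


def weighted_mean(group):
    """Receives the sub-DataFrame of one group and combines two columns."""
    return (group["x"] * group["w"]).sum() / group["w"].sum()


# One scalar per group, indexed by the group key.
per_group = df.groupby("key")[["x", "w"]].apply(weighted_mean)
print(per_group)
```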

Suggested fix for documentation

Change the general description and func parameter description to include something along the lines of "Apply function column-by-column to the GroupBy object", which is already used as a short description as mentioned before, and fix the mislabeled parameter. And maybe the "if this is a DataFrame" phrase in the notes should be changed as well.

@panda-byte added the Docs and Needs Triage labels Aug 5, 2021
@rhshadrach added the Apply label Aug 7, 2021
@willie3838
Contributor

take

@willie3838
Contributor

Sorry I'm new to this, but where can I edit the docs within the repository?

@willie3838
Contributor

Never mind, figured it out :)

This was referenced Aug 15, 2021
@jreback added this to the 1.4 milestone Aug 17, 2021
@mroeschke added the Groupby label and removed the Needs Triage label Aug 21, 2021
@simonjayhawkins
Member

removing milestone

@simonjayhawkins removed this from the 1.4 milestone Jan 20, 2022
@jorisvandenbossche
Member

For future reference, see also the comments on one of the PRs that tried to address this: #43058 (comment). In some cases the transform function is actually applied to the group DataFrame (and not column-by-column). But it is certainly a bit confusing exactly when this happens and when it does not.

@rhshadrach
Member

rhshadrach commented Apr 2, 2023

I think we shouldn't infer here. A few options:

  • Add an argument to DataFrameGroupBy.transform (e.g. by="(series|frame)" or by_column=(True|False))
  • Only support fast path (operating on the entire group)
  • Split transform into two methods
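
To illustrate what the first option might feel like for a user, here is a rough sketch of a standalone helper with an explicit switch. The name `explicit_transform` and its `by_column` parameter are hypothetical and are not part of pandas; the helper only emulates the idea with existing pandas calls and assumes a unique row index.

```python
import pandas as pd


def explicit_transform(df, by, func, by_column):
    """Hypothetical helper: the caller states how func is applied.

    by_column=True  -> func is called once per column of each group.
    by_column=False -> func is called once with each group's DataFrame.
    """
    values = df.drop(columns=by)
    pieces = []
    for _, group in values.groupby(df[by]):
        piece = group.apply(func) if by_column else func(group)
        pieces.append(piece)
    # Restore the original row order (assumes a unique index).
    return pd.concat(pieces).reindex(df.index)


# Made-up example data.
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b"],
        "x": [1.0, 3.0, 2.0, 5.0],
        "y": [10.0, 30.0, 20.0, 50.0],
    }
)

# Column-wise: each column is demeaned within its group.
col_wise = explicit_transform(df, "key", lambda s: s - s.mean(), by_column=True)

# Frame-wise: "y" is shifted by the group mean of "x", which needs both columns.
frame_wise = explicit_transform(
    df, "key", lambda g: g.assign(y=g["y"] - g["x"].mean()), by_column=False
)

print(col_wise)
print(frame_wise)
```

With an explicit switch like this, nothing has to be inferred from how the function behaves on the first group, which seems to be the concern raised above.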

7 participants