
CLN: Refactor groupby's _make_wrapper #48028

Closed
rhshadrach opened this issue Aug 10, 2022 · 2 comments · Fixed by #48400
@rhshadrach
Member

Currently pandas implements various groupby ops via _make_wrapper. There are "allow lists" in pandas.core.groupby.base for both SeriesGroupBy and DataFrameGroupBy, and the class decorator pandas.core.groupby.generic.pin_allowlisted_properties adds these as properties to the SeriesGroupBy and DataFrameGroupBy classes.
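To make the mechanism concrete, here is a minimal sketch of the pattern described above, using toy stand-ins rather than pandas internals. The names `ALLOWLIST`, `pin_allowlisted`, and `ToyGroupBy` are hypothetical; only the idea (a class decorator that pins allow-listed names as properties returning a generic wrapper) is taken from the description.

```python
import inspect
import statistics

# Hypothetical allow list; pandas keeps its real lists in pandas.core.groupby.base.
ALLOWLIST = frozenset({"mean", "median"})


def pin_allowlisted(allowlist):
    """Class decorator that adds one property per allow-listed name."""

    def decorator(cls):
        for name in allowlist:
            # Bind `name` through a default argument so each property keeps its own op.
            def prop(self, _name=name):
                return self._make_wrapper(_name)

            setattr(cls, name, property(prop))
        return cls

    return decorator


@pin_allowlisted(ALLOWLIST)
class ToyGroupBy:
    def __init__(self, groups):
        self.groups = groups  # mapping of group key -> list of numbers

    def _make_wrapper(self, name):
        func = getattr(statistics, name)

        # Generic wrapper: its signature says nothing about the underlying op.
        def wrapper(*args, **kwargs):
            return {
                key: func(values, *args, **kwargs)
                for key, values in self.groups.items()
            }

        return wrapper


gb = ToyGroupBy({"x": [1, 2, 3], "y": [10, 20]})
result = gb.mean()                     # works like a method call
attr_type = type(ToyGroupBy.mean)      # but the class attribute is a property
sig = str(inspect.signature(gb.mean))  # and the signature is fully generic
```

In this toy version, calling `gb.mean()` works as expected, yet the class attribute is a property and the wrapper exposes only `*args, **kwargs`.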

Instead, should we convert _make_wrapper into a normal method (e.g. _op_via_apply) and add an explicit method for each op in the allow lists? An example of such an added method would be:

    @doc(DataFrame.skew.__doc__)
    def skew(
        self,
        axis=lib.no_default,
        skipna=True,
        level=None,
        numeric_only=None,
        **kwargs
    ):
        result = self._op_via_apply(
            'skew',
            axis=axis,
            skipna=skipna,
            level=level,
            numeric_only=numeric_only,
            **kwargs
        )
        return result
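As a rough illustration of how such an explicit method behaves under introspection, the sketch below uses a toy class instead of pandas. `ToyGroupBy` and its dictionary-of-lists storage are assumptions for illustration, and this `_op_via_apply` is a stand-in, not the pandas implementation.

```python
import inspect
import statistics


class ToyGroupBy:
    def __init__(self, groups):
        self.groups = groups  # mapping of group key -> list of numbers

    def _op_via_apply(self, name, **kwargs):
        """Look up the named op and apply it to every group."""
        func = getattr(statistics, name)
        return {key: func(values, **kwargs) for key, values in self.groups.items()}

    def pstdev(self, mu=None):
        """Population standard deviation of each group."""
        # Explicit signature; the work is still delegated to the shared helper.
        return self._op_via_apply("pstdev", mu=mu)


gb = ToyGroupBy({"x": [1, 3], "y": [2, 2, 2]})
result = gb.pstdev()
is_plain_function = inspect.isfunction(ToyGroupBy.pstdev)
sig = str(inspect.signature(gb.pstdev))
```

Here the class attribute is an ordinary function and the bound method reports its real parameters, at the cost of one small method definition per op.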

The advantages of the current approach are less boilerplate (no per-op method definitions) and a consistent API between, e.g., DataFrame and DataFrameGroupBy ops. The disadvantages are that the docs render these ops as properties (e.g. skew) and that anyone using Python's introspection gets misleading results:

import inspect

import pandas as pd
from pandas.core.groupby import DataFrameGroupBy

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a')

print(type(DataFrameGroupBy.sum))
print(type(DataFrameGroupBy.skew))
print(type(gb.sum))
print(type(gb.skew))
print(inspect.signature(df.groupby('a').sum))
print(inspect.signature(df.groupby('a').skew))

# <class 'function'>
# <class 'property'>
# <class 'method'>
# <class 'function'>
# (numeric_only: 'bool | lib.NoDefault' = <no_default>, min_count: 'int' = 0, engine: 'str | None' = None, engine_kwargs: 'dict[str, bool] | None' = None)
# (*args, **kwargs)

I also find the current implementation harder to understand / debug, but that is very much an opinion.

Finally, tests can be added to ensure the consistency of arguments between e.g. DataFrame.skew and DataFrameGroupBy.skew using Python's inspect module.
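One possible shape for such a test is sketched below. The classes `Frame`, `FrameGroupBy`, and `DynamicGroupBy` and the helpers `parameter_names` and `check_consistent` are hypothetical stand-ins; a real test would compare the actual pandas methods and would likely need an explicit list of allowed differences.

```python
import inspect


def parameter_names(func, ignore=("self",)):
    """Return the parameter names of `func`, in order, minus ignored ones."""
    return [name for name in inspect.signature(func).parameters if name not in ignore]


def check_consistent(left, right):
    """True when both callables expose the same parameter names in the same order."""
    return parameter_names(left) == parameter_names(right)


class Frame:
    def skew(self, axis=0, skipna=True, numeric_only=False, **kwargs):
        ...


class FrameGroupBy:
    # Explicit method mirroring Frame.skew.
    def skew(self, axis=0, skipna=True, numeric_only=False, **kwargs):
        ...


class DynamicGroupBy:
    # Mimics a dynamically generated wrapper with a generic signature.
    def skew(self, *args, **kwargs):
        ...


explicit_ok = check_consistent(Frame.skew, FrameGroupBy.skew)
dynamic_ok = check_consistent(Frame.skew, DynamicGroupBy.skew)
```

A check like this only becomes meaningful once the groupby ops are real methods, since a generic `(*args, **kwargs)` wrapper never matches.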

cc @jreback @jbrockmendel @mroeschke for any thoughts.

@rhshadrach added the Groupby, Clean, and Needs Discussion labels Aug 10, 2022
@jbrockmendel
Member

no strong opinion; we've been moving towards more-boilerplate for typing purposes anyway, so that reduction isn't as compelling as it once was.

IIRC these were once defined with exec, so it could be worse!

@mroeschke
Member

Generally I'm +1 toward moving away from dynamic method generation in favor of more-boilerplate, explicit function signatures. Agreed with your assessment of current, dynamic method generation being harder to understand / debug.

For reference, the window methods pattern is generally:

@doc(
    _shared_docs[...],
    section="...",
    ...
)
def mean(self, ...):
    return self._apply(
        cython_or_numba_func,
        **kwargs
    )
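A self-contained sketch of that pattern might look like the following. The `doc` decorator here is a simplified stand-in written for this example (it is not pandas' own decorator), and `_shared_docs`, `ToyRolling`, and the template text are likewise assumptions.

```python
import statistics

# Hypothetical shared docstring templates, keyed by kind of method.
_shared_docs = {
    "aggregation": "Calculate the {window_method} {aggregation_description}.",
}


def doc(template, **params):
    """Simplified stand-in decorator: fill a docstring template and attach it."""

    def decorator(func):
        func.__doc__ = template.format(**params)
        return func

    return decorator


class ToyRolling:
    def __init__(self, values, window):
        self.values = values
        self.window = window

    def _apply(self, func):
        """Apply `func` to every full window of `self.values`."""
        return [
            func(self.values[i - self.window + 1 : i + 1])
            for i in range(self.window - 1, len(self.values))
        ]

    @doc(
        _shared_docs["aggregation"],
        window_method="rolling",
        aggregation_description="mean",
    )
    def mean(self):
        # Explicit method; the shared helper does the windowing.
        return self._apply(statistics.mean)


rolled = ToyRolling([1, 2, 3, 4], window=2).mean()
docstring = ToyRolling.mean.__doc__
```

Each op keeps an explicit, inspectable definition while the docstring text and the windowing logic stay shared.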

@rhshadrach self-assigned this Aug 26, 2022
@rhshadrach removed the Needs Discussion label Sep 5, 2022
@rhshadrach added this to the 1.6 milestone Sep 12, 2022
@mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022