You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently pandas implements various groupby ops via _make_wrapper. There are "allow lists" in pandas.core.groupby.base for both SeriesGroupBy and DataFrameGroupBy, and the class decorator pandas.core.groupby.generic.pin_allowlisted_properties adds these as properties to the SeriesGroupBy and DataFrameGroupBy classes.
Instead, should we convert _make_wrapper into a normal method (e.g. _op_via_apply) and add methods for each op in the allow lists. An example of such an added method would be:
The advantage of the current approach is less boilerplate code (the method definitions) and a consistent API between e.g. DataFrame and DataFrameGroupBy ops. But the disadvantage is that the docs come out as properties (e.g. skew) and anyone using Python's dynamic abilities gets incorrect results:
I also find the current implementation harder to understand / debug, but that is very much an opinion.
Finally, tests can be added to ensure the consistency of arguments between e.g. DataFrame.skew and DataFrameGroupBy.skew using Python's inspect module.
Generally I'm +1 toward moving away from dynamic method generation in favor of more-boilerplate, explicit function signatures. Agreed with your assessment of current, dynamic method generation being harder to understand / debug.
For reference, the window methods pattern is generally:
Currently pandas implements various groupby ops via
_make_wrapper
. There are "allow lists" inpandas.core.groupby.base
for both SeriesGroupBy and DataFrameGroupBy, and the class decoratorpandas.core.groupby.generic.pin_allowlisted_properties
adds these as properties to the SeriesGroupBy and DataFrameGroupBy classes.Instead, should we convert
_make_wrapper
into a normal method (e.g._op_via_apply
) and add methods for each op in the allow lists. An example of such an added method would be:The advantage of the current approach is less boilerplate code (the method definitions) and a consistent API between e.g. DataFrame and DataFrameGroupBy ops. But the disadvantage is that the docs come out as properties (e.g. skew) and anyone using Python's dynamic abilities gets incorrect results:
I also find the current implementation harder to understand / debug, but that is very much an opinion.
Finally, tests can be added to ensure the consistency of arguments between e.g.
DataFrame.skew
andDataFrameGroupBy.skew
using Python'sinspect
module.cc @jreback @jbrockmendel @mroeschke for any thoughts.
The text was updated successfully, but these errors were encountered: