ENH: Add Support for GroupBy Numeric Operations #20060
Comments
It would be surprising not to also support the other order, e.g.,
This already is the idiomatic way, quite explicit.
Fair enough on
In [14]: df - df.groupby('key').transform('mean')
Out[14]:
   key  val
0  NaN -0.5
1  NaN  0.5
2  NaN -0.5
3  NaN  0.5
So I guess the question becomes: do we think abstracting all of this through support for GroupBy operations is worth it, or would we rather live with some slight inconsistencies in how to calculate across the various objects?
Another use case is when the normalization values are pre-computed, perhaps from another dataset, e.g., you only have access to
@WillAyd - Just taking a look through old issues to see if we have any that can be closed. Have your thoughts here evolved at all? For the DataFrame case you can do:
Though I'm not sure that would be the intended result; should key be left untouched rather than normalized? I imagine that would be the result of
xref some of the conversation in #20024. Right now the following is possible:
But trying to do something similar with grouped data does not work:
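The snippets for these two cases are not reproduced here. As a minimal sketch of the contrast, assuming a toy frame with key and val columns (names and values are illustrative, not taken from the issue):

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, 2.0, 3.0, 4.0]})

# Arithmetic between a Series and its own aggregation works directly.
demeaned = df["val"] - df["val"].mean()

# A GroupBy object defines no arithmetic operators of its own, so there is
# no direct grouped equivalent of the line above.
grouped = df.groupby("key")["val"]
has_sub = hasattr(type(grouped), "__sub__")  # expected to be False
```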
I am proposing that we update the GroupBy class to allow numerical operations with the result of aggregations or transformations against that object. Note that this is possible today through a much more verbose and hackish:
The Series / DataFrame operations are all added via add_special_arithmetic_methods with their implementations being defined in ops.py. We could leverage a similar mechanism for GroupBy.
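As a rough sketch of what this could look like: the snippet below first shows a transform-based workaround that runs today, then a hypothetical wrapper class that attaches operators in a loop, loosely in the spirit of add_special_arithmetic_methods. GroupByArith, _arith and _add_arithmetic_methods are made-up names for illustration and are not part of pandas; the toy data is also an assumption.

```python
import operator

import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, 2.0, 3.0, 4.0]})

# Works today: broadcast the group mean back to the original index, then subtract.
workaround = df["val"] - df.groupby("key")["val"].transform("mean")


class GroupByArith:
    """Hypothetical wrapper (not a pandas class) adding arithmetic to grouped data."""

    def __init__(self, obj: pd.Series, keys: pd.Series):
        self.obj = obj    # the original, ungrouped values
        self.keys = keys  # group label for each row of obj

    def mean(self) -> pd.Series:
        # One value per group, indexed by group label.
        return self.obj.groupby(self.keys).mean()

    def _arith(self, other: pd.Series, op) -> pd.Series:
        if not other.index.equals(self.obj.index):
            # Treat `other` as an aggregation result: look up each row's
            # group value so it is like-indexed to the original object.
            other = self.keys.map(other)
        return op(self.obj, other)


def _add_arithmetic_methods(cls):
    # Attach the operators in a loop rather than writing each one by hand.
    ops = [("__add__", operator.add), ("__sub__", operator.sub),
           ("__mul__", operator.mul), ("__truediv__", operator.truediv)]
    for name, op in ops:
        setattr(cls, name, lambda self, other, op=op: self._arith(other, op))


_add_arithmetic_methods(GroupByArith)

gb = GroupByArith(df["val"], df["key"])
result = gb - gb.mean()
```

With the toy data, result holds the same values as workaround. A real implementation would live on GroupBy itself and would need a more careful rule than an index comparison to tell aggregations from transformations.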
Why is this worth doing?
Series, DataFrame and GroupBy objects
mad (see Cythonized GroupBy mad #20024)
Why may it not be worth doing?
GroupBy class that is already in need of refactor
Consideration Points
With this proposal, the left operand would always be a GroupBy object and the right operand would always be the result of a function application against that same GroupBy. The result of the operation should be a Series or DataFrame like-indexed to the original object.
That said, the following operations would in theory be identical:
I'm not sure if we care to differentiate between these and force users into choosing one or the other.
Thoughts?
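To make the like-indexed requirement concrete with today's API, here is a small sketch. It assumes the two forms in question are subtracting a transformation versus subtracting an aggregation; that reading, and the toy data, are assumptions rather than something stated above.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, 2.0, 3.0, 4.0]})
grouped = df.groupby("key")["val"]

# Transformation: already like-indexed to df, so plain subtraction aligns.
via_transform = df["val"] - grouped.transform("mean")

# Aggregation: indexed by group label, so it has to be broadcast back
# to the original rows before subtracting.
via_agg = df["val"] - df["key"].map(grouped.mean())

# Both routes give the same values on the original index.
same = bool((via_transform == via_agg).all())
```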