feature request: Support for 'named' lambda functions in DataFrame.agg([]) #10100
Comments
At this moment, you can also just define a function, so its name will be used:
As this is quite easy as well, I don't know whether we should add that extra complexity.
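A minimal sketch of the named-function approach described above. The data, the column names, and the `normalized_mean` helper are assumptions chosen for illustration, not the original example from this thread:

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 2.0, 6.0],
})


def normalized_mean(x):
    """Mean of the values divided by their maximum (illustrative helper)."""
    return x.mean() / x.max()


# The function's __name__ is used as the label of the resulting column.
result = df.groupby("group").agg(["mean", normalized_mean])
print(result.columns.tolist())
```

The result has one column level for the original column and one for the aggregation, so the function name appears in the second level.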
I agree, that's how I solve it now, but this adds a function to the global namespace, which is not always desired. If the computation that you require is a big one, then sure, that should be a proper function. For small inline calculations like the one I used as an example, a lambda would be much nicer. Also, if you need to pass additional arguments to your function, a lambda is a much more elegant way of coding it than a new global function. In the pandas docs, one of the examples on how to use 'agg' actually uses lambdas this way:

    grouped.agg({'C' : np.sum,
                 'D' : lambda x: np.std(x, ddof=1)})

I don't think that the feature actually "adds complexity". It gives you an additional, elegant way of doing something without even changing the existing API functionality.
See #8593. This API needs unification; if you want to spec out
Yet another argument for implementing it: the current behavior when you enter a dict object seems to be a bit arbitrary. Something would need to be done about that anyway...
@jreback What exactly is your idea for I'm not opposed to improvements in this area, but I agree with @jorisvandenbossche that the value is questionable in this particular case.
What is
@jkokorian It is an idea of @jreback's that is not implemented yet; see his comment here: #8593
The introduction of named aggregation in 0.25.0 seems to solve this issue.
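For reference, a short sketch of how named aggregation (pandas 0.25.0 and later) lets a lambda get an explicit output name. The data and the output names are assumptions made for this illustration:

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 2.0, 6.0],
})

# Each keyword is the output column name; each value is a
# (source column, aggregation function) pair.
result = df.groupby("group").agg(
    mean=("value", "mean"),
    normalized_mean=("value", lambda x: x.mean() / x.max()),
)
print(result.columns.tolist())
```

The lambda stays inline and nothing is added to the surrounding namespace, which is what the original request was after.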
Thanks - agreed @victorlin. Also, one can now use kwargs with
produces
I often have the situation where I would like to apply multiple aggregation functions to all the columns of a grouped dataframe, like:
That works well, but sometimes (all the time, actually) I would also like to be able to use lambda functions this way, like:
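As a minimal sketch of this situation (the data and column names are assumptions, not the author's original example), a lambda passed in the list ends up with an auto-generated label. The exact label may differ between pandas versions, so the sketch only prints it:

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 2.0, 6.0],
})

# Mix a built-in aggregation with an anonymous function.
result = df.groupby("group")["value"].agg(
    ["mean", lambda x: x.mean() / x.max()]
)

# The second label is derived from the lambda, not from anything meaningful.
print(result.columns.tolist())
```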
This works fine, but the resulting column name will now be '<lambda>', which is ugly. This can be resolved by using the much more verbose syntax where you specify a dictionary for every column separately, but I would propose to allow the following syntax:
The dictionary key should then be used as the resulting column name.
Interestingly, using this syntax in version 0.16 does not produce an error, but produces a column named 'NaN' that is filled with tuple values: ('n','o','r','m','a','l','i','z','e','d','_','m','e','a','n'), which I don't think is of use to anyone :)