feature request: Support for 'named' lambda functions in DataFrame.agg([]) #10100

Closed
@jkokorian

I often want to apply multiple aggregation functions to all the columns of a grouped dataframe, like:

grouped = df.groupby('somekey')
dfAggregated = grouped.agg([np.mean, np.std])

That works well, but sometimes (all the time, actually) I would also like to be able to use lambda functions this way, like:

grouped = df.groupby('somekey')
dfAggregated = grouped.agg([np.mean, np.std, lambda v: v.mean()/v.max()])

This works, but the resulting column is named '<lambda>', which is ugly. It can be avoided with the much more verbose syntax where you specify a dictionary for every column separately, but I would propose allowing the following syntax:

grouped = df.groupby('somekey')
dfAggregated = grouped.agg([np.mean, np.std, {'normalized_mean': lambda v: v.mean()/v.max()}])

The dictionary key should then be used as the resulting column name.

Interestingly, using this syntax in version 0.16 does not raise an error, but produces a column named 'Nan' that is filled with tuples of the key's characters: ('n','o','r','m','a','l','i','z','e','d','_','m','e','a','n'), which I don't think is of use to anyone :)
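
For comparison, here is a minimal sketch of a workaround that does not need the proposed syntax: as far as I can tell, pandas takes the column label from the function's `__name__`, so a regular named function in place of the lambda gives a readable column. The sample data and the function name `normalized_mean` are made up for illustration, and the string aliases 'mean' and 'std' stand in for np.mean and np.std.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "somekey": ["a", "a", "b", "b"],
    "x": [1.0, 3.0, 2.0, 6.0],
    "y": [10.0, 30.0, 5.0, 15.0],
})


def normalized_mean(v):
    """Mean of the group divided by its maximum (same as the lambda above)."""
    return v.mean() / v.max()


grouped = df.groupby("somekey")
# The function's __name__ becomes the second level of the column labels.
dfAggregated = grouped.agg(["mean", "std", normalized_mean])
print(dfAggregated)
```

This is still more verbose than an inline lambda, which is why the dictionary syntax proposed above would be convenient.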
