Inconsistent behavior between df.sum() and groupby(col).agg('sum') on lists #29033

Closed
@gqfiddler

Description

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'cost': [5, 5, 5],
    'letters': [['a', 'b'], ['a', 'b'], ['a', 'b']],
})
print(df.sum())  # joins lists in 'letters' column
print(df.groupby('id').agg('sum'))  # drops 'letters' column from results
print(df.groupby('id').agg(pd.Series.sum))  # successfully joins lists in 'letters' column

Problem description

Like the + operator in Python, .sum() in pandas is overloaded to join lists as well as add numbers. However, passing 'sum' to the agg method does not do this. Instead, it treats lists as un-addable objects and drops the column from the result.

For both convenience and consistency, df.groupby('col').agg('sum') should exhibit the same behavior on lists as df.sum() and df.col.sum(). This could be as simple as calling the existing pd.Series.sum() function when the user passes 'sum'.
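
Until the two code paths agree, one way to avoid depending on how 'sum' is dispatched is to concatenate the lists explicitly. The following is only a sketch of that workaround, not a description of how pandas implements either path; the helper name concat_lists is made up for this example and is not part of pandas.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'cost': [5, 5, 5],
    'letters': [['a', 'b'], ['a', 'b'], ['a', 'b']],
})


def concat_lists(series):
    # Hypothetical helper: flatten every list in the group into one list.
    return list(chain.from_iterable(series))


# Aggregate each column explicitly: numeric sum for 'cost',
# list concatenation for 'letters'.
result = df.groupby('id').agg({'cost': 'sum', 'letters': concat_lists})
print(result)
```

Naming the aggregation per column keeps the 'letters' column in the output regardless of how the string alias 'sum' treats object columns.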


Labels: Apply, Bug, Groupby, Nested Data, Nuisance Columns, Reduction Operations
