Closed
Description
import pandas as pd

df = pd.DataFrame({
'id':[1,2,2],
'cost':[5,5,5],
'letters':[['a','b'],['a','b'],['a','b']]
})
print(df.sum()) # joins lists in 'letters' column
print(df.groupby('id').agg('sum')) # drops 'letters' column from results
print(df.groupby('id').agg(pd.Series.sum)) # successfully joins lists in 'letters' column
Problem description
Like the + operator in Python, .sum() in pandas is overloaded to concatenate lists as well as add numbers. However, passing 'sum' to .agg() on a groupby does not do this. Instead, it treats lists as non-addable objects and drops the column from the result.
For both convenience and consistency, df.groupby('col').agg('sum') should handle lists the same way df.sum() and df.col.sum() do. This could be as simple as calling the existing pd.Series.sum() function when the user passes 'sum'.
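
Until the string form behaves this way, a minimal sketch of the workaround already shown in the example is to select the list column explicitly and pass the callable pd.Series.sum instead of 'sum'. The variable names (all_letters, joined, costs) are illustrative only, and the exact behavior of the string form may differ between pandas versions, so it is not exercised here.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2],
    "cost": [5, 5, 5],
    "letters": [["a", "b"], ["a", "b"], ["a", "b"]],
})

# Series.sum() on a column of lists concatenates them, like repeated `+`.
all_letters = df["letters"].sum()

# Workaround from the report: pass the callable rather than the string 'sum',
# here applied to the 'letters' column only.
joined = df.groupby("id")["letters"].agg(pd.Series.sum)

# Numeric columns can still be summed per group in the usual way.
costs = df.groupby("id")["cost"].sum()

print(all_letters)
print(joined)
print(costs)
```

Selecting the column first keeps the list concatenation separate from the numeric aggregation, so neither depends on how .agg('sum') treats object columns.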