BUG: groupby.agg with more than one lambda is not allowed? #7186

Closed
acorbe opened this issue May 20, 2014 · 11 comments

@acorbe
Contributor

acorbe commented May 20, 2014

consider the following example

 import numpy as np
 import pandas as pd
 from numpy.random import rand

 a = rand(100)
 b = np.floor(rand(100)*100)

 df = pd.DataFrame({'a' : a , 'b' : b})

 grp = df.groupby(df.b)

I have grouped the values in a by b.

Now, if I want to plot the trend over the groups with mean and std I can do

grp.a.agg([np.mean, lambda x : np.mean(x) + np.std(x) , lambda x :  np.mean(x) - np.std(x) ]).plot()

which gives me

SpecificationError: Function names must be unique, found multiple named <lambda>

while

  grp.a.agg([np.mean, lambda x : np.mean(x) + np.std(x) ]).plot()

which has just one lambda works ok.

Is this a bug?

In order to make the thing work I had to define real functions (i.e. in terms of def), to be put in agg.

@jreback
Contributor

jreback commented May 20, 2014

You can specify a dictionary; this requires named columns. I suppose the list form could be made to work; I'm not 100% sure why it was done this way (it needs unique function names because the results are returned as a dictionary; in theory they could be returned as a list, which could simply create the columns).

In [27]: grp.a.agg({'one' : np.mean, 'two' : lambda x : np.mean(x) + np.std(x) , 'three' : lambda x :  np.mean(x) - np.std(x) })
Out[27]: 
         three       two       one
b                                 
-253  0.156897  0.156897  0.156897
-216  0.452120  0.452120  0.452120
-191  0.893074  0.893074  0.893074
-178  1.170801  1.170801  1.170801
-177 -1.324476 -1.324476 -1.324476
-162  0.835708  1.241353  1.038531
-156 -1.220583 -1.220583 -1.220583
-147 -2.301474 -2.301474 -2.301474
-136 -1.125749 -1.125749 -1.125749
-133 -0.398064 -0.398064 -0.398064
-132  0.011879  0.011879  0.011879
-129 -0.257017 -0.257017 -0.257017
-114  0.795851  0.795851  0.795851
-113 -1.697932 -1.697932 -1.697932
-111 -0.309536 -0.309536 -0.309536
-110 -0.031828 -0.031828 -0.031828
-94  -0.391354 -0.391354 -0.391354
-87  -0.010518  0.551286  0.270384
-85  -0.711772 -0.711772 -0.711772
-77  -0.147718 -0.106666 -0.127192
-73  -0.796055  0.985810  0.094878
-68  -0.249214 -0.249214 -0.249214
-65   0.897349  0.897349  0.897349
-64  -0.151405 -0.014542 -0.082973
-60  -0.305136 -0.305136 -0.305136
-52   0.084092  0.084092  0.084092
-51  -0.821255 -0.619251 -0.720253
-48  -0.542030  1.237966  0.347968
-44   0.822566  0.822566  0.822566
-43   0.165354  0.165354  0.165354
-38   1.052166  1.052166  1.052166
-33   0.649841  0.649841  0.649841
-32  -0.020592 -0.020592 -0.020592
-31  -1.340543  0.886358 -0.227093
-30   0.278267  0.278267  0.278267
-15   0.220145  0.220145  0.220145
-12  -0.247523 -0.247523 -0.247523
-9   -1.017454 -1.017454 -1.017454
-5    2.230568  2.230568  2.230568
-3   -1.258155 -1.258155 -1.258155
 1   -0.310485 -0.310485 -0.310485
 2   -0.265832 -0.265832 -0.265832
 3   -0.008983 -0.008983 -0.008983
 5   -0.320702 -0.320702 -0.320702
 13  -0.634021 -0.634021 -0.634021
 14   0.588749  0.588749  0.588749
 16  -0.843814 -0.843814 -0.843814
 18  -0.534178 -0.534178 -0.534178
 19  -0.246229 -0.246229 -0.246229
 20  -0.095204 -0.095204 -0.095204
 21  -1.586995  0.941961 -0.322517
 27  -0.054841 -0.054841 -0.054841
 38   0.108338  0.108338  0.108338
 39  -0.924176 -0.924176 -0.924176
 57  -0.562416 -0.144378 -0.353397
 60   1.074620  1.074620  1.074620
 64  -1.302721  0.358431 -0.472145
 71   0.033022  0.033022  0.033022
 75   1.088710  1.088710  1.088710
 78  -0.300983 -0.300983 -0.300983
           ...       ...       ...

@acorbe
Contributor Author

acorbe commented May 20, 2014

@jreback

Thanks!

@jreback
Contributor

jreback commented May 20, 2014

going to close this; if you feel that this really should be implemented, pls reopen (and submit a PR if you can!)

@BenDundee

The proposed workaround throws a FutureWarning in the current version of pandas. Should this bug be reopened?

@jorisvandenbossche
Member

That's indeed an unfortunate side effect of the deprecation.
I think the easiest solution is to use actual named functions instead of lambdas:

In [79]: def mean_plus_std(x): return np.mean(x) + np.std(x)

In [80]: def mean_minus_std(x): return np.mean(x) - np.std(x)

In [81]: grp.a.agg([np.mean, mean_plus_std, mean_minus_std])
Out[81]: 
          mean  mean_plus_std  mean_minus_std
b                                            
0.0   0.468446       0.696463        0.240430
2.0   0.032308       0.032308        0.032308
3.0   0.704209       0.874344        0.534075
...

Something else we have been discussing is to allow kwargs to be different functions, something like:

grp.a.agg(one=np.mean, two=lambda x : np.mean(x) + np.std(x) , three=lambda x :  np.mean(x) - np.std(x) )

but this has not been implemented (and it has some additional difficulties, such as how to deal with kwargs that are meant to be passed through to the function)
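
For what it's worth, later pandas releases (0.25 and newer, to my knowledge) do accept keyword-style "named aggregation" on a grouped column, which is close to the spelling discussed above. A minimal sketch, assuming a reasonably recent pandas; the toy data and the output names mean, upper and lower are made up for illustration, and x.std(ddof=0) is used to mirror np.std's default:

```python
import pandas as pd

# Toy data (hypothetical): two groups in 'b', values in 'a'.
df = pd.DataFrame({"a": [1.0, 3.0, 2.0, 6.0], "b": [0, 0, 1, 1]})
grp = df.groupby("b")

# Each keyword names an output column, so the two lambdas
# no longer collide on the name "<lambda>".
summary = grp["a"].agg(
    mean="mean",
    upper=lambda x: x.mean() + x.std(ddof=0),
    lower=lambda x: x.mean() - x.std(ddof=0),
)
print(summary)
```

Since the column names come from the keywords, nothing has to be defined with def just to get a unique name.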

@thebeancounter

I found a workaround.
def p(x):
    return (1,2)
#will return two values in one function

df.groupby(col).apply(lambda x:p(x))
#will convert the new column into two columns of different values
df[[newCol1,newCol2]] = df[df.columns.values[-1]].apply(pd.Series)
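
A related variant of this idea, sketched under the assumption that a single function computing all the statistics is acceptable: return a labelled pd.Series from the function given to apply, then unstack the result. The name mean_band and the toy data are hypothetical, not taken from the thread.

```python
import pandas as pd

# Toy data (hypothetical): two groups in 'b', values in 'a'.
df = pd.DataFrame({"a": [1.0, 3.0, 2.0, 6.0], "b": [0, 0, 1, 1]})

def mean_band(x):
    # One function returns several labelled statistics for a group.
    m, s = x.mean(), x.std(ddof=0)
    return pd.Series({"mean": m, "upper": m + s, "lower": m - s})

# apply() on the grouped column yields a Series indexed by (group, label);
# unstack() pivots the labels into columns.
band = df.groupby("b")["a"].apply(mean_band).unstack()
print(band)
```

This sidesteps the unique-name check entirely, at the cost of going through apply rather than agg.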

@neilaronson

This has caused me huge frustration, and I believe this should be updated to allow passing the same function several times while providing the desired name of each output column. I'm working with a custom aggregation function that takes an additional argument, which I supply either with functools.partial or with multiple lambda functions. I was hoping to avoid 6 separate named functions, but with the current method I have to write them, even though each function is only slightly different from the others. The "workarounds" here don't save any time compared to just having separately defined functions that are all very similar.

@neilaronson

neilaronson commented Mar 7, 2018

I have found a more satisfactory workaround, specifically for the case where you want to apply multiple similar functions to the same column. You can create a function factory like so:

def ip_is(ip):
    def ipf(x):
        return (x==ip).mean()
    ipf.__name__ = 'ipf {}'.format(str(ip))
    return ipf 

ip_by_day = dfp.groupby('day').agg({'ip': [ip_is('123'), ip_is('456'), ip_is('789')]})

Here I'm checking what fraction of records per day have a certain IP. Basically, you can manually set the name of the returned function and so avoid the SpecificationError.

@chandanshikhar1

chandanshikhar1 commented Mar 27, 2018

I am doing something like this and I run into similar error

fs = [lambda x: np.percentile(x, p) for p in ptiles] + [np.sum]
off_smry = gb_off['delivery_time'].agg(fs)

Here is the error I get
SpecificationError: Function names must be unique, found multiple named <lambda>

I think it should be allowed to do something like this. In practical scenarios people could be generating multiple lambda functions to apply.
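
In the meantime, the function-factory trick from earlier in the thread seems to fit this case. A minimal sketch: the helper name percentile, the column labels p25/p50/p75 and the toy data are all made up for illustration. As a side note, a lambda written inside a list comprehension looks up p only when it is called, so every lambda in the list would end up using the last value of ptiles; the factory binds each p at creation time.

```python
import numpy as np
import pandas as pd

def percentile(p):
    """Build an aggregator for the p-th percentile with a unique name."""
    def agg(x):
        return np.percentile(x, p)
    # pandas takes the output column label from __name__,
    # so a distinct name per p avoids the duplicate-name error.
    agg.__name__ = "p{}".format(p)
    return agg

# Toy data (hypothetical) standing in for gb_off['delivery_time'].
df = pd.DataFrame({
    "key": ["u", "u", "u", "v", "v", "v"],
    "delivery_time": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
ptiles = [25, 50, 75]

fs = [percentile(p) for p in ptiles] + ["sum"]
off_smry = df.groupby("key")["delivery_time"].agg(fs)
print(off_smry)
```

The string "sum" is used here instead of np.sum only to keep the example free of version-specific warnings; either should name the column sum.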

@AndreaBarbon

I'm experiencing the same problem

@hp2500

hp2500 commented Mar 31, 2019

Same problem here. Push.

9 participants