BUG: grouping with categorical interval columns #34164
Comments
This seems to work on master for me:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: pd.set_option("use_inf_as_na", True)
In [4]: t = pd.DataFrame({"x": np.random.randn(100), "w": np.random.choice(list("ABC"), 100)})
In [5]: qq = pd.qcut(t["x"], q=np.linspace(0, 1, 5))
In [6]: t.groupby([qq, "w"])["x"].agg("mean")
Out[6]:
x                              w
(-2.7649999999999997, -0.736]  A   -1.412247
                               B   -1.319972
                               C   -1.108550
(-0.736, -0.114]               A   -0.351454
                               B   -0.388151
                               C   -0.404094
(-0.114, 0.587]                A    0.134442
                               B    0.235705
                               C    0.406392
(0.587, 1.864]                 A    1.056471
                               B    0.973123
                               C    1.189502
Name: x, dtype: float64
Here is my env:
This would take a regression test to close.
Do you mean writing a regression test for the correct behavior, like this one, to make sure the behavior stays correct? pandas/pandas/tests/groupby/test_groupby.py, lines 108 to 147 in dbc3afa
Yes.
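For reference, a regression test along these lines might look like the sketch below. It is not the test that was eventually merged. The function names, the fixed seed, and the explicit `observed=False` are assumptions made for illustration, and the `use_inf_as_na` option from the session above is left out so the sketch stays version-independent (re-add it if the test should mirror the report exactly). The check compares the two-key groupby against the same aggregation keyed on the integer category codes, which does not go through the interval categorical.

```python
import numpy as np
import pandas as pd


def grouped_mean_by_quartile_and_label(seed=0):
    """Group by a qcut interval categorical plus a plain column, then take the mean.

    The name and the seed are hypothetical; the calls mirror the report.
    """
    rng = np.random.default_rng(seed)
    t = pd.DataFrame(
        {"x": rng.standard_normal(100), "w": rng.choice(list("ABC"), 100)}
    )
    # Quartile bins: an ordered categorical whose categories are intervals.
    qq = pd.qcut(t["x"], q=np.linspace(0, 1, 5))
    # This is the call that raised TypeError in the original report.
    # observed=False is passed explicitly (an assumption) to pin the behavior.
    result = t.groupby([qq, "w"], observed=False)["x"].agg("mean")
    return t, qq, result


def test_groupby_interval_categorical_with_second_key():
    t, qq, result = grouped_mean_by_quartile_and_label()
    # Reference values: same grouping, but keyed on the integer category codes.
    expected = t["x"].groupby([qq.cat.codes, t["w"]]).mean()
    assert result.index.nlevels == 2
    assert result.name == "x"
    # Unobserved (interval, label) pairs show up as NaN and are dropped here.
    np.testing.assert_allclose(result.dropna().to_numpy(), expected.to_numpy())
```

Under pytest the test function is collected automatically; it can also be called directly.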
Noted. Can I take on this issue?
After experimenting in a notebook, it seems I can replicate the problem under the following condition:
I've added a test for this older bug which, as stated above, was already fixed a while back.
Looks like the closed PR #52818 should have closed this issue. Pinging @phofl for visibility.
Hello,
Versions:
pandas 1.0.3
numpy 1.18.1
There is a bug in the pandas 1.x releases that prevents grouping by a categorical column of intervals together with another column.
This works and gives the expected result:
t.groupby([qq])['x'].agg('mean')
x
(-10.001, -1.0]   -1.431893
(-1.0, 0.0]       -0.423564
(0.0, 1.0]         0.461174
(1.0, 10.0]        1.662297
Name: x, dtype: float64
This raises a TypeError:
t.groupby([qq,'w'])['x'].agg('mean')