You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
After unstacking data that were grouped using groupby the data appears in an unexpected order. The groupby data are in the ascending order, but after unstacking this is no longer true.
The following code demonstrates:
import numpy as np
import pandas as pd
ATM the moment its not easy to sort categoricals; these are being treated as 'strings' right now
because categorical support at the frame level is not fully their yet.
After unstacking data that were grouped using groupby the data appears in an unexpected order. The groupby data are in the ascending order, but after unstacking this is no longer true.
The following code demonstrates:
import numpy as np
import pandas as pd
print pd.version
random data
data = pd.DataFrame()
data['a'] = np.random.random(1000)_10. - 2
data['b'] = np.random.random(1000)_10. - 2
data['c'] = np.random.random(1000)*10. - 2
bins in a and b
ba = np.linspace(-5., 5, 8)
bb = np.linspace(-5., 5, 9)
groupby bins
groups = data.groupby([pd.cut(data['a'], ba), pd.cut(data['b'], bb)])
mask, needs at least 10 to get robust std estimate
msk = groups.count()['c'] > 10
std(c) per bin, these are ordered as expected
stdc = groups['c'].agg([np.std, len])[msk]
print stdc
now unstack
result = stdc['std'].unstack()
and the results are not ordered as expected
print result
and the result is:
b (-1.25, 0] (-2.5, -1.25] (0, 1.25] (1.25, 2.5]
a
(-0.714, 0.714] 2.878018 NaN 3.019754 2.908735
(-2.143, -0.714] 2.490683 NaN 2.809243 2.754959
(0.714, 2.143] 2.656060 2.940749 2.421975 2.677205
(2.143, 3.571] 3.128538 2.879978 3.038488 3.191041
(3.571, 5] 2.534340 2.943076 3.215626 2.687091
b (2.5, 3.75] (3.75, 5]
a
(-0.714, 0.714] 3.328777 3.305938
(-2.143, -0.714] 2.716552 3.090601
(0.714, 2.143] 3.093307 1.669262
(2.143, 3.571] 2.885055 3.295693
(3.571, 5] 3.333355 2.972684
[5 rows x 6 columns]
This seems to happen if the bins cross zero.
The text was updated successfully, but these errors were encountered: