Description
After unstacking data that were grouped using groupby the data appears in an unexpected order. The groupby data are in the ascending order, but after unstacking this is no longer true.
The following code demonstrates:
import numpy as np
import pandas as pd
print pd.version
random data
data = pd.DataFrame()
data['a'] = np.random.random(1000)_10. - 2
data['b'] = np.random.random(1000)_10. - 2
data['c'] = np.random.random(1000)*10. - 2
bins in a and b
ba = np.linspace(-5., 5, 8)
bb = np.linspace(-5., 5, 9)
groupby bins
groups = data.groupby([pd.cut(data['a'], ba), pd.cut(data['b'], bb)])
mask, needs at least 10 to get robust std estimate
msk = groups.count()['c'] > 10
std(c) per bin, these are ordered as expected
stdc = groups['c'].agg([np.std, len])[msk]
print stdc
now unstack
result = stdc['std'].unstack()
and the results are not ordered as expected
print result
and the result is:
b (-1.25, 0] (-2.5, -1.25] (0, 1.25] (1.25, 2.5]
a
(-0.714, 0.714] 2.878018 NaN 3.019754 2.908735
(-2.143, -0.714] 2.490683 NaN 2.809243 2.754959
(0.714, 2.143] 2.656060 2.940749 2.421975 2.677205
(2.143, 3.571] 3.128538 2.879978 3.038488 3.191041
(3.571, 5] 2.534340 2.943076 3.215626 2.687091
b (2.5, 3.75] (3.75, 5]
a
(-0.714, 0.714] 3.328777 3.305938
(-2.143, -0.714] 2.716552 3.090601
(0.714, 2.143] 3.093307 1.669262
(2.143, 3.571] 2.885055 3.295693
(3.571, 5] 3.333355 2.972684
[5 rows x 6 columns]
This seems to happen if the bins cross zero.