Skip to content

Unstacked data in unexpected order #6565

Closed
@sniemi

Description

@sniemi

After unstacking data that were grouped using groupby the data appears in an unexpected order. The groupby data are in the ascending order, but after unstacking this is no longer true.

The following code demonstrates:
import numpy as np
import pandas as pd

print pd.version

random data

data = pd.DataFrame()
data['a'] = np.random.random(1000)_10. - 2
data['b'] = np.random.random(1000)_10. - 2
data['c'] = np.random.random(1000)*10. - 2

bins in a and b

ba = np.linspace(-5., 5, 8)
bb = np.linspace(-5., 5, 9)

groupby bins

groups = data.groupby([pd.cut(data['a'], ba), pd.cut(data['b'], bb)])

mask, needs at least 10 to get robust std estimate

msk = groups.count()['c'] > 10

std(c) per bin, these are ordered as expected

stdc = groups['c'].agg([np.std, len])[msk]
print stdc

now unstack

result = stdc['std'].unstack()

and the results are not ordered as expected

print result

and the result is:
b (-1.25, 0] (-2.5, -1.25] (0, 1.25] (1.25, 2.5]
a
(-0.714, 0.714] 2.878018 NaN 3.019754 2.908735
(-2.143, -0.714] 2.490683 NaN 2.809243 2.754959
(0.714, 2.143] 2.656060 2.940749 2.421975 2.677205
(2.143, 3.571] 3.128538 2.879978 3.038488 3.191041
(3.571, 5] 2.534340 2.943076 3.215626 2.687091

b (2.5, 3.75] (3.75, 5]
a
(-0.714, 0.714] 3.328777 3.305938
(-2.143, -0.714] 2.716552 3.090601
(0.714, 2.143] 3.093307 1.669262
(2.143, 3.571] 2.885055 3.295693
(3.571, 5] 3.333355 2.972684

[5 rows x 6 columns]

This seems to happen if the bins cross zero.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions