DataFrame.unstack() fails when some index column values are NaN #4862

Closed
brianboonstra opened this issue Sep 17, 2013 · 2 comments · Fixed by #9292
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Comments

@brianboonstra

{Python 2.6.6, pandas 0.12}

A DataFrame will fail to unstack() when one of the columns retained as an index has NaN values. The code below sets up a dataframe with NaN in some index entries, at which point calling unstack() will fail.

In the first failure, the exception message is that the index "contains duplicate entries", which is patently false. In the second failure, after the first row is dropped, the error message becomes "cannot convert float NaN to integer".

A final try, with NaN converted to a sentinel value of 42, shows proper behavior.

import pandas
from numpy import nan

df = pandas.DataFrame(
    {'agent': {
                      17263: 'Hg',
                      17264: 'U',
                      17265: 'Pb',
                      17266: 'Sn',
                      17267: 'Ag',
                      17268: 'Hg'},
    'change': {
                      17263: nan,
                      17264: 0.0,
                      17265: 7.070e-06,
                      17266: 2.3614e-05,
                      17267: 0.0,
                      17268: -0.00015},
    'dosage': {
                      17263: nan,
                      17264: nan,
                      17265: nan,
                      17266: 0.0133,
                      17267: 0.0133,
                      17268: 0.0133},
    's_id': {
                      17263: 680585148,
                      17264: 680585148,
                      17265: 680585148,
                      17266: 680607017,
                      17267: 680607017,
                      17268: 680607017}}
            )
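# Case 1: every row, including the three with NaN dosage.
try:
    dupe = df.copy().set_index(['s_id', 'dosage', 'agent'])
    badDupe = dupe.unstack()
except Exception as e:
    print('Error with all data was: %s' % (e,))
# Case 2: the same data without the first row (label 17263).
try:
    getnan = df.loc[17264:].copy().set_index(['s_id', 'dosage', 'agent'])
    badNan = getnan.unstack()
except Exception as e:
    print('Error dropping first entry was: %s' % (e,))
# Case 3: replace the NaN dosages with a sentinel value; unstack() then succeeds.
df.loc[df.index[:3], 'dosage'] = 42
willWork = df.copy().set_index(['s_id', 'dosage', 'agent'])
u = willWork.unstack()
print(u)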

Overall output:

Error with all data was: Index contains duplicate entries, cannot reshape
Error dropping first entry was: cannot convert float NaN to integer

                   change                                 
agent                  Ag       Hg        Pb        Sn   U
s_id      dosage                                          
680585148 42.0000     NaN      NaN  0.000007       NaN   0
680607017 0.0133        0 -0.00015       NaN  0.000024 NaN
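
The sentinel workaround described above can be packaged with two quick diagnostics. This is only a sketch, not part of the original report: the `KEYS` and `SENTINEL` names are hypothetical, the sentinel value is an assumption that must not collide with any real dosage, and the data is the report's frame with a default integer index.

```python
import numpy as np
import pandas as pd

KEYS = ['s_id', 'dosage', 'agent']
SENTINEL = -1.0  # hypothetical placeholder; must not occur as a real dosage

df = pd.DataFrame({
    's_id': [680585148, 680585148, 680585148, 680607017, 680607017, 680607017],
    'dosage': [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
    'agent': ['Hg', 'U', 'Pb', 'Sn', 'Ag', 'Hg'],
    'change': [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
})

# Are any (s_id, dosage, agent) combinations actually repeated?
has_duplicate_keys = bool(df.duplicated(subset=KEYS).any())

# Which key columns would put NaN into the index?
nan_key_columns = [col for col in KEYS if df[col].isna().any()]

# Replace NaN in the key column only, then reshape as in the report.
filled = df.assign(dosage=df['dosage'].fillna(SENTINEL))
wide = filled.set_index(KEYS).unstack()

print(has_duplicate_keys)
print(nan_key_columns)
print(wide)
```

The duplicate check confirms that the key combinations are unique, so the "duplicate entries" message does not describe the data itself.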
@hayd
Contributor

hayd commented Sep 17, 2013

In [12]: df1
Out[12]:
                          change
s_id      dosage agent
680585148 NaN    Hg          NaN
                 U      0.000000
                 Pb     0.000007
680607017 0.0133 Sn     0.000024
                 Ag     0.000000
                 Hg    -0.000150

In [13]: df1.unstack()
# ValueError: Index contains duplicate entries, cannot reshape

I'm guessing it has to do with the NaN entries being stored as label -1, with no corresponding entry in the level:

In [14]: df1.index.levels[1]
Out[14]: Index([0.0133], dtype=object)

In [15]: df1.index.labels
Out[15]: FrozenList([[0, 0, 0, 1, 1, 1], [-1, -1, -1, 0, 0, 0], [1, 4, 2, 3, 0, 1]])
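
The same encoding can be inspected on a small index built by hand. This is a minimal sketch and assumes a newer pandas (0.24 or later), where the `labels` attribute shown above is named `codes`; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Two-level index in which the first two rows have a NaN dosage.
mi = pd.MultiIndex.from_arrays(
    [[680585148, 680585148, 680607017], [np.nan, np.nan, 0.0133]],
    names=['s_id', 'dosage'],
)

# NaN is not stored as a level value ...
dosage_levels = mi.levels[1].tolist()
# ... it is represented by the code -1 instead.
dosage_codes = mi.codes[1].tolist()

print(dosage_levels)  # [0.0133]
print(dosage_codes)   # [-1, -1, 0]
```

Any reshaping code that uses the codes as positions into the levels therefore has to treat -1 as a special case.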

It works when the dosage is not NaN:

In [21]: df2
Out[21]:
                           change
s_id      dosage  agent
680585148 42.0000 Hg          NaN
                  U      0.000000
                  Pb     0.000007
680607017 0.0133  Sn     0.000024
                  Ag     0.000000
                  Hg    -0.000150

In [22]: df2.unstack()
Out[22]:
                   change
agent                  Ag       Hg        Pb        Sn   U
s_id      dosage
680585148 42.0000     NaN      NaN  0.000007       NaN   0
680607017 0.0133        0 -0.00015       NaN  0.000024 NaN

@jreback
Contributor

jreback commented Sep 30, 2013

The first part now works, since a single NaN is now allowed in indexes (this will be in 0.13):

In [2]: df.copy().set_index(['s_id','dosage','agent'])
Out[2]: 
                          change
s_id      dosage agent          
680585148 NaN    Hg          NaN
                 U      0.000000
                 Pb     0.000007
680607017 0.0133 Sn     0.000024
                 Ag     0.000000
                 Hg    -0.000150

In [3]: df
Out[3]: 
      agent    change  dosage       s_id
17263    Hg       NaN     NaN  680585148
17264     U  0.000000     NaN  680585148
17265    Pb  0.000007     NaN  680585148
17266    Sn  0.000024  0.0133  680607017
17267    Ag  0.000000  0.0133  680607017
17268    Hg -0.000150  0.0133  680607017

In [4]: df.unstack()
Out[4]: 
agent   17263            Hg
        17264             U
        17265            Pb
        17266            Sn
        17267            Ag
        17268            Hg
change  17263           NaN
        17264             0
        17265      7.07e-06
        17266    2.3614e-05
        17267             0
        17268      -0.00015
dosage  17263           NaN
        17264           NaN
        17265           NaN
        17266        0.0133
        17267        0.0133
        17268        0.0133
s_id    17263     680585148
        17264     680585148
        17265     680585148
        17266     680607017
        17267     680607017
        17268     680607017
dtype: object

The second part is still failing (but I think it is an easy fix):


In [5]: df.ix[17264:].copy().set_index(['s_id','dosage','agent']).unstack()
ValueError: cannot convert float NaN to integer

Pushing to 0.14 for a full fix and validation.

3 participants