
Inconsistent changes of dtype on assignment to multiindexed columns #18415


Open
da-woods opened this issue Nov 21, 2017 · 1 comment
Labels
32bit (32-bit systems), Bug, Dtype Conversions (Unexpected or buggy dtype conversions), Indexing (Related to indexing on series/frames, not to indexes themselves), MultiIndex

Comments


da-woods commented Nov 21, 2017

I've found some odd behaviour when assigning to columns with a multiindex. I'm trying to use an array with a float32 dtype, but it's being converted to a float64 dtype under some circumstances. For large arrays this is accompanied by a significant slowdown.

>>> import sys; sys.version
'3.6.3 (default, Oct 11 2017, 14:49:33) [GCC]'
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> import numpy as np


>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A
     1                        2                    
     0    1    2    3    4    0    1    2    3    4
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

>>> A.loc[:,(1,1)] = np.ones((6,),dtype=np.float32) # index a single column - doesn't change dtypes
>>> (A.dtypes==np.float32).all()
True
>>> A.loc[:,(1,slice(2,3))] = np.ones((6,2),dtype=np.float32) # Index multiple columns - changes dtypes
>>> (A.dtypes==np.float32).all()
False
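
To see which columns are affected, rather than only whether every column kept float32, the dtypes can be compared before and after the assignment. This is a minimal sketch: `changed_columns` is a hypothetical helper name (not part of pandas), and what it reports depends on the pandas version in use.

```python
import numpy as np
import pandas as pd


def changed_columns(before, after):
    """List the column labels whose dtype differs between two dtype Series.

    Hypothetical helper for illustration; assumes both Series come from
    frames with identical column labels in the same order.
    """
    return [col for col, b, a in zip(before.index, before, after) if b != a]


# Same frame as in the report above.
base = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))
A = pd.concat([base, base], axis=1, keys=[1, 2])

before = A.dtypes.copy()
A.loc[:, (1, slice(2, 3))] = np.ones((6, 2), dtype=np.float32)

# Prints the labels of any columns whose dtype changed (possibly none).
print(changed_columns(before, A.dtypes))
```

An empty list means the assignment left every dtype alone on that pandas version.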

So indexing a single column keeps the dtype as float32 (as I would expect), but indexing multiple columns changes it to float64. The behaviour is also different if you write to part of a column (doesn't change) vs a whole column (does change):

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A.loc[2:3,(1,slice(2,3))] = np.ones((2,2),dtype=np.float32) # index a section of multiple columns - doesn’t change dtypes
>>> (A.dtypes==np.float32).all()
True
>>> A.loc[0:5,(1,slice(2,3))] = np.ones((6,2),dtype=np.float32) # but indexing a complete section does change dtypes
>>> (A.dtypes==np.float32).all()
False
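
A possible workaround, offered here as an assumption rather than something proposed in this thread, is to assign one column at a time (the single-column case above was observed to keep float32) and to cast back as a fallback. The helper name `assign_columns_float32` is hypothetical, and the sketch assumes the whole frame is meant to be float32.

```python
import numpy as np
import pandas as pd


def assign_columns_float32(df, columns, values):
    """Write values[:, j] into columns[j], one column at a time.

    Hypothetical helper; assumes every column of df should stay float32.
    """
    for j, col in enumerate(columns):
        df.loc[:, col] = values[:, j]
    # Fallback: if any dtype still changed, cast the frame back.
    if not all(dt == np.float32 for dt in df.dtypes):
        df = df.astype(np.float32)
    return df


base = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))
A = pd.concat([base, base], axis=1, keys=[1, 2])

A = assign_columns_float32(
    A, [(1, 2), (1, 3)], np.ones((6, 2), dtype=np.float32)
)
print(all(dt == np.float32 for dt in A.dtypes))
```

The cast in the fallback copies the data, so it does not avoid the slowdown mentioned above; it only restores the dtype.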

If the multiindex is on axis 0 rather than axis 1, then it does not change the dtypes:

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A = A.T
>>> A.loc[(1,slice(2,3)),:] = np.ones((6,2),dtype=np.float32).T # doesn’t change any dtypes
>>> (A.dtypes==np.float32).all()
True

This odd behaviour only applies to multiindexes:

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32))
>>> A.loc[:,2:3] = np.ones((6,2),dtype=np.float32) # does not change dtypes
>>> (A.dtypes==np.float32).all()
True

Finally, it applies to iloc as well as loc:

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A.iloc[:,2:4] = np.ones((6,2),dtype=np.float32) # changes dtypes
>>> (A.dtypes==np.float32).all()
False
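
The cases above can be collected into one script that records, for each assignment, whether every column is still float32. This is a sketch for checking a given pandas version, not a statement of what any version returns; the function names and case labels are made up for illustration, and each case starts from a fresh frame (unlike the first transcript, which reuses `A`).

```python
import numpy as np
import pandas as pd


def make_frame():
    """Build the 6x10 float32 frame with two-level columns used in the report."""
    base = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))
    return pd.concat([base, base], axis=1, keys=[1, 2])


def keeps_float32(df):
    """True if every column of df is still float32."""
    return bool(all(dt == np.float32 for dt in df.dtypes))


def run_cases():
    """Run each assignment from the report and record whether float32 survived."""
    ones = lambda shape: np.ones(shape, dtype=np.float32)
    results = {}

    A = make_frame()
    A.loc[:, (1, 1)] = ones((6,))
    results["loc, single column"] = keeps_float32(A)

    A = make_frame()
    A.loc[:, (1, slice(2, 3))] = ones((6, 2))
    results["loc, multiple columns"] = keeps_float32(A)

    A = make_frame()
    A.loc[2:3, (1, slice(2, 3))] = ones((2, 2))
    results["loc, partial rows"] = keeps_float32(A)

    A = make_frame()
    A.loc[0:5, (1, slice(2, 3))] = ones((6, 2))
    results["loc, all rows by slice"] = keeps_float32(A)

    A = make_frame().T  # multiindex on axis 0
    A.loc[(1, slice(2, 3)), :] = ones((2, 6))
    results["loc, multiindex on rows"] = keeps_float32(A)

    A = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))  # plain columns
    A.loc[:, 2:3] = ones((6, 2))
    results["loc, plain columns"] = keeps_float32(A)

    A = make_frame()
    A.iloc[:, 2:4] = ones((6, 2))
    results["iloc, multiple columns"] = keeps_float32(A)

    return results


for name, kept in run_cases().items():
    print(f"{name}: float32 preserved = {kept}")
```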
@mroeschke added the Indexing, Dtype Conversions, MultiIndex and 32bit labels on Jan 13, 2019
@mroeschke added the Bug label on Apr 3, 2020
jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Jan 10, 2022
@jbrockmendel
Member

#45290 implements an xfailed test for this
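
For readers unfamiliar with the term, an xfailed test is one marked as expected to fail until the bug is fixed. The following is only an illustrative sketch of that pattern using `pytest.mark.xfail`; it is not the test from #45290, and the test name and reason string are assumptions.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.xfail(
    reason="GH#18415: dtype may change on multi-column assignment",
    strict=False,  # do not fail the suite if the behaviour is already fixed
)
def test_multiindex_columns_setitem_keeps_float32():
    # Hypothetical test body mirroring the first example in the report.
    base = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))
    df = pd.concat([base, base], axis=1, keys=[1, 2])

    df.loc[:, (1, slice(2, 3))] = np.ones((6, 2), dtype=np.float32)

    assert all(dt == np.float32 for dt in df.dtypes)
```

With `strict=False`, pytest reports the test as XFAIL while the dtype still changes and as XPASS once it no longer does.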

3 participants