Original numpy array getting altered when changes made to dataframe #14953

Closed
VathsalaAchar opened this issue Dec 22, 2016 · 1 comment
Labels
Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Usage Question


VathsalaAchar commented Dec 22, 2016

Code Sample

import numpy as np
import pandas as pd

a = np.array([1,2,3, np.nan])
b = pd.DataFrame(a)
b.fillna(4, inplace=True)  # fills the NaN in b in place
print(b)
print(a)

Output

     0
0  1.0
1  2.0
2  3.0
3  4.0
[ 1.  2.  3.  4.]

Problem description

When a dataframe is created from a numpy array the changes to the dataframe are altering the original numpy array. I did not expect this to happen and I'm not sure if this is an expected behaviour or a known issue.

I do know how to work around this, but my question is whether I have to.

Expected Output

     0
0  1.0
1  2.0
2  3.0
3  4.0
[  1.   2.   3.  nan]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.8-040408-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0

jreback (Contributor) commented Dec 22, 2016

So this is a 'feature', in that view propagation in numpy is a feature. As a user you have to be cognizant of it, and it can make things quite performant. Pandas does not own a passed-in numpy array, so changes made through the DataFrame ARE externally visible.

In general, inplace=True operations are not idiomatic in pandas; virtually all operations return new data (which is copied).
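As a rough illustration of the non-inplace style, here is a minimal sketch that reuses the array from the report. The name `filled` is only illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan])
b = pd.DataFrame(a)

# Without inplace=True, fillna returns a new DataFrame and leaves b alone
filled = b.fillna(4)

print(filled)
print(a)  # the source array still ends in nan
```

Because the result is a new object, neither `b` nor the array `a` is modified by the call.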

Note that view propagation only holds in some cases: a single dtype, no prior modification, no dtype change in the operation, and non-object dtypes.

In [1]: a = np.array([1,2,3, np.nan])
   ...: b = pd.DataFrame(a)
   ...: b.fillna(4, inplace=True)

# this is the viewed array
In [2]: b.values.base
Out[2]: array([ 1.,  2.,  3.,  4.])

In [3]: a2 = np.array(['a', 'b', 'c'])

# not true for object dtypes
In [4]: b2 = pd.DataFrame(a2)

In [5]: b2.loc[0, 0] = 'foo'

In [6]: b2
Out[6]: 
     0
0  foo
1    b
2    c

In [7]: a2
Out[7]: 
array(['a', 'b', 'c'], 
      dtype='<U1')

Here the operation happens to be in-place only within pandas itself and does not reach the numpy array:

In [8]: a = np.array([1,2,3, np.nan])

In [9]: b = pd.DataFrame(a)

In [10]: b +=1 

In [11]: a
Out[11]: array([  1.,   2.,   3.,  nan])

In [13]: b.values.base
Out[13]: array([[  2.,   3.,   4.,  nan]])
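If the array must stay untouched while an in-place operation is still used, one option is to ask the constructor for a copy. The sketch below assumes the `copy=True` argument of `pd.DataFrame` and uses `np.shares_memory` (available in recent numpy releases) to check for sharing. The name `default_shared` is illustrative, and whether the default constructor shares memory depends on the pandas version, so no output is claimed for it.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan])

# copy=True asks the constructor to copy the input instead of viewing it
b = pd.DataFrame(a, copy=True)
b.fillna(4, inplace=True)

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(a, b.values))  # False: b holds its own copy
print(a)                              # the last element is still nan

# With the default constructor the answer is version-dependent
default_shared = np.shares_memory(a, pd.DataFrame(a).values)
print(default_shared)
```

Calling `a.copy()` before building the DataFrame should have the same effect as `copy=True` here.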

jreback closed this as completed Dec 22, 2016
jreback added the Compat, Dtype Conversions, Usability, and Usage Question labels Dec 22, 2016
jreback added this to the No action milestone Dec 22, 2016