BUG: CoW and np.asarray leads to surprising issues #52823
Comments
As a side note, enabling CoW on statsmodels leads to ~300 test failures.
xref numpy/numpy#23625
This is intended for now. There was some discussion in another issue. We can reopen if you like. Can you share a statsmodels build?
I'm just testing locally after statsmodels/statsmodels#8811 was reported. I am planning on setting up a canary build with CoW enabled, and I'll post that when it is done.
I'm leaning towards this (also on the front-end)
@bashtage To understand what was going on in this specific case: the issue was not that you were actually relying on being able to modify those arrays, but rather that you were passing those arrays to Cython code that couldn't handle read-only (const) memoryviews (statsmodels/statsmodels#8820). Is that correct? I think the Cython issue is going to be a very common one for code passing data from DataFrames to Cython algos.
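A minimal sketch of that failure mode, assuming pandas 2.0 with copy-on-write enabled (the array and function here are stand-ins, not statsmodels code): under CoW, `np.asarray(df)` can return a read-only view of the DataFrame's values, which a Cython signature with a plain (non-const) typed memoryview will reject, and which plain numpy refuses to write to in place.

    import numpy as np
    import pandas as pd

    pd.set_option("mode.copy_on_write", True)  # opt in to CoW on pandas 2.0

    df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
    arr = np.asarray(df)           # zero-copy view of the block values
    print(arr.flags.writeable)     # False under CoW

    # A Cython signature like `def demean(double[:, :] x)` would reject `arr`
    # ("buffer source array is read-only"); `const double[:, :] x` accepts it.
    # The same constraint shows up in plain numpy when writing in place:
    try:
        arr -= arr.mean(axis=0)    # in-place write on the read-only view
    except ValueError as exc:
        print(exc)                 # "output array is read-only"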
A lot of them were cases where … There are bigger questions as to what …
For cases like this, where you know that you created the DataFrame yourself (not passed in by the user), or where it is the result of a calculation that is never a view of the original data (an aggregation), it might be nice if we can provide an easier way to get a writeable array. I think the current recommendation would be: if you know you can safely write to the array, "just" change the flag:

    # ... code that creates a dataframe
    df = ...
    arr = np.asarray(df)
    # I know it is safe to ignore the read-only flag
    arr.flags.writeable = True
    # ... code that modifies `arr` in place

On the one hand, this is not super nice, and we could think about some keyword in `to_numpy`.
Note that you can already do …
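Presumably this refers to forcing a copy at conversion time; a sketch of that existing option (my assumption, not the original comment's code):

    # Both of these copy the data, so the result is always writable and
    # modifying it never touches `df`:
    arr = df.to_numpy(copy=True)   # DataFrame.to_numpy has a `copy` keyword
    arr = np.array(df)             # np.array copies by default, unlike np.asarray
    arr[0, 0] = 0.0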
Reproducible Example
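A minimal sketch of the behavior described in the title, assuming pandas 2.0 with copy-on-write enabled:

    import numpy as np
    import pandas as pd

    pd.set_option("mode.copy_on_write", True)

    df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
    arr = np.asarray(df)           # read-only view under CoW
    arr[0, 0] = -1.0               # ValueError: assignment destination is read-only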
Issue Description
This probably isn't a bug, but it makes interaction outside of pandas considerably harder.
It seems that it would be good to have a formal way to ensure that casts to numpy arrays are always writable. Perhaps a `writeable` keyword in `to_numpy`?
Expected Behavior
Maybe leave the CoW behavior behind when moving outside of pandas.
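For reference, a runnable sketch of what "always writable when leaving pandas" could look like today; the helper below is hypothetical (not a pandas API) and simply copies whenever the converted array would otherwise be read-only under CoW:

    import numpy as np
    import pandas as pd

    pd.set_option("mode.copy_on_write", True)

    def to_writable_numpy(df):
        # Hypothetical helper, not a pandas API: always return an ndarray that
        # is safe to write to. Under CoW, np.asarray can hand back a read-only
        # view, in which case we copy instead of flipping the writeable flag.
        arr = np.asarray(df)
        if not arr.flags.writeable:
            arr = arr.copy()
        return arr

    df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
    arr = to_writable_numpy(df)
    arr[0] = 0.0   # always succeeds, and under CoW `df` stays unchanged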
Installed Versions
INSTALLED VERSIONS
commit : 478d340
python : 3.11.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.0.0
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.0.1
Cython : 0.29.33
pytest : 7.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.0.dev0+1817.98c51a6
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None