BUG: Segfault when printing dataframe #46848
Comments
Thanks for the report! Interestingly, I am getting
A little additional info: after some further testing, I've found that the number of columns needed to trigger the segfault depends on the size of the terminal. Expanding my terminal window to full screen allows 33 columns to be printed, but increasing the number of columns to 50 again produces a segfault. As @rhshadrach mentioned, maybe the segfault occurs in the internal methods that create a "nice" representation of the dataframe; if that formatting isn't applied (e.g.
I also wonder if this isn't somehow coupled with the state of the dataframe after the columns are updated via

for col in df.columns:
    df[col] = np.empty_like(df[col])

with df = df.apply(lambda s: np.empty_like(s)) then
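The terminal-width dependence fits the traceback in the report, where the crash happens inside _truncate_horizontally, i.e. only once pandas decides the frame is too wide and has to drop the middle columns. A minimal sketch, assuming a plain integer frame as a stand-in for the reporter's data: capping display.max_columns forces horizontal truncation regardless of terminal size, while DataFrame.to_string() (which defaults to max_cols=None) renders every column and never truncates.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 2 rows x 50 integer columns.
df = pd.DataFrame(np.arange(100).reshape(2, 50))

# Capping display.max_columns makes repr() truncate horizontally,
# independent of how wide the terminal happens to be.
with pd.option_context("display.max_columns", 6):
    truncated = repr(df)

# to_string() defaults to max_cols=None, so all 50 columns are rendered
# and no horizontal truncation takes place.
full = df.to_string()

print(truncated)
```
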
take
I have a fix that prevents the segfault, but I am not sure how to write a regression test, since the failure depends on the width of the terminal. A test that depends on the terminal width doesn't seem workable to me, but I am at a loss as to how to trigger the issue otherwise.
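One way around the terminal dependence, sketched here as an assumption rather than the test that was eventually merged: pin the display options with pd.option_context so horizontal truncation always happens, and build the frame the way the report describes (object columns overwritten with np.empty_like). The helper make_frame, the test name and the frame shape are all hypothetical.

```python
import numpy as np
import pandas as pd


def make_frame(n_rows=3, n_cols=40):
    """Hypothetical wide frame with object columns, refilled via np.empty_like."""
    df = pd.DataFrame(
        {f"c{i}": (["a", "b", None] * n_rows)[:n_rows] for i in range(n_cols)},
        dtype=object,
    )
    # Overwrite every column the way the report describes.
    for col in df.columns:
        df[col] = np.empty_like(df[col])
    return df


def test_repr_after_empty_like_object_columns():
    df = make_frame()
    # Pin the display options so truncation no longer depends on the terminal.
    with pd.option_context("display.max_columns", 10, "display.width", 200):
        text = repr(df)
    assert "..." in text
    assert f"[{df.shape[0]} rows x {df.shape[1]} columns]" in text


if __name__ == "__main__":
    test_repr_after_empty_like_object_columns()
    print("ok")
```
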
Yep, definitely weird. Slightly simpler reproducer:
If the fix in #47097 is correct, then something may be afoot in Cython.
It looks like this underlying issue has existed since at least 0.25.3.
The last Python code executed is pandas/core/array_algos/take.py, line 163 (in c0b659a).
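Or maybe NumPy?

import numpy as np
import pandas as pd
# Private Cython take kernel for 2-D object arrays, taking along axis 1
from pandas._libs.algos import take_2d_axis1_object_object

print(pd.__version__)

# A 1x1 object array built from a literal works fine
arr = np.array([[None]], dtype=object)
print(arr)
indexer = np.array([0], dtype=np.intp)  # the kernel expects platform ints
print(indexer)
out = np.array([[None]], dtype=object)
print(out)
take_2d_axis1_object_object(arr, indexer, out, None)
print(out)

# Same call on an np.empty_like copy: this is the one expected to segfault
# on affected versions, even though the two arrays compare equal
arr2 = np.empty_like(arr)
print(arr2)
print(np.array_equal(arr, arr2))
take_2d_axis1_object_object(arr2, indexer, out, None)
print(out)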
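The reproducer above hinges on np.empty_like. For dtype=object, NumPy reads slots that were never explicitly assigned back as None, which is why np.array_equal(arr, arr2) prints True even though only arr2 triggers the crash. That the crash comes from how such uninitialised object slots are read at the C level is an assumption, not something confirmed in this thread. A sketch of the observable behaviour, with np.full_like as an explicitly initialised alternative:

```python
import numpy as np

arr = np.array([[None]], dtype=object)

# Same shape and dtype, contents left to NumPy's "empty" allocation;
# for object dtype these slots read back as None.
arr2 = np.empty_like(arr)

# Explicitly initialised alternative: every slot is assigned a real None.
arr3 = np.full_like(arr, None)

print(arr2, arr3)
print(np.array_equal(arr, arr2), np.array_equal(arr, arr3))
```
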
FWIW, I don't think that fix works consistently. It appears to work for the code sample above, but a sample that also goes through the ordering logic of _take_nd_ndarray still segfaults with the patch:

import numpy as np
# Private helper (internal API, may change between versions)
from pandas.core.array_algos.take import _take_nd_ndarray

arr = np.array([[None]], dtype=object)
indexer = np.array([0])
axis = 1
fill_value = None
allow_fill = False

# Works on the array built from a literal
result = _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
print(result)

# Still segfaults with the patch on the np.empty_like copy
arr2 = np.empty_like(arr)
result2 = _take_nd_ndarray(arr2, indexer, axis, fill_value, allow_fill)
print(result2)
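Assuming the "ordering logic" refers to the branch in _take_nd_ndarray that, for a 2-D F-contiguous input, takes from the transpose along the mirrored axis and transposes the result back: a 1x1 array is both C- and F-contiguous, so the sample above would go down that flipped path rather than the direct one. This NumPy-only sketch (no pandas internals; flipped_take is a hypothetical helper) shows the two routes give the same values:

```python
import numpy as np


def flipped_take(arr, indexer, axis):
    """Take along `axis` by going through the transpose (hypothetical helper)."""
    # For a 2-D array, axis 1 of arr is axis 0 of arr.T, and vice versa.
    return np.take(arr.T, indexer, axis=arr.ndim - 1 - axis).T


arr = np.array([[None]], dtype=object)
indexer = np.array([0], dtype=np.intp)

# A 1x1 array is both C- and F-contiguous.
print(arr.flags.c_contiguous, arr.flags.f_contiguous)

direct = np.take(arr, indexer, axis=1)
flipped = flipped_take(arr, indexer, axis=1)

# A less trivial F-ordered case: pick columns 2 and 0 of a 2x3 array.
wide = np.asfortranarray(np.arange(6).reshape(2, 3))
idx = np.array([2, 0], dtype=np.intp)
print(np.take(wide, idx, axis=1), flipped_take(wide, idx, axis=1))
```
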
Maybe related: #13717 from pygeos/pygeos#374
Indeed, this time the issue pops up in
Can this issue be closed now that the Cython minimum version has been bumped? (#47979)
See #47979 (comment); will close once MacPython/pandas-wheels#188 is merged.
Closed by MacPython/pandas-wheels#188.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I first wanted to put out a thank you to all those that work on keeping pandas going! All of your contributions have helped so many, and I appreciate it.
When running the example code, a segfault occurs at the last line, print(df). I'm sure this isn't the recommended way to do this, but I wanted to give a minimal example showing the issue I had come across. Also, note that replacing
df[col] = np.empty_like(df[col])
with
df[col] = np.empty(len(df), dtype=df[col].dtype)
prevents the issue from arising. But, based on the output of faulthandler (see below), I opted to submit it to the pandas team.
Current thread 0x00007fabaf0ba180 (most recent call first):
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/array_algos/take.py", line 163 in _take_nd_ndarray
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/array_algos/take.py", line 117 in take_nd
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 1137 in take_nd
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 844 in _slice_take_blocks_ax0
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/generic.py", line 3916 in _slice
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/indexing.py", line 1533 in _get_slice_axis
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/indexing.py", line 1497 in _getitem_axis
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/indexing.py", line 827 in _getitem_tuple_same_dim
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/indexing.py", line 1462 in _getitem_tuple
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/indexing.py", line 961 in __getitem__
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/io/formats/format.py", line 806 in _truncate_horizontally
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/io/formats/format.py", line 790 in truncate
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/io/formats/string.py", line 185 in _fit_strcols_to_terminal_width
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/io/formats/string.py", line 49 in _get_string_representation
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/io/formats/string.py", line 25 in to_string
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1128 in to_string
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/frame.py", line 1192 in to_string
File "/home/me/sml-env/lib/python3.10/site-packages/pandas/core/frame.py", line 1011 in __repr__
File "/home/me/test.py", line 13 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.strptime, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json (total: 54)
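The traceback above comes from Python's standard-library faulthandler module, which dumps the Python-level stack when the interpreter receives a fatal signal such as SIGSEGV. It can be switched on from the command line (python -X faulthandler test.py, or the PYTHONFAULTHANDLER environment variable) or at the top of the script, as in this sketch:

```python
import faulthandler
import sys

# Dump the Python traceback of all threads on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS and SIGILL; write to the process's original stderr, which has a
# real file descriptor even if sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... the rest of the reproducer would follow here ...

print("faulthandler enabled:", faulthandler.is_enabled())
```
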
Expected Behavior
I would expect the dataframe to print normally after its columns are updated.
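The reporter's workaround, sketched under the assumption of a wide frame with object columns (the original reproducer is not shown here, so the frame below is a hypothetical stand-in): allocate each replacement column with np.empty(len(df), dtype=...) rather than np.empty_like(df[col]), after which the frame prints as expected.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: 2 rows x 50 object columns.
df = pd.DataFrame({f"c{i}": ["a", None] for i in range(50)}, dtype=object)

# Workaround from the report: build each replacement column with np.empty
# using the column's dtype, instead of np.empty_like(df[col]).
for col in df.columns:
    df[col] = np.empty(len(df), dtype=df[col].dtype)

print(df)
```

For object dtype, np.empty yields None in every slot, so the refilled frame is all-missing, mirroring what np.empty_like would have produced.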
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.72-microsoft-standard-WSL2
Version : #1 SMP Wed Oct 28 23:40:43 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
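The block above is the output of pd.show_versions(). A small sketch for capturing it as text, e.g. to paste into a report (show_versions also accepts an as_json argument):

```python
import contextlib
import io

import pandas as pd

# show_versions() prints to stdout, so redirect it into a buffer.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    pd.show_versions()

report = buf.getvalue()
print(report)
```
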