Fix Segfault (maybe) #31092


Closed
wants to merge 1 commit

Conversation

WillAyd
Member

@WillAyd WillAyd commented Jan 17, 2020

I have only been able to reproduce this issue once and can't verify that this works, but I think @jorisvandenbossche has had more luck reproducing, so maybe he can try it.

I noticed _as_array showing up in a core dump I was able to get, and this line seems a little suspect, so perhaps removing the writeable requirement fixes it.

@@ -40,8 +40,7 @@ cdef class BlockPlacement:
             self._as_array = arr
             self._has_array = True
         else:
-            # Cython memoryview interface requires ndarray to be writeable.
-            arr = np.require(val, dtype=np.int64, requirements='W')
+            arr = np.require(val, dtype=np.int64)
Member

could do np.asarray here? i had to look up np.require
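For context, a minimal NumPy sketch (not part of the PR) of the difference between the two calls: np.asarray leaves a read-only input as-is, while np.require(..., requirements='W') forces a writeable result, copying when necessary.

```python
import numpy as np

# A read-only int64 array, standing in for data a consumer
# (e.g. pyarrow) might hand to BlockPlacement.
base = np.arange(5, dtype=np.int64)
base.setflags(write=False)

# np.asarray returns the input unchanged (same dtype, no copy),
# so the result is still read-only.
as_arr = np.asarray(base, dtype=np.int64)

# requirements='W' demands a writeable array, which forces a copy here.
req_arr = np.require(base, dtype=np.int64, requirements='W')

print(as_arr.flags.writeable)   # False
print(req_arr.flags.writeable)  # True
```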

@jorisvandenbossche
Member

No luck, it's still segfaulting with this branch in the same way.

If I look at the traceback in gdb, I get:

(gdb) bt
#0  0x00007fb8adfe0e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fb8adfe2801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fb8ae02b897 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fb8ae03290a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fb8ae039ecc in free () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fb8abff6d10 in PyDataMem_FREE () from /home/joris/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#6  0x00007fb8abffa4e5 in array_dealloc () from /home/joris/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#7  0x00007fb8abffa4a9 in array_dealloc () from /home/joris/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#8  0x0000561982f0ee76 in clear_slots (self=<optimized out>, type=0x561984d76da0) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1095
#9  subtype_dealloc (self=<IntBlock at remote 0x7fb892dd9b30>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1252
#10 0x0000561982e7a797 in tupledealloc (op=0x7fb892e200d0) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/tupleobject.c:246
#11 0x0000561982f0ee76 in clear_slots (self=<optimized out>, type=0x561984d8c090) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1095
#12 subtype_dealloc (self=<BlockManager at remote 0x7fb892dbe750>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1252
#13 0x0000561982e7a608 in dict_dealloc (mp=0x7fb892e2c960) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/dictobject.c:1905
#14 0x0000561982f0eca1 in subtype_dealloc (self=<DataFrame(_is_copy=None, _data=<BlockManager at remote 0x7fb892dbe750>, _item_cache={}, _attrs={}) at remote 0x7fb892dc2950>)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1263
#15 0x0000561982e69c98 in frame_dealloc (f=Frame 0x7fb892dcb9b0, for file /home/joris/scipy/pandas/pandas/tests/test_compat.py, line 8, in test_segfault ())
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/frameobject.c:470
#16 0x0000561982e82652 in function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=0x7fb892e4f390)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/call.c:291
#17 _PyFunction_FastCallDict (func=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwargs=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/call.c:322
#18 0x0000561982f3a57d in do_call_core (kwdict={}, callargs=(), func=<function at remote 0x7fb892e4f4d0>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Python/ceval.c:4645
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Python/ceval.c:3191
#20 0x0000561982e81e78 in _PyEval_EvalCodeWithName (_co=<code at remote 0x7fb8ac779a50>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='pytest_pyfunc_call', qualname='pytest_pyfunc_call')
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Python/ceval.c:3930
#21 0x0000561982e826e5 in _PyFunction_FastCallDict (func=<optimized out>, args=0x7fb892e323a8, nargs=1, kwargs=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/call.c:376
#22 0x0000561982f3a57d in do_call_core (kwdict=0x0, 
    callargs=(<Function(name='test_segfault', parent=<Module(fspath=<LocalPath(strpath='/home/joris/scipy/pandas/pandas/tests/test_compat.py') ...

The only thing I can make of it is that the problem happens during deallocation of the DataFrame.

@jorisvandenbossche
Member

A potential indirect pointer might be the second case I showed here: #21824 (comment)

That example (selecting the column instead of calling the method on the DataFrame) doesn't crash but gives an incorrect result. Maybe fixing that could indirectly fix the segfault as well, who knows (very uncertain of course :))

@jorisvandenbossche
Member

Hmm, actually that second example might be correct after all. count is supposed to ignore NaNs, but that's for the values it is counting (so column "C"), not for the column on which it is grouped.

@jorisvandenbossche
Member

BTW, I had this branch checked out and then did some Arrow work, and got a failure in an arrow test about an array not being writable (pyarrow might do some funky business with block placement, didn't investigate further)

@jbrockmendel
Member

Do we have a non-onerous way to reproduce this? i.e., can I help troubleshoot this?

@jorisvandenbossche
Member

@jbrockmendel do you get a segfault with the example?

@WillAyd
Member Author

WillAyd commented Jan 17, 2020

@jorisvandenbossche can you inspect frame 5 in your core dump to see what variable it is trying to free?

@jorisvandenbossche
Member

I don't have much experience with gdb. Do you have a pointer on how I can do that?

@WillAyd
Member Author

WillAyd commented Jan 17, 2020

Sure - assuming you already have it loaded just do f 5 to see what is going on in the fifth frame.

You can also use up and down to navigate the frames

https://sourceware.org/gdb/current/onlinedocs/gdb/Selection.html#Selection
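For reference, a sketch of such a session (command names from the linked docs; the frame number depends on the particular core dump):

```shell
# load the dump first, e.g.: gdb /path/to/python /path/to/core
(gdb) bt            # full backtrace
(gdb) f 5           # select frame 5 (PyDataMem_FREE in the trace above)
(gdb) info locals   # inspect that frame's local variables
(gdb) up            # move one frame toward the caller
(gdb) down          # move one frame back toward the callee
```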

@jbrockmendel
Member

do you get a segfault with the example?

On OSX+py37 I get the segfault on the .count(level="A") call, but only intermittently. On Ubuntu+py36 I consistently get the segfault on exit()

@jbrockmendel
Member

Stepping through the call to .count("A") in the interpreter, I only get the segfault-on-exit after the counts = lib.count_level_2d(mask, level_codes, len(level_index), axis=0) call

@jbrockmendel
Member

jbrockmendel commented Jan 17, 2020

Disabling the @cython.boundscheck(False) and @cython.wraparound(False) in lib.count_level_2d got rid of the segfault, but I was expecting it to raise an IndexError or something, which it didn't do.

@jbrockmendel
Member

There it is: disabling @cython.boundscheck(False) but keeping @cython.wraparound(False), we get an IndexError:

>>> df = pd.DataFrame({"A": ["a", "b", None, "a", "b"], "B": 1, "C": range(5)}) 
>>> df2 = df.set_index(["A", "B"])
>>> res = df2.count(level="A")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7812, in count
    return self._count_level(level, axis=axis, numeric_only=numeric_only)
  File "pandas/core/frame.py", line 7868, in _count_level
    counts = lib.count_level_2d(mask, level_codes, len(level_index), axis=0)
  File "pandas/_libs/lib.pyx", line 745, in pandas._libs.lib.count_level_2d
    counts[labels[i], j] += mask[i, j]
IndexError: Out of bounds on buffer access (axis 0)

@WillAyd
Member Author

WillAyd commented Jan 17, 2020

@jbrockmendel I think that's a great find. What is the state of those variables at that point in time?

@jbrockmendel
Member

What is the state of those variables at that point in time?

Looks like labels[i] = -1 when that IndexError is raised

@gfyoung gfyoung added Bug Segfault Non-Recoverable Error labels Jan 18, 2020
@jorisvandenbossche
Member

Cool! ;)

So I printed the variables in the cython function, and you get:

In [1]: df = pd.DataFrame({"A": ["a", "b", np.nan, "a", "b"], 
   ...:                    "B": 1, 
   ...:                    "C": range(5)}) 
   ...:                    
   ...: res = df.set_index(["A", "B"]).count(level="A") 
mask:  [[ True]
 [ True]
 [ True]
 [ True]
 [ True]]
labels:  [ 0  1 -1  0  1]
max_bin:  2
axis:  0
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
...
~/scipy/pandas/pandas/_libs/lib.pyx in pandas._libs.lib.count_level_2d()

IndexError: Out of bounds on buffer access (axis 0)

So in this code:

pandas/pandas/_libs/lib.pyx

Lines 738 to 745 in a890239

n, k = (<object>mask).shape
if axis == 0:
    counts = np.zeros((max_bin, k), dtype='i8')
    with nogil:
        for i in range(n):
            for j in range(k):
                counts[labels[i], j] += mask[i, j]

it's indeed getting a -1 index out of the labels (indicating the missing value in the data), but since the function is decorated with @cython.wraparound(False), you get undefined behavior there.

So it's clear that the count_level_2d is not written to handle missing values marked by -1 in the labels (and the labels are here the values in the index level that was specified with .count(level=..)).
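The wraparound issue can be mimicked in plain NumPy (a sketch, not the actual Cython code): under ordinary Python indexing the -1 label wraps around to the last bin and silently miscounts, and it is exactly this access that becomes undefined behavior once @cython.wraparound(False) is applied.

```python
import numpy as np

labels = np.array([0, 1, -1, 0, 1])   # -1 marks the NaN in level "A"
mask = np.ones((5, 1), dtype=bool)
max_bin = 2

counts = np.zeros((max_bin, 1), dtype='i8')
for i in range(len(labels)):
    # With plain Python indexing, labels[i] == -1 wraps around and
    # increments the LAST bin; under @cython.wraparound(False) the
    # same negative index is undefined behavior instead.
    counts[labels[i], 0] += mask[i, 0]

print(counts.ravel())  # [2 3] -- the missing-value row leaked into bin 1
```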


Now for a solution, I think we first need to decide on the behavior that we want. Should the "NaN" in the index level be present in the output as well?
So basically should the result be:

   B
A   
a  2
b  2

or

      B
A
a     2
b     2
NaN   1

If it is the first, then we need to exclude those -1's from the labels before passing it to count_level_2d. If it is the second, we need to fix count_level_2d to also handle -1 labels (or turn those in a positive number before passing to count_level_2d).
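Both options can be sketched with plain NumPy loops standing in for count_level_2d (hypothetical preprocessing for illustration, not the eventual fix):

```python
import numpy as np

labels = np.array([0, 1, -1, 0, 1])   # -1 marks the NaN in the level
mask = np.ones((5, 1), dtype=bool)
max_bin = 2

# Option 1: drop the -1 labels before counting, so NaN gets no group.
valid = labels >= 0
counts = np.zeros((max_bin, 1), dtype='i8')
for i in np.flatnonzero(valid):
    counts[labels[i], 0] += mask[i, 0]
print(counts.ravel())  # [2 2]

# Option 2: remap -1 to an extra bin, so NaN shows up as its own group.
labels2 = np.where(labels < 0, max_bin, labels)
counts2 = np.zeros((max_bin + 1, 1), dtype='i8')
for i in range(len(labels2)):
    counts2[labels2[i], 0] += mask[i, 0]
print(counts2.ravel())  # [2 2 1] -- the last row is the NaN group
```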

In general, in groupby, we don't include NaNs as a group in the output.
Also here, if we do e.g. a 'sum' instead of 'count' using level, the NaN is not included in the output:

In [5]: df = pd.DataFrame({"A": ["a", "b", np.nan, "a", "b"], 
   ...:                    "B": 1, 
   ...:                    "C": range(5)}) 
   ...:                    
   ...: res = df.set_index(["A", "B"]).sum(level="A")   

In [6]: res   
Out[6]: 
   C
A   
a  3
b  5

So it might be more consistent to exclude the NaNs from the groups. In that case, we also have to fix the Series case though, as this also includes NaN (but without the segfault):

In [9]: df = pd.DataFrame({"A": ["a", "b", np.nan, "a", "b"], 
   ...:                    "B": 1, 
   ...:                    "C": range(5)}) 
   ...:                    
   ...: res = df.set_index(["A", "B"])['C'].count(level="A")

In [10]: res
Out[10]: 
A
a      2
b      2
NaN    1
Name: C, dtype: int64

@WillAyd
Member Author

WillAyd commented Jan 18, 2020

I think we should exclude NaN for consistency.

@WillAyd
Member Author

WillAyd commented Jan 23, 2020

Short on time so closing for now, but hope to come back to this in the future

@WillAyd WillAyd closed this Jan 23, 2020
@WillAyd WillAyd deleted the segfault-fix branch April 12, 2023 20:16
Labels
Bug Segfault Non-Recoverable Error
Development

Successfully merging this pull request may close these issues.

Segfault on clean-up with count example from docstrings
4 participants