Fix Segfault (maybe) #31092


Closed
wants to merge 1 commit

Conversation

WillAyd
Member

@WillAyd WillAyd commented Jan 17, 2020

I have only been able to reproduce this issue once and can't verify that this works, but I think @jorisvandenbossche has had more luck reproducing, so maybe he can try it.

I noticed _as_array showing up in a core dump I was able to get, and this line seems a little suspect, so perhaps removing the writeable requirement fixes it.

@@ -40,8 +40,7 @@ cdef class BlockPlacement:
             self._as_array = arr
             self._has_array = True
         else:
-            # Cython memoryview interface requires ndarray to be writeable.
-            arr = np.require(val, dtype=np.int64, requirements='W')
+            arr = np.require(val, dtype=np.int64)
Member

could do np.asarray here? i had to look up np.require
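For context, a minimal NumPy sketch (not part of the PR) of the difference between the two calls: np.asarray leaves a read-only input as-is, while np.require(..., requirements='W') forces a writeable result, copying when necessary.

```python
import numpy as np

# A read-only int64 array, standing in for data a consumer
# (e.g. pyarrow) might hand to BlockPlacement.
base = np.arange(5, dtype=np.int64)
base.setflags(write=False)

# np.asarray returns the input unchanged (same dtype, no copy),
# so the result is still read-only.
as_arr = np.asarray(base, dtype=np.int64)

# requirements='W' demands a writeable array, which forces a copy here.
req_arr = np.require(base, dtype=np.int64, requirements='W')

print(as_arr.flags.writeable)   # False
print(req_arr.flags.writeable)  # True
```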

@jorisvandenbossche
Member

No luck, it's still segfaulting with this branch in the same way.

If I look at the traceback in gdb, I get:

(gdb) bt
#0  0x00007fb8adfe0e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fb8adfe2801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fb8ae02b897 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fb8ae03290a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fb8ae039ecc in free () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fb8abff6d10 in PyDataMem_FREE () from /home/joris/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#6  0x00007fb8abffa4e5 in array_dealloc () from /home/joris/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#7  0x00007fb8abffa4a9 in array_dealloc () from /home/joris/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#8  0x0000561982f0ee76 in clear_slots (self=<optimized out>, type=0x561984d76da0) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1095
#9  subtype_dealloc (self=<IntBlock at remote 0x7fb892dd9b30>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1252
#10 0x0000561982e7a797 in tupledealloc (op=0x7fb892e200d0) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/tupleobject.c:246
#11 0x0000561982f0ee76 in clear_slots (self=<optimized out>, type=0x561984d8c090) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1095
#12 subtype_dealloc (self=<BlockManager at remote 0x7fb892dbe750>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1252
#13 0x0000561982e7a608 in dict_dealloc (mp=0x7fb892e2c960) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/dictobject.c:1905
#14 0x0000561982f0eca1 in subtype_dealloc (self=<DataFrame(_is_copy=None, _data=<BlockManager at remote 0x7fb892dbe750>, _item_cache={}, _attrs={}) at remote 0x7fb892dc2950>)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/typeobject.c:1263
#15 0x0000561982e69c98 in frame_dealloc (f=Frame 0x7fb892dcb9b0, for file /home/joris/scipy/pandas/pandas/tests/test_compat.py, line 8, in test_segfault ())
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/frameobject.c:470
#16 0x0000561982e82652 in function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=0x7fb892e4f390)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/call.c:291
#17 _PyFunction_FastCallDict (func=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwargs=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/call.c:322
#18 0x0000561982f3a57d in do_call_core (kwdict={}, callargs=(), func=<function at remote 0x7fb892e4f4d0>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Python/ceval.c:4645
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Python/ceval.c:3191
#20 0x0000561982e81e78 in _PyEval_EvalCodeWithName (_co=<code at remote 0x7fb8ac779a50>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='pytest_pyfunc_call', qualname='pytest_pyfunc_call')
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Python/ceval.c:3930
#21 0x0000561982e826e5 in _PyFunction_FastCallDict (func=<optimized out>, args=0x7fb892e323a8, nargs=1, kwargs=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python_1577402753923/work/Objects/call.c:376
#22 0x0000561982f3a57d in do_call_core (kwdict=0x0, 
    callargs=(<Function(name='test_segfault', parent=<Module(fspath=<LocalPath(strpath='/home/joris/scipy/pandas/pandas/tests/test_compat.py') ...

The only thing I can make of it is that the problem happens during deallocation of the DataFrame.

@jorisvandenbossche
Member

A potential indirect pointer might be the second case I showed here: #21824 (comment)

That example (selecting the column instead of calling the method on the DataFrame) doesn't crash but gives an incorrect result. Maybe fixing that could indirectly fix the segfault as well, who knows (very uncertain of course :))

@jorisvandenbossche
Member

Hmm, actually that second example might be correct after all. count is supposed to ignore NaNs, but that's for the values it is counting (so column "C"), not for the column on which it is grouped.

@jorisvandenbossche
Member

BTW, I had this branch checked out and then did some Arrow work, and got a failure in an arrow test about an array not being writable (pyarrow might do some funky business with block placement, didn't investigate further)

@jbrockmendel
Member

Do we have a non-onerous way to reproduce this? i.e., can I help troubleshoot this?

@jorisvandenbossche
Member

@jbrockmendel do you get a segfault with the example?

@WillAyd
Member Author

WillAyd commented Jan 17, 2020

@jorisvandenbossche can you inspect frame 5 in your core dump to see what variable it is trying to free?

@jorisvandenbossche
Member

I don't have much experience with gdb. Do you have a pointer on how I can do that?

@WillAyd
Member Author

WillAyd commented Jan 17, 2020

Sure - assuming you already have it loaded just do f 5 to see what is going on in the fifth frame.

You can also use up and down to navigate the frames

https://sourceware.org/gdb/current/onlinedocs/gdb/Selection.html#Selection
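For reference, a sketch of such a session (command names from the linked docs; the frame number depends on the particular core dump):

```shell
# load the dump first, e.g.: gdb /path/to/python /path/to/core
(gdb) bt            # full backtrace
(gdb) f 5           # select frame 5 (PyDataMem_FREE in the trace above)
(gdb) info locals   # inspect that frame's local variables
(gdb) up            # move one frame toward the caller
(gdb) down          # move one frame back toward the callee
```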

@jbrockmendel
Member

do you get a segfault with the example?

On OSX+py37 I get the segfault on the .count(level="A") call, but only intermittently. On Ubuntu+py36 I consistently get the segfault on exit()

@jbrockmendel
Member

Stepping through the call to .count("A") in the interpreter, I only get the segfault-on-exit after the counts = lib.count_level_2d(mask, level_codes, len(level_index), axis=0) call

@jbrockmendel
Member

jbrockmendel commented Jan 17, 2020

Disabling the @cython.boundscheck(False) and @cython.wraparound(False) in lib.count_level_2d got rid of the segfault, but I was expecting it to raise an IndexError or something, which it didn't do.

@jbrockmendel
Member

There it is: disabling @cython.boundscheck(False) but keeping @cython.wraparound(False), we get an IndexError:

>>> df = pd.DataFrame({"A": ["a", "b", None, "a", "b"], "B": 1, "C": range(5)}) 
>>> df2 = df.set_index(["A", "B"])
>>> res = df2.count(level="A")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7812, in count
    return self._count_level(level, axis=axis, numeric_only=numeric_only)
  File "pandas/core/frame.py", line 7868, in _count_level
    counts = lib.count_level_2d(mask, level_codes, len(level_index), axis=0)
  File "pandas/_libs/lib.pyx", line 745, in pandas._libs.lib.count_level_2d
    counts[labels[i], j] += mask[i, j]
IndexError: Out of bounds on buffer access (axis 0)

@WillAyd
Member Author

WillAyd commented Jan 17, 2020

@jbrockmendel I think that's a great find. What is the state of those variables at that point in time?

@jbrockmendel
Member

What is the state of those variables at that point in time?

Looks like labels[i] = -1 when that IndexError is raised

@gfyoung gfyoung added Bug Segfault Non-Recoverable Error labels Jan 18, 2020
@jorisvandenbossche
Member

Cool! ;)

So I printed the variables in the cython function, and you get:

In [1]: df = pd.DataFrame({"A": ["a", "b", np.nan, "a", "b"], 
   ...:                    "B": 1, 
   ...:                    "C": range(5)}) 
   ...:                    
   ...: res = df.set_index(["A", "B"]).count(level="A") 
mask:  [[ True]
 [ True]
 [ True]
 [ True]
 [ True]]
labels:  [ 0  1 -1  0  1]
max_bin:  2
axis:  0
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
...
~/scipy/pandas/pandas/_libs/lib.pyx in pandas._libs.lib.count_level_2d()

IndexError: Out of bounds on buffer access (axis 0)

So in this code:

pandas/pandas/_libs/lib.pyx

Lines 738 to 745 in a890239

n, k = (<object>mask).shape
if axis == 0:
    counts = np.zeros((max_bin, k), dtype='i8')
    with nogil:
        for i in range(n):
            for j in range(k):
                counts[labels[i], j] += mask[i, j]

it's indeed getting a -1 index out of the labels (indicating the missing value in the data), but since the function is decorated with @cython.wraparound(False), you get undefined behavior there.

So it's clear that the count_level_2d is not written to handle missing values marked by -1 in the labels (and the labels are here the values in the index level that was specified with .count(level=..)).
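The wraparound issue can be mimicked in plain NumPy (a sketch, not the actual Cython code): under ordinary Python indexing the -1 label wraps around to the last bin and silently miscounts, and it is exactly this access that becomes undefined behavior once @cython.wraparound(False) is applied.

```python
import numpy as np

labels = np.array([0, 1, -1, 0, 1])   # -1 marks the NaN in level "A"
mask = np.ones((5, 1), dtype=bool)
max_bin = 2

counts = np.zeros((max_bin, 1), dtype='i8')
for i in range(len(labels)):
    # With plain Python indexing, labels[i] == -1 wraps around and
    # increments the LAST bin; under @cython.wraparound(False) the
    # same negative index is undefined behavior instead.
    counts[labels[i], 0] += mask[i, 0]

print(counts.ravel())  # [2 3] -- the missing-value row leaked into bin 1
```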


Now for a solution, I think we first need to decide on the behavior that we want. Should the "NaN" in the index level be present in the output as well?
So basically should the result be:

   B
A   
a  2
b  2

or

      B
A
a     2
b     2
NaN   1

If it is the first, then we need to exclude those -1's from the labels before passing it to count_level_2d. If it is the second, we need to fix count_level_2d to also handle -1 labels (or turn those in a positive number before passing to count_level_2d).
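Both options can be sketched with plain NumPy loops standing in for count_level_2d (hypothetical preprocessing for illustration, not the eventual fix):

```python
import numpy as np

labels = np.array([0, 1, -1, 0, 1])   # -1 marks the NaN in the level
mask = np.ones((5, 1), dtype=bool)
max_bin = 2

# Option 1: drop the -1 labels before counting, so NaN gets no group.
valid = labels >= 0
counts = np.zeros((max_bin, 1), dtype='i8')
for i in np.flatnonzero(valid):
    counts[labels[i], 0] += mask[i, 0]
print(counts.ravel())  # [2 2]

# Option 2: remap -1 to an extra bin, so NaN shows up as its own group.
labels2 = np.where(labels < 0, max_bin, labels)
counts2 = np.zeros((max_bin + 1, 1), dtype='i8')
for i in range(len(labels2)):
    counts2[labels2[i], 0] += mask[i, 0]
print(counts2.ravel())  # [2 2 1] -- the last row is the NaN group
```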

In general, in groupby, we don't include NaNs as a group in the output.
Also here, if we do e.g. a 'sum' instead of 'count' using level, the NaN is not included in the output:

In [5]: df = pd.DataFrame({"A": ["a", "b", np.nan, "a", "b"], 
   ...:                    "B": 1, 
   ...:                    "C": range(5)}) 
   ...:                    
   ...: res = df.set_index(["A", "B"]).sum(level="A")   

In [6]: res   
Out[6]: 
   C
A   
a  3
b  5

So it might be more consistent to exclude the NaNs from the groups. In that case, we also have to fix the Series case though, as this also includes NaN (but without the segfault):

In [9]: df = pd.DataFrame({"A": ["a", "b", np.nan, "a", "b"], 
   ...:                    "B": 1, 
   ...:                    "C": range(5)}) 
   ...:                    
   ...: res = df.set_index(["A", "B"])['C'].count(level="A")

In [10]: res
Out[10]: 
A
a      2
b      2
NaN    1
Name: C, dtype: int64

@WillAyd
Member Author

WillAyd commented Jan 18, 2020

I think we should exclude NaN for consistency.

@WillAyd
Member Author

WillAyd commented Jan 23, 2020

Short on time so closing for now, but hope to come back to this in the future

@WillAyd WillAyd closed this Jan 23, 2020
@WillAyd WillAyd deleted the segfault-fix branch April 12, 2023 20:16
Labels
Bug Segfault Non-Recoverable Error
Development

Successfully merging this pull request may close these issues.

Segfault on clean-up with count example from docstrings
4 participants