
Remove with nogil statement to avoid segfault #166

Merged
merged 4 commits into Blosc:master from mrocklin:avoid-segfault
Mar 26, 2015

Conversation

mrocklin
Contributor

Concurrent reads sometimes trigger a segfault somewhere within blosc_decompress.

In principle the old bcolz code seems correct; presumably there is some issue lower down? For now a cheap solution seems to be to hold on to the GIL.

Full trace is here:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd9c7d700 (LWP 28826)]
_int_malloc (av=0x7fffa4000020, bytes=568) at malloc.c:3489
3489    malloc.c: No such file or directory.
(gdb) bt
#0  _int_malloc (av=0x7fffa4000020, bytes=568) at malloc.c:3489
#1  0x00007ffff70907b0 in __GI___libc_malloc (bytes=568) at malloc.c:2891
#2  0x00007ffff707c44d in __fopen_internal (
    filename=0x7fffa4031e90 "/home/mrocklin/data/trip3.bcolz/rate_code/data/__445.blp", 
    mode=0x7fff3fc8bf80 "rb", is32=1) at iofopen.c:73
#3  0x00007ffff7a66717 in open_the_file (f=0x7fffdc338e40, 
    name=0x7fffa4031e90 "/home/mrocklin/data/trip3.bcolz/rate_code/data/__445.blp", 
    mode=0x7ffff7ecddc4 "rb") at Objects/fileobject.c:374
#4  0x00007ffff7a68b52 in file_init (self=0x7fffdc338e40, args=0x7fffa0776098, kwds=0x0)
    at Objects/fileobject.c:2430
#5  0x00007ffff7aa3208 in type_call (type=<optimized out>, args=0x7fffa0776098, kwds=0x0)
    at Objects/typeobject.c:745
#6  0x00007ffff7a40323 in PyObject_Call (func=0x7ffff7d9c820 <PyFile_Type>, arg=<optimized out>, 
    kw=<optimized out>) at Objects/abstract.c:2529
#7  0x00007fffe1d7dac1 in __Pyx_PyObject_Call (func=0x7ffff7fcda70, arg=arg@entry=0x7fffa0776098, 
    kw=0x0) at bcolz/carray_ext.c:37718
#8  0x00007fffe1dc70b6 in __pyx_f_5bcolz_10carray_ext_6chunks_read_chunk (
    __pyx_v_self=<optimized out>, __pyx_v_nchunk=0x11fce10) at bcolz/carray_ext.c:9434
#9  0x00007fffe1d8532d in __pyx_pf_5bcolz_10carray_ext_6chunks_2__getitem__ (
    __pyx_v_nchunk=0x11fce10, __pyx_v_self=0x7fffe1b40ad0) at bcolz/carray_ext.c:9867
#10 __pyx_pw_5bcolz_10carray_ext_6chunks_3__getitem__ (__pyx_v_self=0x7fffe1b40ad0, 
    __pyx_v_nchunk=0x11fce10) at bcolz/carray_ext.c:9809
#11 0x00007fffe1d7a07a in __pyx_sq_item_5bcolz_10carray_ext_chunks (o=0x7fffe1b40ad0, 
    i=<optimized out>) at bcolz/carray_ext.c:34563
#12 0x00007fffe1dba1d0 in __Pyx_GetItemInt_Fast (is_list=0, wraparound=1, boundscheck=1, i=445, 
    o=<optimized out>) at bcolz/carray_ext.c:38415
#13 __pyx_pf_5bcolz_10carray_ext_6carray_38__getitem__ (__pyx_v_key=<optimized out>, 
    __pyx_v_self=<optimized out>) at bcolz/carray_ext.c:24885
#14 __pyx_pw_5bcolz_10carray_ext_6carray_39__getitem__ (__pyx_v_self=<optimized out>, 
    __pyx_v_key=<optimized out>) at bcolz/carray_ext.c:23217
#15 0x00007fffe1db8ab3 in __pyx_pf_5bcolz_10carray_ext_6carray_38__getitem__ (
    __pyx_v_key=0x7fffdb406d10, __pyx_v_self=0x7fffe2969bd0) at bcolz/carray_ext.c:23668
#16 __pyx_pw_5bcolz_10carray_ext_6carray_39__getitem__ (__pyx_v_self=0x7fffe2969bd0, 
    __pyx_v_key=0x7fffdb406d10) at bcolz/carray_ext.c:23217
#17 0x00007ffff54a3b3d in op_getitem (s=<optimized out>, a=<optimized out>)
    at -------src-dir-------/Python-2.7.9/Modules/operator.c:130
#18 0x00007ffff7af022f in ext_do_call (nk=-1742972520, na=<optimized out>, flags=<optimized out>, 
    pp_stack=0x7fffd9c7b408, func=0x7ffff5f09f38) at Python/ceval.c:4343
#19 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2718
#20 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b27b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=2, kws=0x7fffc4004910, kwcount=0, 
    defs=0x7fffdc3b3a28, defcount=1, closure=0x0) at Python/ceval.c:3265
#21 0x00007ffff7af02aa in fast_function (nk=<optimized out>, na=2, n=<optimized out>, 
    pp_stack=0x7fffd9c7b608, func=0x7fffdc32b050) at Python/ceval.c:4129
#22 call_function (oparg=<optimized out>, pp_stack=0x7fffd9c7b608) at Python/ceval.c:4054
#23 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2679
#24 0x00007ffff7a6421c in gen_send_ex (gen=0x7fffdb3f8820, arg=0x0, exc=<optimized out>)
    at Objects/genobject.c:85
#25 0x00007ffff7a3f62b in PyIter_Next (iter=<optimized out>) at Objects/abstract.c:3103
#26 0x00007ffff7ae44da in builtin_zip (self=<optimized out>, args=<optimized out>)
    at Python/bltinmodule.c:2555
#27 0x00007ffff7af022f in ext_do_call (nk=-1603221672, na=<optimized out>, flags=<optimized out>, 
    pp_stack=0x7fffd9c7b7f8, func=0x7ffff7fcdea8) at Python/ceval.c:4343
#28 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2718
#29 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b27b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=2, kws=0x7fffa40398f0, kwcount=0, 
    defs=0x7fffdc3b3a28, defcount=1, closure=0x0) at Python/ceval.c:3265
#30 0x00007ffff7af02aa in fast_function (nk=<optimized out>, na=2, n=<optimized out>, 
    pp_stack=0x7fffd9c7b9f8, func=0x7fffdc32b050) at Python/ceval.c:4129
#31 call_function (oparg=<optimized out>, pp_stack=0x7fffd9c7b9f8) at Python/ceval.c:4054
#32 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2679
#33 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b27b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=2, kws=0x7fffc0002c90, kwcount=0, 
    defs=0x7fffdc3b3a28, defcount=1, closure=0x0) at Python/ceval.c:3265
#34 0x00007ffff7af02aa in fast_function (nk=<optimized out>, na=2, n=<optimized out>, 
    pp_stack=0x7fffd9c7bbf8, func=0x7fffdc32b050) at Python/ceval.c:4129
#35 call_function (oparg=<optimized out>, pp_stack=0x7fffd9c7bbf8) at Python/ceval.c:4054
#36 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2679
#37 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b27b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=2, kws=0x7fffa4003750, kwcount=0, 
    defs=0x7fffdc3b3a28, defcount=1, closure=0x0) at Python/ceval.c:3265
#38 0x00007ffff7af02aa in fast_function (nk=<optimized out>, na=2, n=<optimized out>, 
    pp_stack=0x7fffd9c7bdf8, func=0x7fffdc32b050) at Python/ceval.c:4129
#39 call_function (oparg=<optimized out>, pp_stack=0x7fffd9c7bdf8) at Python/ceval.c:4054
#40 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2679
#41 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b27b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=2, kws=0x7fffb4064210, kwcount=0, 
    defs=0x7fffdc3b3a28, defcount=1, closure=0x0) at Python/ceval.c:3265
#42 0x00007ffff7af02aa in fast_function (nk=<optimized out>, na=2, n=<optimized out>, 
    pp_stack=0x7fffd9c7bff8, func=0x7fffdc32b050) at Python/ceval.c:4129
#43 call_function (oparg=<optimized out>, pp_stack=0x7fffd9c7bff8) at Python/ceval.c:4054
#44 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2679
#45 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b27b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=2, kws=0x7fffa4003968, kwcount=0, 
    defs=0x7fffdc3b3a28, defcount=1, closure=0x0) at Python/ceval.c:3265
#46 0x00007ffff7af02aa in fast_function (nk=<optimized out>, na=2, n=<optimized out>, 
    pp_stack=0x7fffd9c7c1f8, func=0x7fffdc32b050) at Python/ceval.c:4129
#47 call_function (oparg=<optimized out>, pp_stack=0x7fffd9c7c1f8) at Python/ceval.c:4054
#48 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2679
#49 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdc3b28b0, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=5, kws=0x7ffff7f92068, kwcount=0, 
    defs=0x7fffdc3b3b68, defcount=1, closure=0x0) at Python/ceval.c:3265
#50 0x00007ffff7a6f958 in function_call (func=0x7fffdc32b0c8, arg=0x7fffdbcae4d0, kw=0x7fffdbed2910)
    at Objects/funcobject.c:526
#51 0x00007ffff7a40323 in PyObject_Call (func=0x7fffdc32b0c8, arg=<optimized out>, 
    kw=<optimized out>) at Objects/abstract.c:2529
#52 0x00007ffff7aee865 in ext_do_call (nk=-607460144, na=<optimized out>, flags=<optimized out>, 
    pp_stack=0x7fffd9c7c4c8, func=0x7fffdc32b0c8) at Python/ceval.c:4346
#53 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2718
#54 0x00007ffff7af1c6e in PyEval_EvalCodeEx (co=0x7fffdbece830, globals=<optimized out>, 
    locals=<optimized out>, args=<optimized out>, argcount=5, kws=0x7ffff7f92068, kwcount=0, 
    defs=0x7fffdc349f68, defcount=3, closure=0x0) at Python/ceval.c:3265
#55 0x00007ffff7a6f958 in function_call (func=0x7fffdbed3b90, arg=0x7fffdbcae050, kw=0x7fffdbee15c8)
    at Objects/funcobject.c:526
#56 0x00007ffff7a40323 in PyObject_Call (func=0x7fffdbed3b90, arg=<optimized out>, 
    kw=<optimized out>) at Objects/abstract.c:2529

The relevant line numbers in the trace mostly point into this generated carray_ext.c file.
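
For reference, the fix here is simply to stop releasing the GIL around the decompress call in carray_ext.pyx. A rough Cython sketch (buffer names and sizes are illustrative, not the exact bcolz source):

# Cython sketch of the change; src, dest and dest_size are illustrative names
cdef extern from "blosc.h":
    int blosc_decompress(const void *src, void *dest, size_t destsize) nogil

cdef char *src
cdef char *dest
cdef size_t dest_size
cdef int ret

# before: the GIL was released around the C call
with nogil:
    ret = blosc_decompress(src, dest, dest_size)

# after: the GIL stays held, so concurrent Python readers are serialized here
ret = blosc_decompress(src, dest, dest_size)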

@FrancescAlted
Member

I assume that you are experiencing this with the latest c-blosc (the 1.5 series). I don't think keeping the GIL is the best approach; I'd rather lean towards a specific call that restricts bcolz to a single thread in c-blosc when it is used from multithreaded code. How to implement this is open to discussion.

@mrocklin
Contributor Author

Oddly, I'm at blosc 1.4.1:

Blosc version:     1.4.1 ($Date:: 2014-07-08 #$)

It's also quite possible that something is wrong upstream with my code which uses bcolz. I believe I'm handling everything safely but I could be wrong.

I agree that it would be better to find the root cause and fix that. My thought here was that I'm not likely to dive into the blosc codebase any time soon, and my understanding is that you are all fairly busy. Holding the GIL seemed like a cheap temporary solution.

@FrancescAlted
Member

Well, having the problem with blosc 1.4.1 is weird indeed. Could you please post a minimal self-contained example reproducing the issue?

At any rate, if it is confirmed that bcolz with blosc 1.4.1 is actually having a problem with multithreading, well, then it might be that your PR is the fastest way to get rid of the problem (at the cost of some performance).

@mrocklin
Contributor Author

Could you please post a minimal self-contained example reproducing the issue?

That is a very reasonable request, but also fairly difficult. My bcolz code is deep within a concurrent program, and this only seems to occur under fairly heavy workloads. I'll poke around a bit, though, and see if I can reduce things.
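
The shape of the failing workload is roughly several Python threads reading from the same on-disk carray at once. A hypothetical reduction (the path and indices are made up, and I haven't confirmed it actually triggers the crash) would look something like:

import threading
import bcolz

ca = bcolz.open('mydata.bcolz')           # hypothetical on-disk carray

def read(i):
    ca[i * 4096]                          # each item read decompresses a chunk

threads = [threading.Thread(target=read, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()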

@mrocklin
Contributor Author

While trying to isolate this, I'm getting more fun error messages like these; not sure if they're useful.

*** Error in `python': corrupted double-linked list: 0x00007f2404df6c40 ***
Aborted (core dumped)

*** Error in `python': free(): corrupted unsorted chunks: 0x00007f0bcf787290 ***
Aborted (core dumped)

@FrancescAlted
Member

Well, if they disappear when holding the GIL, that means Blosc is not safe to use from multithreaded apps. Please double-check whether bcolz.blosc_set_nthreads(1) makes the errors disappear.
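
Something along these lines should be enough to check (just a sketch; the workload is whatever currently segfaults for you):

import bcolz

bcolz.blosc_set_nthreads(1)   # force blosc to a single internal thread
# ... now run the concurrent read workload that normally segfaults ...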

@mrocklin
Contributor Author

Setting bcolz.blosc_set_nthreads(1) does not make the errors disappear.

This doesn't seem to be a case of multiple threads within blosc causing an issue, but rather of multiple Python threads calling bcolz functions that in turn call blosc functions.

I haven't yet been able to reproduce this in an isolated setting.

@esc
Member

esc commented Mar 24, 2015

I am not sure that 1.4.1 is actually thread safe, since it uses global state. Not releasing the GIL, on the other hand, will impact performance significantly. I think this is a blocker for the time being, until the 1.5.x series can be fixed; that series includes support for Blosc contexts and should be thread safe, IIRC.

@mrocklin
Contributor Author

Not releasing the GIL on the other hand will impact performance significantly

In my particular use case this doesn't affect my normal workflows. Do you have many users who do heavy concurrent reads on bcolz arrays? Given that blosc itself can be multithreaded it seems like there wouldn't be much performance benefit to releasing the GIL at this stage.

@mrocklin
Contributor Author

I can also operate from a fork until blosc 1.5.x is fixed.

@mrocklin
Contributor Author

FWIW this post provides my motivation for this work and hopefully adds a bit of visibility to bcolz.

code here

@FrancescAlted
Member

Thanks for the use case. After thinking a bit more about this issue, I am +1 on holding the GIL during blosc operation. The Blosc 1.5 series does not have a global lock anymore, and having the GIL already available seems like an unexpected blessing to me :)

It is unfortunate that I cannot replicate the previous errors (like #121 or #122), but I think @esc can. Valentin, could you please see whether this PR makes the problems go away on your machine?

Finally, @mrocklin, comparing the bcolz format with HDF5 is pretty much an exaggeration ;)

@mrocklin
Contributor Author

Finally, @mrocklin comparing the bcolz format with HDF5 is pretty much exaggerated ;)

Happy to change things if you prefer something else

@esc
Member

esc commented Mar 26, 2015

From my understanding of the GIL, you need to release it if you want an external C lib to use threads. That is why I was arguing that keeping it will impact performance, but I'm happy to be corrected.

@mrocklin
Contributor Author

I believe that this is fortunately not the case. The GIL only applies to threads spawned by Python interacting with Python objects. The Python GIL has no control over threads spawned by C libraries.

Generally speaking, we could even create threads outside of Python and use them to operate on Python objects. As long as we never try to acquire the GIL we're fine (well, we're probably screwing up the world, but we're free to do so). Presumably the threads within the blosc library don't bother trying to acquire the GIL.

@FrancescAlted
Member

I think holding the GIL only affects other code that also wants to hold it (mainly other Python code). So if we hold the GIL before calling c-blosc, c-blosc can still run several threads internally, but other Python threads won't run in parallel, which I think is not that bad.
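
In Cython terms, the distinction is roughly the following (a sketch with illustrative names, not the actual bcolz code):

# The GIL only serializes the Python threads that reach this call;
# blosc's own worker threads are unaffected either way.
cdef extern from "blosc.h":
    int blosc_decompress(const void *src, void *dest, size_t destsize) nogil

cdef char *src
cdef char *dest
cdef size_t dest_size
cdef int ret

ret = blosc_decompress(src, dest, dest_size)      # GIL held: other Python
                                                  # threads wait here

with nogil:                                       # GIL released: other Python
    ret = blosc_decompress(src, dest, dest_size)  # threads may enter this same
                                                  # call concurrently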

@esc
Member

esc commented Mar 26, 2015

OK, if that really is the case, then I am fine with keeping the GIL (I'll quickly check myself). This applies to python-blosc too, btw.

@mrocklin
Contributor Author

FWIW, when using BLAS I've observed worse performance from allowing multiple layers of concurrency. If you run many matrix multiplies from multiple GIL-releasing Python threads, and each of those threads calls a multi-core BLAS, then all of the threads contend heavily and performance drops significantly. The table at the top of this post shows the performance loss.

Of course, blosc operation and matrix multiplication are quite different computationally. Blosc probably thrashes memory less, so this is less of an issue. I just wanted to point out the counter-example: sometimes you don't want to increase parallelism without bound.
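
The pattern I mean is roughly this (a sketch; the real benchmark and numbers are in the post linked above):

import threading
import numpy as np

# Each Python thread runs a matrix multiply. NumPy releases the GIL around
# the dot call, and if BLAS also fans out across all cores per call, the two
# layers of threads fight over the CPUs.
a = np.random.rand(2000, 2000)

def work():
    a.dot(a)

threads = [threading.Thread(target=work) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()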

@FrancescAlted
Member

In addition, I would hold the GIL during compression as well. This will be necessary for c-blosc 1.5, where there is no global lock anymore, so using the one that Python provides (i.e. the GIL) is quite providential. @mrocklin could you please update this PR to hold the GIL during compression too?

@FrancescAlted
Member

Also, I would remove the TODO note about trying the GIL again in the future. With c-blosc 1.5 I don't think we are going back.

@mrocklin
Contributor Author

I've removed the with nogil blocks around all calls to blosc_*compress in carray_ext.pyx and removed the TODO.
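
Concretely, the compression path gets the same treatment as decompression: the call is now made with the GIL held. A rough Cython sketch with illustrative names (not the exact bcolz source):

# Cython sketch of the compression side; the "with nogil" wrapper is gone
cdef extern from "blosc.h":
    int blosc_compress(int clevel, int doshuffle, size_t typesize,
                       size_t nbytes, const void *src, void *dest,
                       size_t destsize) nogil

cdef char *src
cdef char *dest
cdef size_t nbytes, dest_size
cdef int clevel, shuffle, itemsize, cbytes

cbytes = blosc_compress(clevel, shuffle, itemsize,
                        nbytes, src, dest, dest_size)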

@FrancescAlted
Member

Good. So I think we are all set, and I am going to merge this. @esc could you please check what happens with #122 after this PR is applied? Thanks!

@esc
Member

esc commented Mar 26, 2015

It looks good in htop: all cores are being utilized, and timings across versions are comparable. +1 for merge if you rebase the branch to sit cleanly on master.

@@ -24,6 +24,8 @@ Changes from 0.8.1 to 0.9.0
- Add ``safe=`` keyword argument to control dtype/stride checking on append
(#163 @mrocklin)

- Hold GIL during blosc decompression, avoiding segfaults (#166 @mrocklin)
Member

Hmm, during compression and decompression.

Contributor Author

Fixed

In principle the old code seems correct.  Sadly something causes a
segfault at these lines if run concurrently.  For now the best thing
seems to be to hold on to the GIL.
@mrocklin
Contributor Author

Rebased

@esc
Member

esc commented Mar 26, 2015

Thanks! @FrancescAlted please do the honors!

@esc
Member

esc commented Mar 26, 2015

In addition to requiring release notes, I have also made sure that all feature branches are rebased onto master. That way the history now looks super-clean:

http://imgdump.zetatech.org/bcolz-branches.png

\o/

FrancescAlted added a commit that referenced this pull request Mar 26, 2015
Remove with nogil statement to avoid segfault
@FrancescAlted merged commit 09550e2 into Blosc:master Mar 26, 2015
@mrocklin deleted the avoid-segfault branch March 26, 2015 17:01
@mrocklin
Contributor Author

Happy to adhere to any standards. Thank you both for the thorough review and for getting this in.

@FrancescAlted
Member

@esc all yours for checking #122 out.

@esc
Member

esc commented Mar 26, 2015

I'll have to be AFK now, but I'll get to it by tomorrow evening at the latest.

@FrancescAlted
Member

Ok. At any rate, I will use current master against c-blosc 1.5 and report my results too.

@esc added this to the v0.9.0 milestone Mar 26, 2015
@esc
Member

esc commented Mar 26, 2015

I updated the milestone and labels too.

@FrancescAlted
Member

Sorry @esc, I was referring to #121. Anyway, I have confirmed that #122 is solved by this.
