Remove `with nogil` statement to avoid segfault #166
Conversation
I assume that you are experiencing this with the latest c-blosc (the 1.5 series). I don't think keeping the GIL would be the best approach; I'd rather lean toward using a specific call that restricts bcolz to a single thread in c-blosc during multithreaded operation. How to implement this is open to discussion.
Oddly, I'm at blosc 1.4.1.
It's also quite possible that something is wrong upstream with my code, which uses bcolz. I believe I'm handling everything safely, but I could be wrong. I agree that it would be better to learn the root cause and fix that. My thought here was that I'm unlikely to dive into the blosc codebase soon, and my understanding is that you are all fairly busy. Holding the GIL seemed like a cheap temporary solution.
Well, having the problem with blosc 1.4.1 is weird indeed. Could you please post a minimal self-contained example reproducing the issue? At any rate, if it is confirmed that bcolz with blosc 1.4.1 actually has a problem with multithreading, then your PR might be the fastest way to get rid of the problem (at the cost of some performance).
That is a very reasonable request but also fairly difficult. My bcolz code is deep within a concurrent program and this only seems to occur in somewhat heavy workloads. I'll poke around a bit though and see if I can reduce things.
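For what it's worth, a reproducer attempt might look something like the sketch below. It is hypothetical: `zlib` stands in for blosc so the script stays self-contained, and a real reproducer would call into bcolz/blosc instead. The point is the access pattern: many threads decompressing concurrently, which is what seems to trigger the segfault.

```python
# Hypothetical stress-test sketch for concurrent decompression.
# zlib stands in for blosc here; with a thread-unsafe C library,
# this access pattern is the kind that can segfault.
import zlib
from concurrent.futures import ThreadPoolExecutor

payload = bytes(range(256)) * 4096          # ~1 MB of compressible data
compressed = zlib.compress(payload)

def round_trip(_):
    # Each worker decompresses the shared buffer concurrently.
    return zlib.decompress(compressed) == payload

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(round_trip, range(100)))

print(all(results))  # prints True (zlib is thread-safe)
```

With zlib this always succeeds; the interesting question is whether the equivalent loop over a bcolz carray crashes.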
While trying to isolate this I'm getting more fun error messages like these; not sure if they're useful.
Well, if they disappear when holding the GIL, that means that Blosc was not safe to use by multithreaded apps. Please double check that.
This doesn't seem to be a case of multiple threads within blosc causing an issue, but rather multiple threads calling bcolz functions that call blosc functions. I haven't yet been able to reproduce this in an isolated setting.
I am not sure that 1.4.1 is actually thread-safe, since it uses global state. Not releasing the GIL, on the other hand, will impact performance significantly. I think this is a blocker for the time being, until 1.5.x can be fixed; that series includes support for Blosc contexts and should be thread-safe, IIRC.
In my particular use case this doesn't affect my normal workflows. Do you have many users who do heavy concurrent reads on bcolz arrays? Given that blosc itself can be multithreaded, it seems like there wouldn't be much performance benefit to releasing the GIL at this stage.
I can also operate from a fork until blosc 1.5.x is fixed.
Thanks for the use case. After thinking a bit more about this issue, I am +1 on holding the GIL during blosc operation. The Blosc 1.5 series does not have a global lock, and having the GIL already available seems like an unexpected blessing to me :) It is unfortunate that I cannot replicate the previous errors (like #121 or #122), but I think @esc can. Valentin, could you please see if this PR makes the problems go away on your machine? Finally, @mrocklin, comparing the bcolz format with HDF5 is pretty exaggerated ;)
Happy to change things if you prefer something else.
From my understanding of the GIL, you need to release it if you want an external C lib to use threads. Thus I was arguing that keeping it will impact performance, but I'm happy to be corrected.
I believe that this is fortunately not the case. The GIL only applies to threads spawned by Python operating on Python objects; it has no control over threads spawned by C libraries. Generally speaking, we could even create threads outside of Python and use them to operate on Python objects. As long as we never try to acquire the GIL we're fine (well, we're probably screwing up the world, but we're free to do so). Presumably the threads within the blosc library don't bother trying to acquire the GIL.
I think holding the GIL only affects other code that also wants to hold it (mainly other Python code). So if we hold the GIL before calling c-blosc, c-blosc can still run several threads, but other Python threads won't run in parallel, which I think is not that bad.
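As a sketch of that point: holding one lock around every entry into a non-thread-safe library is effectively what keeping the GIL does for these call sites. The toy "library" below is hypothetical; it just exhibits the global-state hazard being discussed, and the lock plays the role of the GIL.

```python
# Sketch: one lock serializing callers into a library that keeps
# global state, much as blosc 1.4.x does. The lock stands in for
# the GIL; everything here is illustrative.
import threading

_state = {"buf": None}            # global state, like blosc 1.4.x's
_gil_standin = threading.Lock()   # plays the role of the GIL

def unsafe_work(data):
    # Not thread-safe on its own: two threads interleaving between
    # the write and the read below could see each other's buffers.
    _state["buf"] = data
    return _state["buf"]

def guarded_work(data):
    with _gil_standin:            # serialize callers, as holding the GIL would
        return unsafe_work(data)

out = []
threads = [
    threading.Thread(target=lambda i=i: out.append(guarded_work(bytes([i]) * 4)))
    for i in range(16)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(out))  # prints 16; every call returned its own buffer intact
```

Note that a library's own internal worker threads are untouched by this scheme, which is why c-blosc can still use all cores while the GIL is held.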
OK, if it really is the case, then I am fine with keeping the GIL (I'll quickly check myself). This applies to python-blosc too, btw.
FWIW, when using BLAS I've observed worse performance from allowing multiple hierarchies of concurrency. If you issue many matrix multiplies from multiple GIL-releasing Python threads, and each of those threads calls a multi-core BLAS, then all of the threads contend heavily and performance drops significantly. The table at the top of this post shows the performance loss. Of course, blosc operation and matrix multiply are quite different computationally; blosc probably thrashes memory less, so this is less of an issue. I just wanted to point out the counter-example that sometimes you don't want to boundlessly increase parallelism.
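The oversubscription being described is easy to put numbers on. A back-of-envelope sketch (all thread counts here are illustrative, not measured):

```python
# Back-of-envelope sketch of nested-parallelism oversubscription:
# outer GIL-releasing Python threads times inner library threads
# can far exceed the available cores. Numbers are illustrative.
import os

cores = os.cpu_count() or 4
outer_threads = 8    # hypothetical GIL-releasing Python threads
inner_threads = 8    # hypothetical threads spawned inside the C library

total_runnable = outer_threads * inner_threads
print(total_runnable)            # 64 threads competing for the cores
print(total_runnable > cores)    # contention on any typical machine
```

Keeping the GIL held collapses the outer level to one caller at a time, so only the library's own thread pool competes for cores.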
In addition, I would hold the GIL during compression as well. This will be necessary for c-blosc 1.5, where there is no global lock anymore, so using the one that Python provides (i.e., the GIL) is quite providential. @mrocklin, could you please update this PR to hold the GIL during compression too?
Also, I would remove the TODO note about trying the GIL again in the future. With c-blosc 1.5 I don't think we are going back. |
I've removed the TODO note.
It looks good in htop: all cores are being utilized and timings across versions are comparable. +1 for merge if you rebase the branch to sit cleanly on master.
@@ -24,6 +24,8 @@ Changes from 0.8.1 to 0.9.0
- Add ``safe=`` keyword argument to control dtype/stride checking on append
  (#163 @mrocklin)

- Hold GIL during blosc decompression, avoiding segfaults (#166 @mrocklin)
Hmm, during compression and decompression.
Fixed
Force-pushed from d7d2f06 to acb453c.
In principle the old code seems correct. Sadly something causes a segfault at these lines if run concurrently. For now the best thing seems to be to hold on to the GIL.
Force-pushed from acb453c to 273b2ab.
Rebased.
Thanks! @FrancescAlted, please do the honors!
In addition to requiring release notes, I have also made sure that all feature branches are rebased onto master. That way the history now looks super-clean: http://imgdump.zetatech.org/bcolz-branches.png \o/
Happy to adhere to any standards. Thank you both for the thorough review and for getting this in.
I'll have to be AFK now, but I'll get to it by tomorrow evening at the latest.
OK. At any rate, I will use current master against c-blosc 1.5 and report my results too.
I updated the milestone and labels too.
Concurrent reads sometimes trigger a segfault somewhere within `blosc_decompress`. In principle the old bcolz code seems correct; presumably there is some issue lower down? For now a cheap solution seems to be to hold on to the GIL.
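The shape of the change can be pictured with a short Cython sketch. This is illustrative only: the variable names approximate the real call site in bcolz's `carray_ext.pyx` rather than reproducing it verbatim.

```cython
# Illustrative sketch, not the verbatim bcolz code.

# Before: the GIL is released around the C call, which crashed
# under concurrent reads while blosc kept shared global state.
with nogil:
    nbytes = blosc_decompress(cdata, dest, blocksize)

# After: keep the GIL, serializing Python-level callers but
# leaving blosc's own internal threads free to use all cores.
nbytes = blosc_decompress(cdata, dest, blocksize)
```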
Full trace is here:
Relevant line numbers often point to this `carray_ext.c` file.