thread safety #882
Comments
Thanks, yea this sounds like it could be the cause. I tried your suggestion and it deadlocks. My guess is you have to do something like here. Playing with it, although I'm not an expert. |
I hope we don't have to do something as complicated as this code in FFTW.jl. See the discussion in JuliaMath/FFTW.jl#141 where we ran into deadlocks during GC because FFTW was trying to acquire a mutex lock during plan destruction. |
Hmm, unfortunately I think we may be in the same situation as FFTW, because Python can call back to Julia. That is, the following may occur:
So, we may need something like the FFTW.jl solution after all. |
Thanks, I need to think about this and read that code, but one thing I don't totally follow is that in the solution in my PR, the PyObject finalizer never tries to acquire the GIL. So are we fine? |
For some uses, PythonCall.jl [EDIT: renamed from Python.jl] may be preferable. I'm not yet clear about the pros and cons, or why that new package is needed, but at least it has some GIL-related code, so you could either use that package or look at its code for ideas on how to fix things here: |
I'm trying to run PyTorch code from Julia through PyCall. I know that the PyTorch code does not require the GIL, so I would like to run several instances of it in parallel: Julia threads -> PyCall -> without the GIL. Is this achieved automatically, or do I need to do anything extra? |
Users of PyCall can use a global lock that every call to Python has to acquire.

    module MyModule

    const PYLOCK = Ref{ReentrantLock}()

    function __init__() # instantiate the lock
        PYLOCK[] = ReentrantLock()
    end

    # acquire the lock before any code calls Python
    pylock(f::Function) = Base.lock(f, PYLOCK[])

    function example_function()
        pylock() do
            # any code that calls Python
        end
    end

    end # module

This approach works at least if Julia calls Python without any call-back to Julia. I assume that it also works with a call-back, as long as this call-back does not call Python again.

EDIT: I have verified the above approach with the following example, where we intend to fill the large array

    using MyModule, PyCall
    a = zeros(10000)
    np_random = pyimport("numpy.random")

The following code, which does not acquire a lock, segfaults:

    Threads.@threads for i in 1:length(a)
        a[i] = np_random.rand()
    end

However, we can acquire the lock to achieve the desired effect.

    Threads.@threads for i in 1:length(a)
        MyModule.pylock() do
            a[i] = np_random.rand()
        end
    end

Of course, this locking mechanism requires all Python calls to acquire the lock. And acquiring the lock limits the degree to which our program parallelizes. In the above example, there is indeed no parallelism at all because the lock has to be acquired for each complete iteration of the loop. |
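To illustrate the point about parallelism: a minimal sketch, assuming the Python call is only part of each iteration. Here `fake_python_call` is a hypothetical stand-in for a real PyCall call, so the snippet runs without Python. Only the stand-in runs under the lock; the Julia-only work happens outside it and can overlap across threads.

```julia
# Global lock in the spirit of the PYLOCK pattern above (simplified, no module).
const PYLOCK = ReentrantLock()
pylock(f) = lock(f, PYLOCK)

# Hypothetical stand-in for a call into Python; not part of PyCall.
fake_python_call(i) = Float64(i)

a = zeros(1000)
Threads.@threads for i in eachindex(a)
    # Hold the lock only while "Python" is running...
    x = pylock() do
        fake_python_call(i)
    end
    # ...and do the Julia-only work outside the lock, where threads can overlap.
    a[i] = sqrt(x) + 1
end
```

How much this helps depends on how much of each iteration is spent outside the lock.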
I recently tried doing what you suggested with a simple JSON-writing exercise (I know Julia has the same feature, this is just a minimal example)
However, I get this error:
|
Thanks for reporting! I figured that the error occurs because your threads are still running when Julia exits (EDIT: to be more clear, my pylock is not the cause of this error). A solution is to add two lines to your script:
This will make your code wait for all threads before Julia exits. |
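The two lines from that comment are not preserved here. As a hedged sketch of one common way to wait for all spawned tasks before the script ends (an assumption, not necessarily what was originally suggested), the spawning loop can be wrapped in `@sync`:

```julia
results = zeros(Int, 4)

# @sync blocks until every task spawned lexically inside it has finished,
# so the script cannot reach its end while tasks are still running.
@sync for i in 1:4
    Threads.@spawn begin
        results[i] = i^2   # stand-in for the real work of each task
    end
end
```

Collecting the tasks in a vector and calling `wait` on each of them would achieve the same thing.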
@mirkobunse Anyway, this is a huge leap in usability from when I tried to use PyCall in a multithreaded environment a couple of years ago. Whoever did this, thanks a lot! |
Unfortunately, I spoke too soon. I realized that on the Linux command line version, I forgot to enable multiple threads. I'm really trying to do this in a way that is safe. I'm putting all my Python objects in global constants so they don't get unintentionally garbage collected, and I'm implementing the lock functionality. Here is my latest script
I'm able to get crashes on both Windows and Linux. For Windows:
For Linux, I have:
|
Sorry, I'm not able to fix this example. Using |
Yeah, this is a tough one that's hard to track down. I got it to a point in #1006 where I could still use @async on PyCall and @threads for the multithreaded Julia code, and I wasn't able to crash that. I think there is something about @spawn that really trips up a running PyCall task. Only being able to use @threads while PyCall isn't running isn't ideal, but at least that's better than Python, and @threads got better with Julia 1.8. Honestly, I mostly use PyCall for IO-related tasks on Azure, so doing an @async on that will likely be just as good as multithreading. It would be better if we could use @spawn with impunity, but I guess I can settle for @threads. |
@mirkobunse I found a solution that might interest you. I think garbage collection was being triggered in another thread (which is why the segfault happened nondeterministically) and probably nuked something running in PyCall. The solution is a bit kludgy, but I fixed this by modifying the pylock() function to temporarily disable GC in the same manner that the lock/unlock is being performed under the hood.
This of course validates yet again the importance of this issue with respect to protecting PyCall tasks from GC calls from other threads. |
This solution is amazing, thanks! The I have also replaced the This is your example with the changes I propose:
|
Right, put the GC.enable inside the lock. That makes sense. Moreover, your new pylock() function increases generality in case the GC was already turned off. I mean, right now that's a kludge until we can figure out how to make the Python code safer against garbage collection from other threads. However, as you showed, this is a viable workaround as it allows us to really weave in the Python calls with all the other multithreaded tasks. You have no idea how excited I am about this solution, I've been suffering from this problem for a couple of years! |
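The code from these comments is not preserved here. The following is a minimal sketch of the idea as described (disable GC inside the lock, then restore whatever state it had before), not necessarily the exact code that was posted. It relies on `GC.enable` returning the previous GC state. The body of the `do` block is a stand-in for a real Python call.

```julia
const PYLOCK = ReentrantLock()

# Run f while holding the global lock and with the Julia GC disabled.
function pylock(f::Function)
    lock(PYLOCK) do
        prev_gc = GC.enable(false)   # returns the state GC had before this call
        try
            return f()
        finally
            GC.enable(prev_gc)       # restore the previous state, even on error
        end
    end
end

value = pylock() do
    1 + 1   # stand-in for code that calls Python
end
```

Restoring the previous state rather than unconditionally re-enabling GC keeps the function well behaved when the caller had already turned GC off.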
Same for me :D Thanks so much for the |
Hi, I am getting a segmentation fault when trying to use multithreading with NOMAD.jl, where my loss functions are written in Python and called through PyCall.jl: Has anyone had a similar problem? |
Multiple times. When calling Python in a multithreaded environment, you have to use the pylock() pattern we figured out in the discussion above. Issue #1006 also has this workaround. First, you have to build a pylock function that locks the Python process and disables the garbage collector while the Python code is running.
You apply the pylock function around your PyCall code.
|
Thank you! I will try as you suggest. |
Hi @the-noble-argon, when I do
the first time that χ² appears in the editor is marked as not used. Does pylock introduce a new scope? |
Yes, because it's a standard do block:

    χ² = pylock() do
        stream.chi2_full(θ, ω, β, m, ic, r☼)
    end

|
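A small self-contained sketch of the same pattern: the `do` block is an anonymous function, so a value computed inside it is best handed back as the block's return value and assigned outside. `expensive` is a hypothetical stand-in for the Python call, and this simplified `pylock` omits the GC handling discussed earlier.

```julia
const PYLOCK = ReentrantLock()
pylock(f) = lock(f, PYLOCK)

# Hypothetical stand-in for a Python function such as the loss above.
expensive(x) = x^2

# The last expression of the do block becomes the return value of pylock.
chi2 = pylock() do
    expensive(3.0)
end
```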
Thanks @stevengj, now it starts running but doesn't use the 4 threads I set with
Is that a known problem? |
This doesn't look like the kinds of errors I get when PyCall segfaults on me; in such cases I get very long messages. I'm not sure this is a PyCall.jl issue. Can you try calling your loss function repeatedly in a multithreaded loop without NOMAD.jl and see if you get any more informative errors? Is there a way to track what NOMAD.jl uses as input to your loss function and build a multithreaded for loop for that? Before trying this, however, we may need to take a step back. You should be aware that a Python process can't be run in a multithreaded manner (blame Python's GIL, which I hear they're trying to remove yet again). If the pylock() pattern is used properly, it will prevent any Python code from running in parallel and prevent any garbage collection while Python code is running. So if most of your computational effort is spent running your Python loss function, multithreading will achieve little more than causing you anguish. You may need to rewrite your loss function in Julia if you want it to multithread. |
Thanks! That is good to know, so that I can stop going down this path. The Python loss function is by orders of magnitude the most time-consuming part of my code. For a future project I will rewrite it in Julia. If I use distributed processes, will I have the same problem with Python's GIL? |
I think that should work. Distributed processes have their own memory spaces managed by the OS, so you don't run into the nasty kinds of problems of multiple threads changing things in the same process memory space. You can have as many Python processes running on one machine as you want (obviously within reason). Python was never meant for high-performance computing; it was meant to glue many (potentially high-performance) packages together. Many Python packages do multithreading under the hood because they are written in high-performance multithreaded languages. It may surprise you, but most data science isn't actually executed in Python; it's executed with Python libraries written in languages like C++. The core aim of many Python developers is to spend as little compute time in Python as possible and have most of the compute done in high-performance libraries. I use a lot of Python, but what I DON'T use it for is optimization, especially if I have to write the objective/loss function myself. These functions get called over and over again, so there is a huge benefit from having the loss function compiled (which Python doesn't do unless you dabble in Numba/Cython, but those are finicky about the data types you can use). Because of Julia's flexibility, ease of writing, and JIT-compiling nature, it is hands-down the best language I've ever used for optimizing custom objective functions. If this wasn't enough, Julia also has some really good automatic differentiation tools that analyze your code and write the derivatives for you (if your loss function is differentiable and entirely written in Julia). When I write Julia/Python combinations, I use PyCall to call the third-party APIs/SDKs (because everyone writes APIs/SDKs for Python) and Julia for the heavy number crunching. |
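A minimal sketch of the distributed-process approach using Julia's standard `Distributed` library. The function `loss` is a hypothetical stand-in for the Python loss function. In a real setup, each worker would presumably load PyCall itself (for example with `@everywhere using PyCall`) and so get its own Python interpreter and its own GIL. That part is an assumption and is not exercised here.

```julia
using Distributed

addprocs(2)   # start two worker processes, each with its own memory space

# Define the (stand-in) loss function on every process.
@everywhere loss(x) = x^2

# pmap hands the inputs out to the workers and collects the results in order.
results = pmap(loss, 1:4)

rmprocs(workers())   # shut the workers down again
```

Because the workers do not share memory, inputs and outputs are copied between processes, so this pays off mainly when each call is expensive compared with the data it moves.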
Yes, you are right. The loss function is in Python because of a long story, but at present I write everything in Julia. In any case, I will have to call some coordinate transformations from Astropy (Python API). Can I do automatic differentiation through Julia code that calls PyCall or PythonCall? |
No, you can't differentiate through anything that isn't Julia; you would have to write your own differentiation rule for anything that leaves the Julia environment. I learned that the hard way when trying to do a maximum-likelihood estimate of a gamma distribution, which called the incomplete gamma function, which called an R function written in Fortran under the hood. But I COULD maximize a normal distribution, because the erf implementation was in Julia. Go figure. |
Thanks a lot! |
If differentiation is your main concern, you might want to try Python's JAX package. It has a numpy wrapper, which can be used just like regular numpy, and which makes most numpy functions automatically differentiable. You just need to import this wrapper
instead of importing the original numpy and use automatic differentiation
to differentiate (almost) any numpy-based function. Translating your original numpy code to JAX code might be easier than translating everything to Julia. |
If Julia's Threads.nthreads() > 1, we may want to do some more work to ensure thread safety. In particular:

- Call PyEval_InitThreads() in __init__ (this is only needed for Python ≤ 3.6, and is a no-op in Python ≥ 3.7).
- Acquire the GIL in the PyObject finalizer, by calling PyGILState_Ensure (returns an enum, i.e. a Cint in Julia) and PyGILState_Release. This is because the Julia GC may be called from threads other than the main thread, so we need to ensure that we hold Python's GIL before decref-ing.

We might also expose an API to acquire the GIL, but in general I would recommend that user code should only call Python from the main thread.
See this forum post and #881 for a possible test case.