julia thread hangs with multiple sevals #202

Closed
dpinol opened this issue Jul 18, 2022 · 12 comments

Comments

@dpinol
Contributor

dpinol commented Jul 18, 2022

Hi,
I can reproduce a hang (no cpu activity) with the following code on 1.8rc1, 1.8rc3 & 1.9dev.
I'm running on Ubuntu 21.10, but I can also reproduce it on Apple M1.

Curiously, but only in 1.8rc1, the hang does not occur if I uncomment the println(42).
I could not reproduce the hang in plain Julia, without juliacall.
In my real code I tried merging the two seval calls into one, but it still freezes.

It is essential to enable multithreading by setting the environment variable JULIA_NUM_THREADS=6.

from juliacall import Main as jl

jl.seval(
    """
    function worker()
        for i in 1:typemax(Int64)
            a = Float64[]
            push!(a, 0.42)
            i % 1000 == 0 && println(i)
        end
    end
"""
)
jl.seval(
    """
begin
#println(42) #this fixes hang only on 1.8rc1
t = Threads.@spawn worker()
println("waiting")
wait(t)
end
"""
)

On my computer it hangs after printing 302000:

....
301000
302000

The code also runs fine when everything is in a single seval call:

from juliacall import Main as jl

jl.seval(
    """
begin
    function worker()
        for i in 1:typemax(Int64)
            a = Float64[]
            push!(a, 0.42)
            i % 1000 == 0 && println(i)
        end
    end

#println(4) #this fixes hang only on 1.8rc1
t = Threads.@spawn worker()
println("waiting")
wait(t)
end
"""
)

thanks

@dpinol
Contributor Author

dpinol commented Jul 18, 2022

Maybe related to JuliaLang/julia#45899? However, I still get the hang with Julia 1.8rc3.

@cjdoris
Collaborator

cjdoris commented Jul 18, 2022

I don't have a Linux box, but I'm failing to reproduce your issue in a VM (WSL on Windows and Docker on Mac).

Can you give me precise instructions of how you can reproduce your issue on a fresh box - what exactly do you install (versions of Python and Julia and their packages) and what commands do you run?

Though TBH even if I could reproduce it I'm not sure where I'd start debugging this. It seems like an issue with task/thread scheduling, which I know very little about.

@dpinol
Contributor Author

dpinol commented Jul 18, 2022

Hi,
I forgot to mention that you'll need to activate julia multithreading with env var JULIA_NUM_THREADS=6. Otherwise, it never hangs
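A minimal sketch of one way to make sure the variable is set when launching the reproduction: build the child environment explicitly and start the script in a subprocess. The helper names and the script path are hypothetical and not part of juliacall; the only thing taken from the thread is the variable JULIA_NUM_THREADS and the value 6.

```python
import os
import subprocess
import sys


def julia_threads_env(num_threads=6, base_env=None):
    """Return a copy of the environment with JULIA_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["JULIA_NUM_THREADS"] = str(num_threads)
    return env


def run_repro(script_path, num_threads=6):
    """Run a (hypothetical) reproduction script with Julia threading enabled."""
    return subprocess.run(
        [sys.executable, script_path],
        env=julia_threads_env(num_threads),
        check=False,
    )


# Example: run_repro("repro.py")  # "repro.py" is a placeholder file name
```

Setting the variable in the shell before starting Python, as described above, achieves the same thing; the subprocess form just makes the setting explicit and repeatable.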

OS:

I can reproduce it both:

  • directly on Ubuntu
  • in an Ubuntu Docker image running on Ubuntu, with only Julia and Python installed. The only package I install is juliacall, via pip.

Julia

I can reproduce it on julia 1.7, 1.8-rc1, 1.8-rc3 and 1.9master

Python

I can reproduce it on python 3.9 & 3.10

@cjdoris
Collaborator

cjdoris commented Jul 18, 2022

Ah possibly (hopefully) related to #201 then. My best guess is that since your loop is allocating, at some point GC is invoked, which triggers the finalizer of some Python object, which deadlocks the GIL lock it acquires.

Support for working in a multithreaded environment should be considered experimental at best right now.

@dpinol
Contributor Author

dpinol commented Jul 18, 2022

"GC is invoked, which triggers the finalizer of some Python object, which deadlocks the GIL lock it acquires."

Do you mean the Python GC or the Julia GC?

"some Python object"

Do you mean internal Python objects? The test above does not involve any communication between Julia and Python.

@cjdoris
Collaborator

cjdoris commented Jul 18, 2022

I mean Julia GC. Your actual code doesn't touch python, but the act of calling jl.seval probably internally creates some temporary python objects which get GC'd at some point.
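The suspected mechanism can be illustrated with a pure-Python analogy. This is only a sketch of the general pattern (a finalizer on a worker thread needs a lock that the waiting main thread holds), not PythonCall's actual implementation: the lock below is an ordinary threading.Lock standing in for the GIL, and a timeout is used so the demonstration terminates instead of hanging.

```python
import threading

# Stand-in for the GIL; an ordinary lock, not the real interpreter lock.
gil_stand_in = threading.Lock()


def finalizer(timeout):
    """Pretend finalizer that must take the lock before releasing an object."""
    acquired = gil_stand_in.acquire(timeout=timeout)
    if acquired:
        gil_stand_in.release()
    return acquired


def run_worker(timeout=0.2):
    """Run the finalizer on another thread and wait for it, like wait(t)."""
    result = {}
    t = threading.Thread(target=lambda: result.update(ok=finalizer(timeout)))
    t.start()
    t.join()  # the caller blocks here until the worker finishes
    return result["ok"]


# The main thread holds the lock while waiting: the finalizer cannot proceed.
with gil_stand_in:
    blocked = not run_worker()

# Once the lock is free, the same finalizer completes immediately.
free = run_worker()
```

Without the timeout, the first call would never return, which matches the symptom reported above: no CPU activity and no further output.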

@dpinol
Contributor Author

dpinol commented Jul 19, 2022

yes! In my real code, this trick solves the hang

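for i in 1:total
    GC.enable(false)                # keep Julia's GC from running during the threaded loop
    Threads.@threads for x in list
        loop(x)
    end
    GC.enable(true)
    i % 100 == 0 && GC.gc(false)    # incremental (non-full) collection every 100 iterations
end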

It's important to periodically call GC to avoid memory exhaustion (see JuliaLang/julia#45068)

@cjdoris
Collaborator

cjdoris commented Jul 19, 2022

OK great.

I can also reproduce the issue. Another work-around is to insert GC.gc() into the top of the second chunk of code. Presumably this tidies up any Python objects left over from the first jl.seval() on the main thread.
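As a sketch, the second chunk of the reproduction with this work-around applied could be assembled like this. The helper name is hypothetical; it only builds the Julia source string, and the commented-out call shows where it would be passed to jl.seval.

```python
def with_leading_gc(julia_block: str) -> str:
    """Wrap Julia source in begin...end, with a full GC.gc() as the first statement."""
    return "begin\nGC.gc()\n" + julia_block.strip() + "\nend"


second_chunk = with_leading_gc(
    """
t = Threads.@spawn worker()
println("waiting")
wait(t)
"""
)

# With `from juliacall import Main as jl` and worker() already defined:
# jl.seval(second_chunk)
```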

@cjdoris
Collaborator

cjdoris commented Jul 19, 2022

Over on the gc branch I have added functions which allow you to temporarily disable the Python garbage collector. The below is a modified version of your code to use this.

from juliacall import Main as jl

jl.seval(
    """
    function worker()
        for i in 1:10_000_000
            a = Float64[]
            push!(a, 0.42)
            i % 1000 == 0 && println(i)
        end
    end
"""
)
jl.seval(
    """
begin
PythonCall.C.gc_disable()
t = Threads.@spawn worker()
println("waiting")
wait(t)
PythonCall.C.gc_enable()
end
"""
)

If you want to try it out, check out the branch, copy pysrc/juliacall/juliapkg-dev.json to pysrc/juliacall/juliapkg.json, and then run pip install -e . in the checkout.

@cjdoris
Collaborator

cjdoris commented Aug 20, 2022

I've just released a version of PythonCall with these functions, except they are now called PythonCall.GC.enable() and PythonCall.GC.disable(). I think this is the best solution to your problem right now. Feel free to open a new issue with any problems.
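From the Python side, one way to pair the two calls safely is a small context manager that re-enables collection even if the threaded work raises. This is a sketch, not part of juliacall: the helper name is hypothetical, it takes the evaluation function as a parameter (for example jl.seval), and it assumes PythonCall is reachable from Main as in the earlier snippet. The names PythonCall.GC.disable() and PythonCall.GC.enable() are the ones given above; whether later releases keep them has not been checked here.

```python
from contextlib import contextmanager


@contextmanager
def pythoncall_gc_disabled(seval):
    """Disable PythonCall's GC for the duration of the block, then re-enable it.

    `seval` is any callable that evaluates a string of Julia code,
    e.g. `jl.seval` after `from juliacall import Main as jl`.
    """
    seval("PythonCall.GC.disable()")
    try:
        yield
    finally:
        # Runs even if the body raises, so the GC is never left disabled.
        seval("PythonCall.GC.enable()")


# Usage (requires juliacall and a previously defined worker()):
# with pythoncall_gc_disabled(jl.seval):
#     jl.seval("wait(Threads.@spawn worker())")
```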

@cjdoris cjdoris closed this as completed Aug 20, 2022
@dpinol
Contributor Author

dpinol commented Aug 23, 2022

hi, thanks for looking into this!
It looks like I missed your messages from July :-(
I did a quick test with 0.9.5 on my project, and unfortunately, it now crashes with and without calling GC.enable/disable.
I'll try to find some time this week to run more tests

@hhaensel

I found a solution for a similar problem and mentioned that in #201, which is still open.
