More thread-safe GC #529

Merged: 23 commits, Aug 2, 2024
Changes from 19 commits
1 change: 1 addition & 0 deletions .github/workflows/tests-nightly.yml
@@ -38,6 +38,7 @@ jobs:
      - uses: julia-actions/julia-runtest@v1
        env:
          JULIA_DEBUG: PythonCall
          JULIA_NUM_THREADS: '2'
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v1
        with:
3 changes: 3 additions & 0 deletions .github/workflows/tests.yml
@@ -43,6 +43,7 @@ jobs:
        uses: julia-actions/julia-runtest@v1
        env:
          JULIA_DEBUG: PythonCall
          JULIA_NUM_THREADS: '2'
      - name: Process coverage
        uses: julia-actions/julia-processcoverage@v1
      - name: Upload coverage to Codecov
@@ -82,6 +83,8 @@ jobs:
      - name: Run tests
        run: |
          pytest -s --nbval --cov=pysrc ./pytest/
        env:
          PYTHON_JULIACALL_THREADS: '2'
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v2
        env:
30 changes: 17 additions & 13 deletions docs/src/faq.md
@@ -4,19 +4,23 @@

No.

Some rules if you are writing multithreaded code:
- Only call Python functions from the first thread.
- You probably also need to call `PythonCall.GC.disable()` on the main thread before any
threaded block of code. Remember to call `PythonCall.GC.enable()` again afterwards.
(This is because Julia finalizers can be called from any thread.)
- Julia intentionally causes segmentation faults as part of the GC safepoint mechanism.
If unhandled, these segfaults will result in termination of the process. To enable signal handling,
set `PYTHON_JULIACALL_HANDLE_SIGNALS=yes` before any calls to import juliacall. This is equivalent
to starting julia with `julia --handle-signals=yes`, the default behavior in Julia.
See discussion [here](https://github.com/JuliaPy/PythonCall.jl/issues/219#issuecomment-1605087024) for more information.
- You may still encounter problems.

Related issues: [#201](https://github.com/JuliaPy/PythonCall.jl/issues/201), [#202](https://github.com/JuliaPy/PythonCall.jl/issues/202)
However, it is safe to use PythonCall from multi-threaded Julia, provided you only
call Python code from the first thread. (Before v0.9.22, tricks such as disabling the
garbage collector were required.)

From Python, to use JuliaCall with multiple threads you probably need to set
[`PYTHON_JULIACALL_HANDLE_SIGNALS=yes`](@ref julia-config) before importing JuliaCall.
This is because Julia intentionally causes segmentation faults as part of the GC
safepoint mechanism. If unhandled, these segfaults will result in termination of the
process. Setting this option is equivalent to starting Julia with
`julia --handle-signals=yes`, the default behavior in Julia. See the discussion
[here](https://github.com/JuliaPy/PythonCall.jl/issues/219#issuecomment-1605087024)
for more information.

Related issues:
[#201](https://github.com/JuliaPy/PythonCall.jl/issues/201),
[#202](https://github.com/JuliaPy/PythonCall.jl/issues/202),
[#529](https://github.com/JuliaPy/PythonCall.jl/pull/529)
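
The ordering described above (set the option first, import second) might look like the following from Python. This is only a sketch: both variable names are taken from this PR, but setting them through `os.environ` inside the script, rather than in the shell, is an assumption, and the import is left commented out so the snippet runs without JuliaCall installed.

```python
import os

# Variable names come from this PR; setting them in-process is an assumption.
os.environ["PYTHON_JULIACALL_HANDLE_SIGNALS"] = "yes"
os.environ["PYTHON_JULIACALL_THREADS"] = "2"

# Import JuliaCall only after the variables are set.
# from juliacall import Main as jl
```

Setting the same variables in the shell before launching Python, as the CI workflow in this PR does for `PYTHON_JULIACALL_THREADS`, avoids any dependence on import order.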

## Issues when Numpy arrays are expected

8 changes: 8 additions & 0 deletions docs/src/releasenotes.md
@@ -1,5 +1,13 @@
# Release Notes

## Unreleased
* Finalizers are now thread-safe, meaning PythonCall now works in the presence of
multi-threaded Julia code. Previously, tricks such as disabling the garbage collector
were required. Python code must still be called on the main thread.
* `GC.disable()` and `GC.enable()` are now deprecated no-ops, since they are no
longer required for thread-safety. These will be removed in v1.
* Adds `GC.gc()`.

## 0.9.21 (2024-07-20)
* `Serialization.serialize` can use `dill` instead of `pickle` by setting the env var `JULIA_PYTHONCALL_PICKLE=dill`.
* `numpy.bool_` can now be converted to `Bool` and other number types.
25 changes: 25 additions & 0 deletions pytest/test_all.py
@@ -75,3 +75,28 @@ def test_issue_433():
        """
    )
    assert out == 25

def test_julia_gc():
    from juliacall import Main as jl
    # We make a bunch of python objects with no reference to them,
    # then call GC to try to finalize them.
    # We want to make sure we don't segfault.
    # Here we can (manually) verify that the background task is running successfully,
    # by seeing the printout "Python GC (100 items): 0.000000 seconds."
    # We also programmatically check things are working by verifying the queue is empty.
    # Debugging note: if you get segfaults, then run the tests with
    # `PYTHON_JULIACALL_HANDLE_SIGNALS=yes python3 -X faulthandler -m pytest -p no:faulthandler -s --nbval --cov=pysrc ./pytest/`
    # in order to recover a bit more information from the segfault.
    jl.seval(
        """
        using PythonCall, Test
        let
            pyobjs = map(pylist, 1:100)
            Threads.@threads for obj in pyobjs
                finalize(obj)
            end
        end
        GC.gc()
        @test isempty(PythonCall.GC.QUEUE.items)
        """
    )
125 changes: 100 additions & 25 deletions src/GC/GC.jl
@@ -3,77 +3,152 @@

Garbage collection of Python objects.

See `disable` and `enable`.
See [`gc`](@ref).
"""
module GC

using ..C: C

const ENABLED = Ref(true)
const QUEUE = C.PyPtr[]
const QUEUE = (; items = C.PyPtr[], lock = Threads.SpinLock())
const HOOK = Ref{WeakRef}()

"""
    PythonCall.GC.disable()

Disable the PythonCall garbage collector.
Do nothing.

This means that whenever a Python object owned by Julia is finalized, it is not immediately
freed but is instead added to a queue of objects to free later when `enable()` is called.
!!! note

Like most PythonCall functions, you must only call this from the main thread.
    Historically this would disable the PythonCall garbage collector. This was required
    for safety in multi-threaded code but is no longer needed, so this is now a no-op.
"""
function disable()
    ENABLED[] = false
    return
    Base.depwarn(
        "disabling the PythonCall GC is no longer needed for thread-safety",
        :disable,
    )
    nothing
end

"""
    PythonCall.GC.enable()

Re-enable the PythonCall garbage collector.
Do nothing.

This frees any Python objects which were finalized while the GC was disabled, and allows
objects finalized in the future to be freed immediately.
!!! note

Like most PythonCall functions, you must only call this from the main thread.
    Historically this would enable the PythonCall garbage collector. This was required
    for safety in multi-threaded code but is no longer needed, so this is now a no-op.
"""
function enable()
    ENABLED[] = true
    if !isempty(QUEUE)
        for ptr in QUEUE
    Base.depwarn(
        "disabling the PythonCall GC is no longer needed for thread-safety",
        :enable,
    )
    nothing
end

"""
    PythonCall.GC.gc()

Free any Python objects waiting to be freed.

These are objects that were finalized from a thread that was not holding the Python
GIL at the time.

Like most PythonCall functions, this must only be called from the main thread (i.e. the
thread currently holding the Python GIL.)
"""
function gc()
    if C.CTX.is_initialized
        unsafe_free_queue()
    end
    nothing
end

function unsafe_free_queue()
    Base.@lock QUEUE.lock begin
        for ptr in QUEUE.items
            if ptr != C.PyNULL
                C.Py_DecRef(ptr)
            end
        end
        empty!(QUEUE.items)
    end
    empty!(QUEUE)
    return
    nothing
end

function enqueue(ptr::C.PyPtr)
    # If the ptr is NULL there is nothing to free.
    # If C.CTX.is_initialized is false then the Python interpreter hasn't started yet
    # or has been finalized; either way attempting to free will cause an error.
    if ptr != C.PyNULL && C.CTX.is_initialized
        if ENABLED[]
        if C.PyGILState_Check() == 1
            # If the current thread holds the GIL, then we can immediately free.
            C.Py_DecRef(ptr)
            # We may as well also free any other enqueued objects.
            if !isempty(QUEUE.items)
                unsafe_free_queue()
            end
        else
            push!(QUEUE, ptr)
            # Otherwise we push the pointer onto the queue to be freed later, either:
            # (a) If a future Python object is finalized on the thread holding the GIL
            #     in the branch above.
            # (b) If the GCHook() object below is finalized in an ordinary GC.
            # (c) If the user calls PythonCall.GC.gc().
            Base.@lock QUEUE.lock push!(QUEUE.items, ptr)
        end
    end
    return
    nothing
end

function enqueue_all(ptrs)
    if C.CTX.is_initialized
        if ENABLED[]
    if any(!=(C.PyNULL), ptrs) && C.CTX.is_initialized
        if C.PyGILState_Check() == 1
            for ptr in ptrs
                if ptr != C.PyNULL
                    C.Py_DecRef(ptr)
                end
            end
            if !isempty(QUEUE.items)
                unsafe_free_queue()
            end
        else
            append!(QUEUE, ptrs)
            Base.@lock QUEUE.lock append!(QUEUE.items, ptrs)
        end
    end
    return
    nothing
end

"""
    GCHook()

An immortal object which frees any pending Python objects when Julia's GC runs.

This works by creating it but not holding any strong reference to it, so it is eligible
to be finalized by Julia's GC. The finalizer empties the PythonCall GC queue if
possible. The finalizer also re-attaches itself, so the object does not actually get
collected and so the finalizer will run again at next GC.
"""
mutable struct GCHook
    function GCHook()
        finalizer(_gchook_finalizer, new())
    end
end

function _gchook_finalizer(x)
    if C.CTX.is_initialized
        finalizer(_gchook_finalizer, x)
        if !isempty(QUEUE.items) && C.PyGILState_Check() == 1
            unsafe_free_queue()
        end
    end
    nothing
end

function __init__()
    HOOK[] = WeakRef(GCHook())
    nothing
end

end # module GC
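
The queueing scheme in `src/GC/GC.jl` above (free immediately on the thread that may touch Python, otherwise enqueue under a lock and drain later) can be modeled in a few lines of plain Python. This is a toy sketch rather than PythonCall's implementation: `DeferredFreeQueue`, its `free` callback (standing in for `Py_DecRef`) and the "owner thread" check (standing in for `PyGILState_Check() == 1`) are all hypothetical names introduced for illustration.

```python
import threading


class DeferredFreeQueue:
    """Toy model: only the owner thread may free; other threads enqueue."""

    def __init__(self, free):
        self._free = free                    # hypothetical stand-in for Py_DecRef
        self._owner = threading.get_ident()  # stand-in for "holds the GIL"
        self._items = []
        self._lock = threading.Lock()

    def enqueue(self, item):
        if item is None:                     # models a NULL pointer: nothing to free
            return
        if threading.get_ident() == self._owner:
            self._free(item)                 # safe to free immediately
            self.drain()                     # also flush anything already pending
        else:
            with self._lock:
                self._items.append(item)     # defer to the owner thread

    def drain(self):
        """Free everything pending. Call from the owner thread only."""
        with self._lock:
            pending, self._items = self._items, []
        for item in pending:
            self._free(item)

    def __len__(self):
        with self._lock:
            return len(self._items)


freed = []
queue = DeferredFreeQueue(freed.append)

# "Finalize" 100 objects from threads other than the owner thread.
workers = [threading.Thread(target=queue.enqueue, args=(i,)) for i in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()

pending_before = len(queue)  # every item was deferred by a worker thread
queue.drain()                # the owner thread frees them in one pass
```

One deliberate difference from the Julia code: this model swaps the pending list out under the lock and frees outside it, whereas `unsafe_free_queue()` frees while holding `QUEUE.lock`. Either choice keeps the queue consistent; the swap simply shortens the time the lock is held.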
33 changes: 32 additions & 1 deletion test/GC.jl
@@ -1 +1,32 @@
# TODO
@testset "201: GC segfaults" begin
    # https://github.com/JuliaPy/PythonCall.jl/issues/201
    # This should not segfault!
    cmd = Base.julia_cmd()
    path = joinpath(@__DIR__, "finalize_test_script.jl")
    p = run(`$cmd -t2 --project $path`)
    @test p.exitcode == 0
end

@testset "GC.gc()" begin
    let
        pyobjs = map(pylist, 1:100)
        Threads.@threads for obj in pyobjs
            finalize(obj)
        end
    end
    Threads.nthreads() > 1 && @test !isempty(PythonCall.GC.QUEUE.items)
    PythonCall.GC.gc()
    @test isempty(PythonCall.GC.QUEUE.items)
end

@testset "GC.GCHook" begin
    let
        pyobjs = map(pylist, 1:100)
        Threads.@threads for obj in pyobjs
            finalize(obj)
        end
    end
    Threads.nthreads() > 1 && @test !isempty(PythonCall.GC.QUEUE.items)
    GC.gc()
    @test isempty(PythonCall.GC.QUEUE.items)
end
9 changes: 9 additions & 0 deletions test/finalize_test_script.jl
@@ -0,0 +1,9 @@
using PythonCall

# This would consistently segfault pre-GC-thread-safety
let
    pyobjs = map(pylist, 1:100)
    Threads.@threads for obj in pyobjs
        finalize(obj)
    end
end