Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sleep(n) sometimes segfaults in a threaded loop #32152

Closed
jpsamaroo opened this issue May 27, 2019 · 9 comments · Fixed by #32174
Closed

sleep(n) sometimes segfaults in a threaded loop #32152

jpsamaroo opened this issue May 27, 2019 · 9 comments · Fixed by #32174
Labels
multithreading Base.Threads and related functionality

Comments

@jpsamaroo
Copy link
Member

jpsamaroo commented May 27, 2019

When the following code is run with JULIA_NUM_THREADS=4, occasionally (about every 3 runs) I get a segfault on the second threaded loop invocation. I can't reproduce this with JULIA_NUM_THREADS=1.

MWE:

function f()
    Threads.@threads for i in 1:Threads.nthreads()
        sleep(1)
    end
end

println("First loop")
f()
println("Second loop")
f()
println("Done")

Occasional segfault message:

> JULIA_NUM_THREADS=4 julia-nightly tloop.jl
First loop
Second loop

signal (11): Segmentation fault
in expression starting at /home/jpsamaroo/tloop.jl:10
heap_remove at /workspace/srcdir/libuv/src/heap-inl.h:196
uv_timer_stop at /workspace/srcdir/libuv/src/timer.c:97
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:160
uv_run at /workspace/srcdir/libuv/src/unix/core.c:375
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:421
poptaskref at ./task.jl:568
wait at ./task.jl:575
wait at ./condition.jl:104
stream_wait at ./stream.jl:47
wait at ./asyncevent.jl:116
sleep at ./asyncevent.jl:187 [inlined]
macro expansion at /home/jpsamaroo/tloop.jl:3 [inlined]
#2#threadsfor_fun at ./threadingconstructs.jl:64
#2#threadsfor_fun at ./threadingconstructs.jl:31
unknown function (ip: 0x7faa244052ec)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2043 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2213
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1624 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:594
unknown function (ip: 0xffffffffffffffff)
Allocations: 152504 (Pool: 152439; Big: 65); GC: 0
Segmentation fault (core dumped)

EDIT: Corrected versioninfo():

julia> versioninfo()
Julia Version 1.3.0-DEV.289
Commit e39f498aca (2019-05-25 00:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
  JULIA_REVISE_POLL = 1
  JULIA_LOAD_PATH = @dev:@:@v#.#:@stdlib
  JULIA_PKG3_PRECOMPILE = 1

@JeffBezanson

@vtjnash
Copy link
Member

vtjnash commented May 27, 2019

Only output is permitted on threads or threaded regions, no other I/O operation should be attempted.

@StefanKarpinski
Copy link
Member

Once the release is final, it would be good to document what is allowed in threaded sections.

@kmsquire
Copy link
Member

To many programmers, sleep might not seem like an I/O operation (I had to look it up and think about it).

For the sake of completeness here, in Julia, sleep is defined here, which defines a Timer calls wait.

While it's clear that the Timer is being opened, digging deeper than this would take some work.

Two questions:

  1. Is there any documentation on how sleep actually works in Julia at a low level?
  2. If one wanted to actually sleep in a thread, is there a way to do it? Would making a system call to sleep or usleep (or Sleep) in a thread be safe?

@JeffBezanson
Copy link
Member

We just need to fix it.

@jpsamaroo
Copy link
Member Author

Only output is permitted on threads or threaded regions, no other I/O operation should be attempted.

It would be nice if we can allow I/O to be called from threads, even if that I/O is actually executed by the main thread or libuv.

@RalphAS
Copy link

RalphAS commented May 27, 2019

@kmsquire Yes, libc sleep() is thread-safe; the libuv event loop used by Timer and Base I/O is not.

The wait(::Timer) method protects libuv access with a lock, so it seems ok to use that in threaded blocks, if you construct the Timer outside the threaded region, as shown in this related example on discourse. I'd like to know if I've violated any rules there.

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label May 28, 2019
@StefanKarpinski
Copy link
Member

StefanKarpinski commented May 28, 2019

There is going to be a long tail of operations that are not threadsafe. The plan is to fix them all, not document that they are broken, so in general, there will be no rule to remember which things are safe or not safe, instead people should just file issues when they find anything that isn't threadsafe.

@jpsamaroo
Copy link
Member Author

jpsamaroo commented May 28, 2019

I'm poking around at this problem to see if I can fix it. I have a question about the following stacktrace:

signal (11): Segmentation fault
in expression starting at /home/jpsamaroo/tloop.jl:10
heap_remove at /workspace/srcdir/libuv/src/heap-inl.h:196
uv_timer_stop at /workspace/srcdir/libuv/src/timer.c:97
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:160
uv_run at /workspace/srcdir/libuv/src/unix/core.c:375
jl_task_get_next at /opt/julia-dev/src/partr.c:421
poptaskref at ./task.jl:568
wait at ./task.jl:575
wait at ./condition.jl:104
stream_wait at ./stream.jl:47
wait at ./asyncevent.jl:116
sleep at ./asyncevent.jl:187 [inlined]
macro expansion at /home/jpsamaroo/tloop.jl:3 [inlined]

It seems like the segfault occurs strictly within libuv itself (although this is invoked from jl_task_get_next, which probably shouldn't be changed to fix this). Does this imply that libuv itself needs to be modified somehow to account for multithreaded sleep? If so, does anyone have a recommendation on how I can go about accessing jl_uv_mutex from within libuv (which I assume is what I'd need to lock/unlock when fiddling with timers)?

@JeffBezanson
Copy link
Member

No I don't think this implies that libuv itself needs to be modified. My guess is that its internal data structures are corrupted here due to access from multiple threads. That is what jl_uv_mutex is for; we wrap calls to libuv in that. We've talked about more invasive or finer-grained locking inside libuv, but it shouldn't be strictly necessary except possibly for performance at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants