Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional segfaults when running with @threads #44460

Closed
chipbuster opened this issue Mar 4, 2022 · 11 comments · Fixed by #44476
Closed

Occasional segfaults when running with @threads #44460

chipbuster opened this issue Mar 4, 2022 · 11 comments · Fixed by #44476
Labels
multithreading Base.Threads and related functionality

Comments

@chipbuster
Copy link
Contributor

chipbuster commented Mar 4, 2022

This is mostly a condensed repost of #44019. I was told that julia 1.7.1 was a "very old release to be relying on threading," but the issue still occurs on the latest nightly.

Running the following code will sometimes segfault. It is more common with 32 threads, but will sometimes occur with 16 or fewer threads.

Current Code
using Base.Iterators
using Base.Threads
using Serialization
using Distributions

function bin_data(data, lo, hi, nbins)
    dx = (hi - lo) / nbins
    bins = ((data .- lo) ./ dx) .|> floor
    bins = UInt8.(bins)
    clamp.(bins, UInt8(0), UInt8(nbins))
end

l = SpinLock()
function compress_data(data)
    lock(l)
    tmpfn = tempname()
    unlock(l)
    write(tmpfn, data)
    run(
        pipeline(
            `xz -9e --keep --format=raw --suffix=.xz $(tmpfn)`,
            stdout = devnull,
            stderr = devnull,
        ),
    )
    nbytes = filesize(tmpfn * ".xz")
    rm(tmpfn * ".xz")
    rm(tmpfn)
    return nbytes
end

compressed_size_bytes(data) = compress_data(data)
compressed_size_bits(data) = compress_data(data) * 8

function emission_times_exp(n, k, Γ)
    η = (k + Γ) / (k * Γ)
    dist = Exponential(η)
    rand(dist, n)
end

function lose_data(lagtimes, γ)
    @assert(all(lagtimes .>= 0.0))
    ind = Int[]
    fixed_times = cumsum(lagtimes)
    for i = 1:length(lagtimes)
        x = rand()
        if x < γ
            push!(ind, i)
        end
    end
    detected_times = fixed_times[ind]
    detected_times |> diff
end

ns = [100_000, 1_000_000, 10_000_000]
# ns = [1_000]  # testing only
ks = [0.1, 0.5, 1.0, 5.0, 10.0]
Γs = [0.1, 0.5, 1.0, 5.0, 10.0]
γs = range(0.1, 1.0, step = 0.1)
ntrials = 1000

smrates = Iterators.product(ks, Γs) |> collect |> vec

l = SpinLock()
@threads for trialnum = 1:ntrials
    data = Dict()
    for p in smrates
        (k, Γ) = p
        for n in ns
            # nm_times = get_emission_dt(n, k, Γ)
            # mar_times = emission_times_exp(n, k, Γ)
            nm_times = 10.0 .* rand(n)
            mar_times = 10.0 .* rand(n)

            for γ in γs
                nm_lost = lose_data(nm_times, γ)
                mar_lost = lose_data(mar_times, γ)
                hi = max(maximum(nm_lost),maximum(mar_lost))

                @assert(all(nm_lost .>= 0.0))
                @assert(all(mar_lost .>= 0.0))

                nm_binned = bin_data(nm_lost, 0.0, hi, 100)
                mar_binned = bin_data(mar_lost, 0.0, hi, 100)

                nm_size = compressed_size_bytes(nm_binned)
                mar_size = compressed_size_bytes(mar_binned)

                experiment_index = (n = n, k = k, Γ = Γ, γ = γ, trial = trialnum)

                try
                    lock(l)
                    data[experiment_index] = (1.0, 1.0)
                finally
                    unlock(l)
                end
            end
        end
    end
end
Output of `versioninfo()`
Julia Version 1.9.0-DEV.109
Commit 3a47c1c4e1 (2022-03-01 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 32 × AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 1 on 32 virtual cores
Stack trace from the one time in many that the program managed to print one before dying

The line that originates in my code (line 65 at the top) is just the @threads loop.

signal (11): Segmentation fault
in expression starting at /mnt/ssd-data/Experiments/02-2022/simple-photon-model/code/buggy/gen_paramsweep_segfault.jl:65
jl_uv_call_close_callback at /buildworker/worker/package_linux64/build/src/jl_uv.c:83 [inlined]
jl_uv_closeHandle at /buildworker/worker/package_linux64/build/src/jl_uv.c:106
uv__finish_close at /workspace/srcdir/libuv/src/unix/core.c:301
uv__run_closing_handles at /workspace/srcdir/libuv/src/unix/core.c:315
uv_run at /workspace/srcdir/libuv/src/unix/core.c:393
ijl_process_events at /buildworker/worker/package_linux64/build/src/jl_uv.c:210
ijl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:598
poptask at ./task.jl:884
wait at ./task.jl:893
wait at ./condition.jl:124
wait at ./process.jl:647
success at ./process.jl:509
Exception: /tmp/julia-3a47c1c4e1/bin/julia killed by signal segmentation fault (core dumped)
A valgrind report from the previous issue

Taken from this comment

snellius paulm@tcn116 17:23 ~$ valgrind --smc-check=all-non-file --suppressions=$HOME/c/julia-git/contrib/valgrind-julia.supp ~/software/julia-1.7.1/bin/julia -t 128 segfault.jl
==1988794== Memcheck, a memory error detector
==1988794== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1988794== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1988794== Command: /home/paulm/software/julia-1.7.1/bin/julia -t 128 segfault.jl
==1988794== 
--1988794-- WARNING: unhandled amd64-linux syscall: 1008
--1988794-- You may be able to write your own handler.
--1988794-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--1988794-- Nevertheless we consider this a bug.  Please report
--1988794-- it at http://valgrind.org/support/bug_reports.html.
==1988794== Warning: client switching stacks?  SP change: 0x311d17d8 --> 0xe2a5fff8
==1988794==          to suppress, use: --max-stackframe=2978539552 or greater
==1988794== Warning: invalid file descriptor -1 in syscall close()
==1988794== Warning: invalid file descriptor -1 in syscall close()
==1988794== Warning: client switching stacks?  SP change: 0x30bce7d8 --> 0xef554ff8
==1988794==          to suppress, use: --max-stackframe=3197659168 or greater
==1988794== Warning: client switching stacks?  SP change: 0x34dd57d8 --> 0xf3958ff8
==1988794==          to suppress, use: --max-stackframe=3199744032 or greater
==1988794==          further instances of this message will not be shown.
==1988794== Thread 3:
==1988794== Syscall param write(buf) points to uninitialised byte(s)
==1988794==    at 0x4F4A52D: syscall (in /usr/lib64/libc-2.28.so)
==1988794==  Address 0xef54e000 is in a rw- anonymous segment
==1988794== 
==1988794== Syscall param write(buf) points to unaddressable byte(s)
==1988794==    at 0x4F4A52D: syscall (in /usr/lib64/libc-2.28.so)
==1988794==  Address 0xef54e000 is in a rw- anonymous segment
==1988794== 
==1988794== Thread 80:
==1988794== Invalid read of size 8
==1988794==    at 0x5B87E84: maybe_collect (julia_threads.h:325)
==1988794==    by 0x5B87E84: jl_gc_big_alloc (gc.c:947)
==1988794==  Address 0xfffffffffe49ff10 is not stack'd, malloc'd or (recently) free'd
==1988794== 
==1988794== Thread 88:
==1988794== Invalid read of size 8
==1988794==    at 0x5B7CC12: jl_gc_state_set (julia_threads.h:325)
==1988794==    by 0x5B7CC12: jl_task_get_next (partr.c:523)
==1988794==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1988794== 
==1988794== 
==1988794== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1988794==  Access not within mapped region at address 0x0
==1988794==    at 0x5B7CC12: jl_gc_state_set (julia_threads.h:325)
==1988794==    by 0x5B7CC12: jl_task_get_next (partr.c:523)
==1988794==  If you believe this happened as a result of a stack
==1988794==  overflow in your program's main thread (unlikely but
==1988794==  possible), you can try to increase the size of the
==1988794==  main thread stack using the --main-stacksize= flag.
==1988794==  The main thread stack size used in this run was 16777216.
==1988794== 
==1988794== HEAP SUMMARY:
==1988794==     in use at exit: 603,293,029 bytes in 49,164 blocks
==1988794==   total heap usage: 1,022,326 allocs, 973,162 frees, 3,449,011,340 bytes allocated
==1988794== 
==1988794== LEAK SUMMARY:
==1988794==    definitely lost: 163 bytes in 12 blocks
==1988794==    indirectly lost: 0 bytes in 0 blocks
==1988794==      possibly lost: 1,273,212 bytes in 12,672 blocks
==1988794==    still reachable: 602,018,922 bytes in 36,477 blocks
==1988794==                       of which reachable via heuristic:
==1988794==                         newarray           : 56,448 bytes in 10 blocks
==1988794==                         multipleinheritance: 7,992 bytes in 15 blocks
==1988794==         suppressed: 732 bytes in 3 blocks
==1988794== Rerun with --leak-check=full to see details of leaked memory
==1988794== 
==1988794== Use --track-origins=yes to see where uninitialised values come from
==1988794== For lists of detected and suppressed errors, rerun with: -s
==1988794== ERROR SUMMARY: 15 errors from 4 contexts (suppressed: 46 from 6)
Segmentation fault

An rr trace of the bug under Julia 1.7.1 can be found here. However, I have not been able to catch the program crashing under rr under the nightly build: trying to use BugReporting.jl will simply complete the program successfully, while running julia itself under rr has not managed to produce any useful results yet (it has been running for over 24 hours at this point with no apparent progress).

@chipbuster
Copy link
Contributor Author

Running under an external rr has crashed with an internal rr error.

It looks like this is an rr issue and not the actual Julia crash (the error message was about the wrong signal being received for a stop) but I have uploaded this trace in case it winds up being useful.

https://www.dropbox.com/s/ln805f9tfq3tsau/julia-rr-crash-nightly.tar.zst?dl=0

vtjnash added a commit that referenced this issue Mar 5, 2022
There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

Fixes #44460
@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label Mar 7, 2022
vtjnash added a commit that referenced this issue Mar 8, 2022
There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

The uv_return_spawn callback also needs to be synchronized with the
constructor, since we might have arrived there before we finished
allocating the Process struct here, leading to missed exit events.

Fixes #44460
KristofferC pushed a commit that referenced this issue Mar 11, 2022
There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

The uv_return_spawn callback also needs to be synchronized with the
constructor, since we might have arrived there before we finished
allocating the Process struct here, leading to missed exit events.

Fixes #44460

(cherry picked from commit c591bf2)
KristofferC pushed a commit that referenced this issue Apr 19, 2022
There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

The uv_return_spawn callback also needs to be synchronized with the
constructor, since we might have arrived there before we finished
allocating the Process struct here, leading to missed exit events.

Fixes #44460

(cherry picked from commit c591bf2)
@dpinol
Copy link

dpinol commented Apr 22, 2022

I get a crash also on my application when pushing into a array local variable within Threads.@thread.
Then, I tested the code in this issue description and it still crashes on my Ubuntu in both 1.8.0-beta 3 and julia built from master yesterday. @chipbuster can you reproduce the crash with current julia versions?

julia> versioninfo()
Julia Version 1.9.0-DEV.425
Commit 4340d23d28* (2022-04-21 08:06 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 32 on 12 virtual cores

This is what I get with valgrind on my application:


==915992== Thread 17:
==915992== Invalid read of size 8
==915992==    at 0x323D1C9A: jl_gc_state_set (julia_threads.h:340)
==915992==    by 0x323D1C9A: ijl_task_get_next (partr.c:593)
==915992==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

This is what I get without valgrind on my application with julia 1.9-dev:

signal (11): Segmentation fault
in expression starting at none:0
in expression starting at none:0
in expression starting at none:0
in expression starting at none:0
in expression starting at none:0
in expression starting at none:0
jl_gc_alloc at /home/dani/dev/julia/julia/usr/bin/../lib/libjulia-internal.so.1 (unknown line)
unknown function (ip: 0x7fab2a7537d4)
ijl_process_events at /home/dani/dev/julia/julia/src/jl_uv.c:210
ijl_task_get_next at /home/dani/dev/julia/julia/src/partr.c:420
jl_array_grow_at_end at /home/dani/dev/julia/julia/src/array.c:893
_growend! at ./array.jl:1006 [inlined]
push! at ./array.jl:1053 [inlined]
.....
#1261#threadsfor_fun#248 at ./threadingconstructs.jl:84
#1261#threadsfor_fun at ./threadingconstructs.jl:52 [inlined]
#1 at ./threadingconstructs.jl:30
unknown function (ip: 0x7faa6da9bacf)
jl_mutex_unlock at /home/dani/dev/julia/julia/src/julia_locks.h:129 [inlined]
ijl_process_events at /home/dani/dev/julia/julia/src/jl_uv.c:217
ijl_task_get_next at /home/dani/dev/julia/julia/src/partr.c:420
ijl_task_get_next at /home/dani/dev/julia/julia/src/partr.c:420
jl_apply at /home/dani/dev/julia/julia/src/julia.h:1838 [inlined]
start_task at /home/dani/dev/julia/julia/src/task.c:931
Allocations: 46370570 (Pool: 46340026; Big: 30544); GC: 64

With Julia 1.8-beta3

signal (11): Segmentation fault
in expression starting at none:0
maybe_collect at /buildworker/worker/package_linux64/build/src/julia_threads.h:340 [inlined]
jl_gc_pool_alloc_inner at /buildworker/worker/package_linux64/build/src/gc.c:1228 [inlined]
jl_gc_pool_alloc_noinline at /buildworker/worker/package_linux64/build/src/gc.c:1287 [inlined]
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:368 [inlined]
jl_gc_alloc at /buildworker/worker/package_linux64/build/src/gc.c:3352
jl_gc_alloc_buf at /buildworker/worker/package_linux64/build/src/julia_internal.h:403 [inlined]
jl_reserve_excstack at /buildworker/worker/package_linux64/build/src/rtutils.c:335 [inlined]
jl_push_excstack at /buildworker/worker/package_linux64/build/src/rtutils.c:347
throw_internal at /buildworker/worker/package_linux64/build/src/task.c:621
ijl_sig_throw at /buildworker/worker/package_linux64/build/src/task.c:681

@KristofferC
Copy link
Member

KristofferC commented Apr 22, 2022

pushing into a array local variable within Threads.

Could you be a bit more explicit (perhaps with a code snippet)? As you describe it, that sounds like a data race.

@dpinol
Copy link

dpinol commented Apr 22, 2022

pushing into a array local variable within Threads.

Could you be a bit more explicit (perhaps with a code snippet)? As you describe it, that sounds like a data race.

The crash involves a local variable only accessed from a single thread. The snipped below illustrates what my application does, but julia never crashes with the snippet (nor valgrind reports any issue).

mutable struct S
    @atomic sum::Float32
end
const s=S(0)

for i in 1:100_000_0
    Threads.@threads for t in 1:300
        a=Float32[]
        for i in 1:rand(1:2000)
            push!(a, 0.3f0)
        end
        @atomic s.sum += a[end]
    end
end
@info "sum" s.sum

@dpinol
Copy link

dpinol commented Apr 22, 2022

I uploaded the result of running #44460 (comment) with julia 1.9-dev with --bug-julia report=rr-local --project=. --threads=32 (5Gb 😿 )
(link edited with the packed rr data)
https://www.dropbox.com/s/ia8b159h59x9uv4/issue-44460-rr.tgz?dl=0

@vtjnash
Copy link
Member

vtjnash commented Apr 22, 2022

It looks like you need to run rr pack before uploading that trace (should only make it slightly bigger)

@dpinol
Copy link

dpinol commented Apr 22, 2022

It looks like you need to run rr pack before uploading that trace (should only make it slightly bigger)
The packed data:https://www.dropbox.com/s/ia8b159h59x9uv4/issue-44460-rr.tgz?dl=0 thanks

@chipbuster
Copy link
Contributor Author

Yeah, the example snippet is not really the best because it generates a lot of stuff in rr and doesn't always crash quickly.

Something that I found sometimes persuades it to crash quicker is clearing all caches and buffer from Linux before starting a recording, but it's not really foolproof.

@vtjnash
Copy link
Member

vtjnash commented Apr 26, 2022

@dpinol there doesn't seem to be anything remarkable in your trace that I can see. The process runs for a while (about 4138 iterations of the loop happen), then something external terminated it with a SIGKILL (9).

@chipbuster
Copy link
Contributor Author

@dpinol I'm not able to reproduce the crash running with the latest Julia.

Based on the description provided by vtjnash, t's possible that my example is crashing for you simply by running out of memory.

@dpinol
Copy link

dpinol commented May 5, 2022

Yes, I agree in this case it runs out of memory. I created a different MWE at #45196. thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants