
Strange race condition while running conjugate gradients in multithreaded code #36297

@ranjanan

I've been trying to debug the following race condition for a while now with little success, and I'm posting here in case it uncovers a bug in the threading runtime.

Here's how to reproduce it on Julia 1.4 (same behaviour on master):

  1. Add the package Circuitscape with the branch "RA/mt"
  2. Start Julia with multiple threads

Then run:

foo = Vector{Any}()
buzz = Ref{Any}()
using Base.Threads
@show nthreads() # should print a value greater than 1
using Circuitscape
cd(expanduser("~/.julia/dev/Circuitscape/test")) # Julia's cd does not expand "~" on its own
compute(test_problem("sgVerify1.ini"))

Now, if you enter fetch.(buzz[]) at the REPL, you should see a few vectors, some of which contain NaNs, which is incorrect. If you repeat the above steps with single-threaded Julia, you should see the expected finite floating-point values instead.
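To make the comparison between the threaded and single-threaded runs less manual, something like the following sketch could be used to list which of the fetched vectors contain NaNs. The helper name nan_indices and the sample data are hypothetical and not part of Circuitscape; the sketch also assumes that each fetched result is a vector of floating-point numbers.

```julia
# Hypothetical helper (not part of Circuitscape): return the indices of
# the vectors in `results` that contain at least one NaN.
nan_indices(results) = findall(v -> any(isnan, v), results)

# Stand-in data for illustration only. In the session above, one would
# pass fetch.(buzz[]) instead, assuming each result is a float vector.
sample = [[1.0, 2.0], [NaN, 0.5], [3.0, 4.0], [0.1, NaN]]

bad = nan_indices(sample)
println("vectors containing NaN: ", bad)
```

An empty result in the single-threaded run and a non-empty one in the multithreaded run would match the behaviour described above.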

I've tried to obtain an rr recording as well, but for some puzzling reason, I cannot reproduce this race condition using rr. The command I've been running:
JULIA_NUM_THREADS=4 /opt/rr/bin/rr record --chaos --num-cores=4 /data/ranjanan/julia/usr/bin/julia-debug

Any help in understanding the underlying issue would be greatly appreciated.

cc: @Keno @vchuravy

Labels: multithreading (Base.Threads and related functionality)