
Segmentation fault: while calling gotoblas_init() -> blas_thread_init() -> pthread_create@GLIBC_2.2.5 #888

Closed
sunlylorn opened this issue May 20, 2016 · 17 comments

Comments

@sunlylorn

OpenBLAS-0.2.18

coredump:

Program terminated with signal 11, Segmentation fault
#0 0x0000003161a06fb6 in pthread_create@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007eff2bb6825f in blas_thread_init () from ../lib/libopenblas.so.0
#2 0x00007eff2bb675d7 in gotoblas_init () from ../lib/libopenblas.so.0
#3 0x00007eff2c6318d6 in __do_global_ctors_aux () from ../lib/libopenblas.so.0
#4 0x00007eff2b945beb in _init () from ../lib/libopenblas.so.0
#5 0x0000003100000000 in ?? ()
#6 0x0000003160e0e4a5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#7 0x0000003160e12ba5 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8 0x0000003160e0e106 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x0000003160e123ea in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x0000003161200f66 in dlopen_doit () from /lib64/libdl.so.2
#11 0x0000003160e0e106 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x000000316120129c in _dlerror_run () from /lib64/libdl.so.2
#13 0x0000003161200ee1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2

@sunlylorn
Author

@xianyi

@martin-frbg
Collaborator

I suspect more context will be necessary to assess this problem - surely if OpenBLAS simply crashed like that for everybody, few if any of us would use it. So could you please state: what system you are running on (64-bit Linux apparently, but what processor, what distribution?), and in what general context the error occurs - some program you wrote yourself, or a well-known system like R or Octave? Is there a minimal code example that can be used to reproduce the error?
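For example, a minimal reproducer of the kind asked for might look like the sketch below (assuming the crash happens while dlopen()-ing a shared library linked against libopenblas, as the backtrace suggests; repro.c and libxxx.so are hypothetical names):

/* repro.c - hypothetical minimal reproducer: dlopen() a library linked
 * against OpenBLAS so that gotoblas_init() runs from the library constructor.
 * Build sketch: gcc repro.c -ldl -o repro
 */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("./libxxx.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* If the crash is in gotoblas_init(), it happens inside dlopen()
     * itself, before any BLAS routine is even called. */
    dlclose(handle);
    return 0;
}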

@brada4
Contributor

brada4 commented May 21, 2016

Please show the output of:
$ gdb ./your_executable
gdb> run --arguments -to +your --program
gdb> t a a bt   (i.e. thread apply all bt)

And if you can, also run the program through
strace -f -o 'traces' (your program command line)

Then upload whatever output you get to a gist and link it here.

@sunlylorn
Author

This problem cannot be reproduced reliably: sometimes initialization completes fine, but sometimes it produces a coredump. So I suppose it uses some non-thread-safe functions, which may be causing this problem?

@sunlylorn
Author

I use OpenBLAS to build a dynamic library named libxxx.so, and a framework loads this libxxx.so. The problem occurs at the moment the framework tries to load libxxx.so.
[screenshot-3 attached]

@brada4
Contributor

brada4 commented May 23, 2016

Please remove -march=native when compiling; your GDB is too old for that. If you use threads other than OpenBLAS's own, please use the thread-safe OpenMP OpenBLAS build.
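For example, an OpenMP-threaded build can be requested with the standard OpenBLAS make variable (a sketch; adjust the invocation to your toolchain):

make clean
make USE_OPENMP=1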

@martin-frbg
Collaborator

I suspect this may have been the same issue as #716, where initialization of each thread involved an unsafe call to getenv() - while the fix for that bug appears to have been in the development tree by the time this issue was opened, the first release to contain it was 0.2.19.
If my assumption is correct, updating your OpenBLAS to this (still current) release version or to a snapshot of the git "develop" branch should be sufficient (if this issue is still relevant for you).
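To illustrate the kind of problem described in #716 (a sketch only, with hypothetical names - not the actual OpenBLAS fix): the safe pattern is to read environment variables once, while still single-threaded, instead of calling getenv() from every newly created thread's startup path.

/* Sketch with hypothetical names. Call read_env_once() before any
 * pthread_create(), then let worker threads read only the cached value. */
#include <stdlib.h>

static int cached_num_threads;

static void read_env_once(void) {
  const char *p = getenv("OPENBLAS_NUM_THREADS");
  cached_num_threads = p ? atoi(p) : 0;
}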

@ctgushiwei

I can compile the 0.2.19 release version successfully on armv7, but when I test cblas_sgemm I get a segmentation fault. I have found the reason: the code under openblas_0.2.19/kernel/ cannot be compiled to .o files. I do not know how to solve this problem.

@knedlsepp

@martin-frbg: I've tried both the OpenBLAS 0.2.19- and 0.2.20-based numpy builds from conda-forge, and both versions still seem to have this problem.

@knedlsepp

knedlsepp commented Sep 1, 2017

Just compiled the current develop branch using:

cmake -DCMAKE_C_FLAGS="-fsanitize=thread -pie" -DCMAKE_BUILD_TYPE=Debug .. 
make -j 4
env CTEST_OUTPUT_ON_FAILURE=1 make test

ThreadSanitizer reports the following globals, reached from the threads started by blas_thread_init, as affected by race conditions:

  • thread_status
  • hot_alloc
  • memory

I'm not an expert on multithreading, so I don't know how to fix the problem yet, but since ThreadSanitizer is quite specific about where the problem lies, we should be able to patch this as a joint effort.

==================
WARNING: ThreadSanitizer: data race (pid=20601)
  Read of size 8 at 0x7f01386fc980 by thread T2:
    #0 blas_lock /tmp/OpenBLAS/common_x86_64.h:75 (libopenblas_d.so.0+0x000000a45fac)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1079 (libopenblas_d.so.0+0x000000a45fac)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Previous write of size 8 at 0x7f01386fc980 by thread T1:
    #0 blas_unlock /tmp/OpenBLAS/common.h:661 (libopenblas_d.so.0+0x000000a45ff1)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1083 (libopenblas_d.so.0+0x000000a45ff1)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Location is global 'memory' of size 512 at 0x7f01386fc980 (libopenblas_d.so.0+0x0000010ad980)

  Thread T2 (tid=20604, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T1 (tid=20603, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/common_x86_64.h:75 blas_lock
==================
==================
WARNING: ThreadSanitizer: data race (pid=20601)
  Read of size 8 at 0x7f01386fc9c0 by thread T2:
    #0 blas_lock /tmp/OpenBLAS/common_x86_64.h:75 (libopenblas_d.so.0+0x000000a45fac)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1079 (libopenblas_d.so.0+0x000000a45fac)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Previous write of size 8 at 0x7f01386fc9c0 by thread T1:
    #0 blas_unlock /tmp/OpenBLAS/common.h:661 (libopenblas_d.so.0+0x000000a4609b)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1100 (libopenblas_d.so.0+0x000000a4609b)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Location is global 'memory' of size 512 at 0x7f01386fc980 (libopenblas_d.so.0+0x0000010ad9c0)

  Thread T2 (tid=20604, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T1 (tid=20603, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/common_x86_64.h:75 blas_lock
==================
==================
WARNING: ThreadSanitizer: data race (pid=20601)
  Read of size 4 at 0x7f01386fc9d0 by thread T2:
    #0 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1081 (libopenblas_d.so.0+0x000000a45fc7)
    #1 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #2 <null> <null> (libtsan.so.0+0x000000022709)

  Previous write of size 4 at 0x7f01386fc9d0 by thread T1:
    #0 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1098 (libopenblas_d.so.0+0x000000a46088)
    #1 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #2 <null> <null> (libtsan.so.0+0x000000022709)

  Location is global 'memory' of size 512 at 0x7f01386fc980 (libopenblas_d.so.0+0x0000010ad9d0)

  Thread T2 (tid=20604, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T1 (tid=20603, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/driver/others/memory.c:1081 blas_memory_alloc
==================
==================
WARNING: ThreadSanitizer: data race (pid=20601)
  Read of size 4 at 0x7f01386fcbc8 by thread T3:
    #0 alloc_mmap /tmp/OpenBLAS/driver/others/memory.c:527 (libopenblas_d.so.0+0x000000a45ac1)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1114 (libopenblas_d.so.0+0x000000a4611b)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Previous write of size 4 at 0x7f01386fcbc8 by thread T1:
    #0 alloc_mmap /tmp/OpenBLAS/driver/others/memory.c:597 (libopenblas_d.so.0+0x000000a45cd6)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1114 (libopenblas_d.so.0+0x000000a4611b)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4688c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Location is global 'hot_alloc' of size 4 at 0x7f01386fcbc8 (libopenblas_d.so.0+0x0000010adbc8)

  Thread T3 (tid=20605, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T1 (tid=20603, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/driver/others/memory.c:527 alloc_mmap
==================
TEST 1/2 amax:samax [OK]
TEST 2/2 potrf:bug_695 ==================
WARNING: ThreadSanitizer: data race (pid=20601)
  Write of size 8 at 0x7f01386fce00 by thread T2 (mutexes: write M15):
    #0 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:364 (libopenblas_d.so.0+0x000000a46972)
    #1 <null> <null> (libtsan.so.0+0x000000022709)

  Previous read of size 8 at 0x7f01386fce00 by main thread:
    #0 exec_blas_async /tmp/OpenBLAS/driver/others/blas_server.c:672 (libopenblas_d.so.0+0x000000a46fdc)
    #1 exec_blas /tmp/OpenBLAS/driver/others/blas_server.c:803 (libopenblas_d.so.0+0x000000a4732c)
    #2 gemm_thread_n /tmp/OpenBLAS/driver/level3/gemm_thread_n.c:95 (libopenblas_d.so.0+0x000000944795)
    #3 cpotrf_U_parallel /tmp/OpenBLAS/lapack/potrf/potrf_U_parallel.c:112 (libopenblas_d.so.0+0x000000cd1cff)
    #4 cpotrf_ /tmp/OpenBLAS/interface/lapack/zpotrf.c:125 (libopenblas_d.so.0+0x0000008003ac)
    #5 __ctest_potrf_bug_695_run /tmp/OpenBLAS/utest/test_potrs.c:391 (openblas_utest+0x0000000030f5)
    #6 ctest_main /tmp/OpenBLAS/utest/ctest.h:724 (openblas_utest+0x000000002e9d)
    #7 main /tmp/OpenBLAS/utest/utest_main.c:45 (openblas_utest+0x00000000184e)

  Location is global 'thread_status' of size 512 at 0x7f01386fcd80 (libopenblas_d.so.0+0x0000010ade00)

  Mutex M15 (0x7f01386fce18) created at:
    #0 pthread_mutex_init <null> (libtsan.so.0+0x000000026fc5)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:572 (libopenblas_d.so.0+0x000000a46d23)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T2 (tid=20604, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/driver/others/blas_server.c:364 blas_thread_server
==================
==================
WARNING: ThreadSanitizer: data race (pid=20601)
  Write of size 8 at 0x7f01386fce80 by main thread (mutexes: write M13):
    #0 blas_thread_shutdown_ /tmp/OpenBLAS/driver/others/blas_server.c:965 (libopenblas_d.so.0+0x000000a47918)
    #1 blas_shutdown /tmp/OpenBLAS/driver/others/memory.c:1252 (libopenblas_d.so.0+0x000000a46321)
    #2 gotoblas_quit /tmp/OpenBLAS/driver/others/memory.c:1455 (libopenblas_d.so.0+0x00000008702e)
    #3 _dl_fini <null> (ld-linux-x86-64.so.2+0x00000000f8e6)

  Previous read of size 8 at 0x7f01386fce80 by thread T3 (mutexes: write M16):
    #0 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:316 (libopenblas_d.so.0+0x000000a468cb)
    #1 <null> <null> (libtsan.so.0+0x000000022709)

  Location is global 'thread_status' of size 512 at 0x7f01386fcd80 (libopenblas_d.so.0+0x0000010ade80)

  Mutex M13 (0x7f01386fcfa0) created at:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x0000000342d6)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:556 (libopenblas_d.so.0+0x000000a46bfc)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Mutex M16 (0x7f01386fce98) created at:
    #0 pthread_mutex_init <null> (libtsan.so.0+0x000000026fc5)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:572 (libopenblas_d.so.0+0x000000a46d23)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T3 (tid=20605, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d42)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1373 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1416 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/driver/others/blas_server.c:965 blas_thread_shutdown_
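For orientation, the general shape of a fix for reports like these (a minimal sketch with illustrative names only, not the actual code in memory.c / blas_server.c) is to route every read and write of such shared globals through one real mutex:

#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mem_lock = PTHREAD_MUTEX_INITIALIZER;
static struct { int used; void *addr; } memory_table[64];   /* stand-in for 'memory' */

void *claim_slot(void) {
  void *addr = NULL;
  pthread_mutex_lock(&mem_lock);      /* every access to the table goes through the lock */
  for (int i = 0; i < 64; i++) {
    if (!memory_table[i].used) {
      memory_table[i].used = 1;
      addr = memory_table[i].addr;
      break;
    }
  }
  pthread_mutex_unlock(&mem_lock);
  return addr;
}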

@martin-frbg
Collaborator

Back in early January when I tried to fix these issues (PR #1052), I was left with a few cases where two threads would conflict while each held a different lock. Maybe this is what is hurting you now - this definitely needs a revisit, but alas my (poor) multithreading expertise has not improved in the meantime.

@martin-frbg
Collaborator

martin-frbg commented Sep 2, 2017

If you want to try, the attached diff seems to fix most of these, leaving "only" the known problems with blas_level3_thread, which can be worked around by building with USE_SIMPLE_THREADED_LEVEL3=1.
(Update: fixed the attached diff to have only the relevant changes)
threaddiff.txt

@knedlsepp

knedlsepp commented Sep 2, 2017

Thanks for the patch! I'll give it a try on the problematic box.
Just to recap: as I understand from your patch, there are at least two issues at hand (see the sketch after this list):

  • blas_lock does not work as expected
  • Even though pthread support is enabled, blas_lock is used instead of a pthread lock, because blas_lock is hard-coded in some places where the LOCK_COMMAND macro should have been used
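A sketch of the distinction (hypothetical definitions, not the literal OpenBLAS code): in a pthread build, LOCK_COMMAND is meant to expand to a real mutex operation, so hard-coding blas_lock() at a call site bypasses that choice entirely.

#include <pthread.h>

#define LOCK_COMMAND(x)   pthread_mutex_lock(x)     /* what a pthread build should end up using */
#define UNLOCK_COMMAND(x) pthread_mutex_unlock(x)   /* instead of a hard-coded blas_lock()      */

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_state;                            /* stand-in for memory[] / thread_status[]  */

void touch_shared_state(void) {
  LOCK_COMMAND(&alloc_lock);
  shared_state++;
  UNLOCK_COMMAND(&alloc_lock);
}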

@martin-frbg
Collaborator

martin-frbg commented Sep 2, 2017

That is my impression at least - in particular, blas_lock() seems to be nothing more than YIELDING, i.e. either sched_yield or even just asm(NOP), so it is unclear to me how that would be expected to provide reliable protection against races. Possibly it used to have a different implementation in the early days, and/or its misleading name led to misapplication in later edits. Perhaps the issue is simply that mixing pthread mutex locks with the blas_lock implementation does not work.
EDIT: sorry, it seems I uploaded a full diff of my working tree by mistake, instead of just the few changes to memory.c and blas_server.c. Fixed now.
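To illustrate the general point (a sketch only, not OpenBLAS's actual blas_lock): a loop that merely yields provides no mutual exclusion, whereas a working spinlock needs an atomic read-modify-write (or simply a pthread mutex):

#include <sched.h>

/* Not a lock: two threads can both observe 0 and both "acquire". */
void not_a_lock(volatile unsigned long *lock_word) {
  while (*lock_word) sched_yield();
  *lock_word = 1;                     /* non-atomic write: the classic race */
}

/* A working spinlock uses an atomic test-and-set (GCC builtin shown here). */
void spin_lock(volatile unsigned long *lock_word) {
  while (__sync_lock_test_and_set(lock_word, 1UL))
    sched_yield();
}

void spin_unlock(volatile unsigned long *lock_word) {
  __sync_lock_release(lock_word);
}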

@knedlsepp

After applying your patch there seems to be only one race condition left when running the openblas_utest binary:

WARNING: ThreadSanitizer: data race (pid=24748)
  Write of size 4 at 0x7f266d8f5bc8 by thread T1:
    #0 alloc_mmap /tmp/OpenBLAS/driver/others/memory.c:615 (libopenblas_d.so.0+0x000000a45cd6)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1132 (libopenblas_d.so.0+0x000000a46105)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4687c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Previous read of size 4 at 0x7f266d8f5bc8 by thread T2:
    #0 alloc_mmap /tmp/OpenBLAS/driver/others/memory.c:542 (libopenblas_d.so.0+0x000000a45ac1)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1132 (libopenblas_d.so.0+0x000000a46105)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4687c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Location is global 'hot_alloc' of size 4 at 0x7f266d8f5bc8 (libopenblas_d.so.0+0x0000010adbc8)

  Thread T1 (tid=24750, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d32)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1400 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1446 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T2 (tid=24751, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d32)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1400 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1446 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/driver/others/memory.c:615 alloc_mmap

So that's good news. I tried getting rid of this last one by guarding the entire body of alloc_mmap with LOCK_COMMAND(&alloc_lock); this doesn't work, however.
lock_alloc_mmap.diff.txt
I get a slightly different error now:

==================
WARNING: ThreadSanitizer: data race (pid=26722)
  Write of size 4 at 0x7f18543b4bc8 by thread T1 (mutexes: write M12):
    #0 alloc_mmap /tmp/OpenBLAS/driver/others/memory.c:616 (libopenblas_d.so.0+0x000000a45cd6)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1132 (libopenblas_d.so.0+0x000000a460e5)
    #2 blas_thread_server /tmp/OpenBLAS/driver/others/blas_server.c:297 (libopenblas_d.so.0+0x000000a4685c)
    #3 <null> <null> (libtsan.so.0+0x000000022709)

  Previous read of size 4 at 0x7f18543b4bc8 by main thread:
    [failed to restore the stack]

  Location is global 'hot_alloc' of size 4 at 0x7f18543b4bc8 (libopenblas_d.so.0+0x0000010adbc8)

  Mutex M12 (0x7f18543b4ba0) created at:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x0000000342d6)
    #1 blas_memory_alloc /tmp/OpenBLAS/driver/others/memory.c:1022 (libopenblas_d.so.0+0x000000a45f33)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1393 (libopenblas_d.so.0+0x000000094d37)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1446 (libopenblas_d.so.0+0x000000094d37)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

  Thread T1 (tid=26724, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x0000000265c4)
    #1 blas_thread_init /tmp/OpenBLAS/driver/others/blas_server.c:579 (libopenblas_d.so.0+0x000000a46d12)
    #2 gotoblas_memory_init /tmp/OpenBLAS/driver/others/memory.c:1400 (libopenblas_d.so.0+0x000000094dc2)
    #3 gotoblas_init /tmp/OpenBLAS/driver/others/memory.c:1446 (libopenblas_d.so.0+0x000000094dc2)
    #4 call_init.part.0 <null> (ld-linux-x86-64.so.2+0x00000000f329)

SUMMARY: ThreadSanitizer: data race /tmp/OpenBLAS/driver/others/memory.c:616 alloc_mmap

This actually makes me wonder: is it correct to reuse the same lock for different purposes/variables? I'm not sure whether that can be harmful (deadlock?), but at the very least it could be inefficient to serialize non-interacting code paths on the same lock.
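For what it's worth, a sketch of the one-lock-per-resource idea (illustrative names only; whether it maps cleanly onto memory.c is a separate question): independent shared variables get independent mutexes, so unrelated code paths never serialize on each other, and a fixed acquisition order avoids deadlock when both locks are needed.

#include <pthread.h>

static pthread_mutex_t alloc_lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hot_alloc_lock = PTHREAD_MUTEX_INITIALIZER;

static int memory_slots_used;   /* stand-in for 'memory'    */
static int hot_alloc_flag;      /* stand-in for 'hot_alloc' */

void update_hot_alloc(int v) {
  pthread_mutex_lock(&hot_alloc_lock);     /* does not block the allocator path */
  hot_alloc_flag = v;
  pthread_mutex_unlock(&hot_alloc_lock);
}

void update_both(int v) {
  /* Fixed order: alloc_lock first, then hot_alloc_lock, everywhere. */
  pthread_mutex_lock(&alloc_lock);
  pthread_mutex_lock(&hot_alloc_lock);
  memory_slots_used++;
  hot_alloc_flag = v;
  pthread_mutex_unlock(&hot_alloc_lock);
  pthread_mutex_unlock(&alloc_lock);
}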

@martin-frbg
Collaborator

My patch above was actually incomplete, missing a very similar code fragment a few lines away from the changed part of memory.c. With the completed fix I merged today, and
USE_SIMPLE_THREADED_LEVEL3 = 1
as a temporary workaround for the known issues with level3 BLAS, I no longer get ThreadSanitizer warnings for the built-in tests in non-OpenMP builds. (With USE_OPENMP, I am not sure whether ThreadSanitizer warnings are valid - valgrind/helgrind at least seem to be misled by the OpenMP locking mechanism.)
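For reference, the workaround build would look something like this (a sketch of the usual OpenBLAS make invocation):

make clean
make USE_SIMPLE_THREADED_LEVEL3=1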

@knedlsepp

@martin-frbg: That's great to hear! Thanks for your help. I'm looking forward to having a working conda numpy again. :-)
