Segmentation fault: while calling gotoblas_init() -> blas_thread_init() -> pthread_create@GLIBC_2.2.5 #888
I suspect more context will be necessary to assess this problem. Surely if OpenBLAS were crashing like that for everybody, few if any of us would use it. So, could you please state what system you are running on (64-bit Linux apparently, but what processor and what distribution?) and the general context in which the error occurs: some program you wrote yourself, or some well-known system like R or Octave? Is there some minimal code example that can be used to reproduce the error? |
Please show And if you can you drill program through And upload whatever reports you get to gist and link here |
This problem cannot be reproduced reliably: sometimes initialization completes fine, but sometimes it generates a core dump. So I suppose it uses some non-thread-safe functions, which may cause this problem? |
Please remove -march=native when compiling. Your GDB is too old for that. If your program uses threads other than those created by OpenBLAS, please use the thread-safe OpenMP build of OpenBLAS. |
I suspect this may have been the same issue as #716, where initialization of each thread involved an unsafe call to getenv() - while the fix for that bug appears to have been in the development tree by the time this issue was opened, the first release to contain it was 0.2.19. |
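One common way to keep getenv() out of per-thread initialization is to read the environment once, before any worker thread exists, and let the workers use only the cached value. The following is a minimal sketch of that idea and not the fix applied for #716; the variable name DEMO_VERBOSE and all demo_* identifiers are hypothetical and do not come from OpenBLAS.

```c
#include <pthread.h>
#include <stdlib.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical cached setting: written once before any worker starts,
 * read-only afterwards, so workers need no lock to read it. */
static int demo_verbose;

/* Read the environment once, from the thread that will create the workers. */
void demo_init_settings(void) {
    const char *v = getenv("DEMO_VERBOSE");   /* hypothetical variable name */
    demo_verbose = (v != NULL) ? atoi(v) : 0;
}

/* Worker: uses the cached value and never calls getenv() itself. */
static void *demo_worker(void *arg) {
    int *seen = arg;
    *seen = demo_verbose;
    return NULL;
}

/* Start n workers and return the sum of the values they observed,
 * or -1 if n is out of range or a thread could not be created. */
int demo_run_workers(int n) {
    pthread_t tid[DEMO_MAX_WORKERS];
    int seen[DEMO_MAX_WORKERS];
    int started = 0;
    int sum = 0;

    if (n < 0 || n > DEMO_MAX_WORKERS) {
        return -1;
    }
    demo_init_settings();   /* happens before every pthread_create() below */

    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, demo_worker, &seen[i]) != 0) {
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        sum += seen[i];
    }
    return (started == n) ? sum : -1;
}
```

Because pthread_create() orders the earlier write to demo_verbose before anything the new thread does, the workers can read it without synchronization as long as nothing modifies it afterwards.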
I compiled the 0.2.19 release successfully on armv7, but when I test cblas_sgemm I get a segmentation fault. I have found the reason: the code in openblas_0.2.19/kernel/ cannot be compiled to .o files. I do not know how to solve this problem. |
@martin-frbg: I've tried both the OpenBLAS 0.2.19 and 0.2.20 based numpy from conda-forge and both versions still seem to have this problem. |
Just compiled the current
I get the following globals for
I'm not an expert on multithreading, so I don't know how to fix the problem yet, but as the ThreadSanitizer is quite specific about where the problem lies we should be able to patch this via a joint effort.
|
Back in early January, when I tried to fix these issues (PR #1052), I was left with a few cases where two threads would conflict, with each holding a different lock. Maybe this is what is hurting you now. This definitely needs a revisit, but alas my (poor) multithreading expertise has not improved in the meantime. |
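The situation described above, two threads each holding one lock while waiting for the other, is commonly avoided by always acquiring locks in one fixed global order. The sketch below illustrates that general technique with two hypothetical mutexes; it is not taken from PR #1052 or from OpenBLAS, and every demo_* name is invented for illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Two hypothetical locks that some code paths need to hold together. */
static pthread_mutex_t demo_lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_lock_b = PTHREAD_MUTEX_INITIALIZER;
static long demo_shared_total;

/* Acquire two distinct mutexes in a fixed global order (lower address
 * first), regardless of the order in which the caller names them. */
static void demo_lock_pair(pthread_mutex_t *x, pthread_mutex_t *y) {
    if ((uintptr_t)x > (uintptr_t)y) {
        pthread_mutex_t *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(x);
    pthread_mutex_lock(y);
}

/* Release both mutexes; the release order does not matter. */
static void demo_unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y) {
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}

/* This thread names the locks as (a, b). */
static void *demo_thread_ab(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        demo_lock_pair(&demo_lock_a, &demo_lock_b);
        demo_shared_total++;
        demo_unlock_pair(&demo_lock_a, &demo_lock_b);
    }
    return NULL;
}

/* This thread names them as (b, a); without the ordering step in
 * demo_lock_pair() this pairing could deadlock against the thread above. */
static void *demo_thread_ba(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        demo_lock_pair(&demo_lock_b, &demo_lock_a);
        demo_shared_total++;
        demo_unlock_pair(&demo_lock_b, &demo_lock_a);
    }
    return NULL;
}

/* Run both threads to completion; returns the final total or -1 on error. */
long demo_run_lock_order(void) {
    pthread_t t1, t2;

    demo_shared_total = 0;
    if (pthread_create(&t1, NULL, demo_thread_ab, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&t2, NULL, demo_thread_ba, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return demo_shared_total;
}
```

Ordering by address is only one possible convention; any rule works as long as every code path that takes both locks follows the same one.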
If you want to try, the attached diff seems to fix most of these, leaving "only" the known problems with blas_level3_thread that can be worked around by building with USE_SIMPLE_THREADED_LEVEL3=1. |
Thanks for the patch! I'll give it a try on the problematic box.
|
That is my impression at least - |
After applying your patch there seems to be only one race condition left when running the
So that's good news. I tried getting rid of this last one by guarding the entire body of
This actually makes me wonder: Is it correct to reuse the same lock for different usages/variables? I'm not sure if that can be harmful (deadlock?) but at least it could be inefficient to lock non-interacting code paths with the same lock. |
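On the general question: guarding unrelated variables with one mutex is not incorrect in itself, but it serializes code paths that never touch the same data. It can also deadlock if a thread tries to re-acquire a non-recursive mutex it already holds, for example through a nested call. A common alternative is one lock per independently updated variable. The sketch below shows that pattern with two hypothetical counters; the demo_* names are illustrative and not related to the variables in memory.c.

```c
#include <pthread.h>

/* A value together with the one mutex that protects it. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} demo_counter;

/* Two independent, hypothetical counters, each with its own lock, so
 * updates to one never wait on updates to the other. */
static demo_counter demo_hits   = { PTHREAD_MUTEX_INITIALIZER, 0 };
static demo_counter demo_misses = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void demo_counter_add(demo_counter *c, long n) {
    pthread_mutex_lock(&c->lock);
    c->value += n;
    pthread_mutex_unlock(&c->lock);
}

static long demo_counter_get(demo_counter *c) {
    pthread_mutex_lock(&c->lock);
    long v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}

static void *demo_hit_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        demo_counter_add(&demo_hits, 1);
    }
    return NULL;
}

static void *demo_miss_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 5000; i++) {
        demo_counter_add(&demo_misses, 1);
    }
    return NULL;
}

/* Reset both counters, run one worker per counter, and report the
 * results. Returns 0 on success, -1 if a thread could not be created. */
int demo_run_counters(long *hits, long *misses) {
    pthread_t t1, t2;

    demo_hits.value = 0;     /* no other thread is running yet */
    demo_misses.value = 0;
    if (pthread_create(&t1, NULL, demo_hit_worker, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&t2, NULL, demo_miss_worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *hits = demo_counter_get(&demo_hits);
    *misses = demo_counter_get(&demo_misses);
    return 0;
}
```

The trade-off is that code needing both values at once must take both locks, which brings back the need for a consistent acquisition order.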
My patch above was actually incomplete, missing a very similar code fragment a few lines from the changed part of memory.c. With the completed fix I merged today and |
@martin-frbg: That's great to hear! Thanks for your help. I'm looking forward to having a working conda numpy again. :-) |
OpenBLAS-0.2.18
coredump:
Program terminated with signal 11, Segmentation fault
#0 0x0000003161a06fb6 in pthread_create@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007eff2bb6825f in blas_thread_init () from ../lib/libopenblas.so.0
#2 0x00007eff2bb675d7 in gotoblas_init () from ../lib/libopenblas.so.0
#3 0x00007eff2c6318d6 in __do_global_ctors_aux () from ../lib/libopenblas.so.0
#4 0x00007eff2b945beb in _init () from ../lib/libopenblas.so.0
#5 0x0000003100000000 in ?? ()
#6 0x0000003160e0e4a5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#7 0x0000003160e12ba5 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8 0x0000003160e0e106 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x0000003160e123ea in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x0000003161200f66 in dlopen_doit () from /lib64/libdl.so.2
#11 0x0000003160e0e106 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x000000316120129c in _dlerror_run () from /lib64/libdl.so.2
#13 0x0000003161200ee1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2