pthread_create: Invalid argument
when starting sage with openblas preloaded
#1936
Comments
Does it still fail with setting OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1, or without LD_PRELOAD? At the point OpenBLAS is loaded 4 times there is no python module loading OpenMP (libgomp). |
On 18-12-26 23:04, Andrew wrote:
> Does it still fail with
> setting OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1

Yes, still fails.

> without LD_PRELOAD

No, without `LD_PRELOAD` it doesn't fail.

> At the point OpenBLAS is loaded 4 times there is no python module loading OpenMP (libgomp)

Is that a question?
|
Just an observation. What does LD_DEBUG show without LD_PRELOAD, down to the point where numpy gets imported? BLAS is not among the default imports of python; it normally comes in only with extra modules. Can you try strace under the same conditions? That would also show the pthread_create calls along the way. LD_PRELOAD is more of a last-resort measure, like overriding a netlib BLAS that some PIP install linked in unknowingly. It is not a typical use case, and the library is at least not very resistant to dlopen() calls in quick succession. |
4 cores (Intel i5-3470)
(Just importing numpy in regular python doesn't cause the issue)
Am I using strace wrong? I thought it only shows syscalls.
|
I wonder why it gets (pre)loaded four times, probably running out of space for the thread-local array. Also, did you build OpenBLAS with or without OpenMP? |
I'd guess that it gets loaded once for every math library sage uses, but I don't know exactly how dynamic linking works.
With. Should I try it without? |
Same issue when compiled without OpenMP. |
Can you find out where the pthread_create fails? (I assume this is not a message from OpenBLAS itself, as the pthread_create call in blas_server.c should generate additional output on failure. It may be helpful to understand which thread attribute conflicts with the use of a thread_local array; as I understand it, a simple "out of stack space" condition would lead to EAGAIN rather than EINVAL.) |
Do you know an easy way to do that? |
Not sure I understand what the limitation is - is |
It is a pretty big python application, so no obvious place to modify |
Never mind, it looks like debugging python is actually possible. |
Turns out you can even
|
Seems to me that cysignals wants to prescribe a specific stack size (and alignment) for its thread that may be at odds with OpenBLAS' earlier request (now that 0427277 added thread-local storage to fix a nasty reentrancy bug). |
This reverts commit 4900bbe. The issue that was supposed to fix is now fixed by lazy-loading rpy2 and making sure scipy is loaded before that. That is not quite as nice, but preloading is now causing its own issues with openblas 0.3.4: OpenMathLib/OpenBLAS#1936
All I can say is that OpenBLAS threads are created with default attributes (NULL attr), and the only change in the implicated commit is that a local array of 1024 doubles was declared thread_local (where supported by the OS environment). Perhaps stepping into the pthread_create function in gdb would provide additional insight into the exact cause of the blanket EINVAL (if it was simply running out of memory or other resources I'd expect it to return EAGAIN). |
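For anyone who wants to check which error number comes back, here is a minimal sketch. The pthreads functions return the error number directly instead of setting `errno`, so the return value is what distinguishes EINVAL from EAGAIN. The names `try_create` and `noop` are hypothetical helpers made up for this illustration; they are not part of OpenBLAS or cysignals.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread body for this sketch: returns immediately. */
static void *noop(void *arg) { return arg; }

/* Try to start and join a thread with the requested stack size.
 * Returns 0 on success, otherwise the error number reported by pthreads. */
static int try_create(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* A size below PTHREAD_STACK_MIN is already rejected here with EINVAL. */
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0)
        err = pthread_create(&tid, &attr, noop, NULL);

    if (err != 0)
        fprintf(stderr, "stack size %zu: %s\n", stack_size, strerror(err));
    else
        err = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return err;
}
```

Calling `try_create` with a range of sizes in the failing environment (with and without LD_PRELOAD) would show at which size the EINVAL starts to appear.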
@timokau Do you know a way to reproduce this with vanilla Sage built from source (not on NixOS)? |
@jdemeyer probably adding sage's openblas to |
Yes, confirmed in Sage with OpenBLAS 0.3.5 |
Fixed the problem by using a larger thread stack for cysignals (64k was not enough but 128k works). I don't know why openblas suddenly needs such a large thread stack, especially given that |
This fixes OpenMathLib/OpenBLAS#1936 somehow
The variable that has been made thread_local by the change is an array of 1024 doubles (i.e. 8k), so I do not yet understand why it would have such a large effect. |
..and I wonder if it would make sense to call pthread_attr_getstack or pthread_attr_getstacksize after the |
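A minimal sketch of such a query, assuming one only wants the stack size that a default-initialised attribute object would request. The helper name `default_thread_stack_size` is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Returns the stack size a default-initialised attribute object reports,
 * or 0 if the query fails. */
static size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing this value next to the size requested by the failing thread would make the mismatch, if there is one, visible.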
The default stack (as determined by |
I must admit that it is not even clear to me why the stack size requested by prior OpenBLAS threads has any bearing on the threads you attempt to set up in cysignals. EINVAL suggests that "it" specifically resents your attempt to request less than what the others already have. (And it seems to apply only in the special case of LD_PRELOAD.) Perhaps it is somehow related to the fact that you try to set both size and location? |
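To make "setting both size and location" concrete, here is a small sketch that starts a thread on a caller-allocated stack via `pthread_attr_setstack`. It is not the cysignals code; the helper names and the page-aligned allocation are assumptions for illustration, and the 128 KiB figure is simply the size reported to work earlier in this thread.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* 128 KiB: the size reported to work in this thread (64 KiB did not). */
#define SKETCH_STACK_SIZE ((size_t)128 * 1024)

/* Hypothetical thread body for this sketch: returns immediately. */
static void *noop(void *arg) { return arg; }

/* Start a thread on a caller-allocated stack (size and location both set).
 * Returns 0 on success, otherwise an error number. */
static int run_on_private_stack(void)
{
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    long page = sysconf(_SC_PAGESIZE);
    int err;

    if (page <= 0)
        return -1;

    /* Assumption: page alignment is sufficient for a thread stack. */
    err = posix_memalign(&stack, (size_t)page, SKETCH_STACK_SIZE);
    if (err != 0)
        return err;

    err = pthread_attr_init(&attr);
    if (err != 0) {
        free(stack);
        return err;
    }

    err = pthread_attr_setstack(&attr, stack, SKETCH_STACK_SIZE);
    if (err == 0)
        err = pthread_create(&tid, &attr, noop, NULL);
    if (err == 0)
        err = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    free(stack); /* the thread has been joined or was never started */
    return err;
}
```

Lowering `SKETCH_STACK_SIZE` while a library with thread-local data is preloaded is one way to look for the threshold at which `pthread_create` starts returning EINVAL.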
I have no idea... but it works now by using a larger stack. |
Some more digging turned up an unresolved glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=11787 that was opened nine years ago, and a workaround committed in the ruby project https://github.com/ruby/ruby/blob/19d692920d2d207c3aa891fc79aa5a93c17f84c6/thread_pthread.c#L1649 |
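One generic shape such a workaround can take is sketched below: request a small stack, and if the C library answers EINVAL, retry once with default attributes. This is only an illustration of the idea and is not claimed to match the Ruby code linked above; `create_with_fallback` and `noop` are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body for this sketch: returns immediately. */
static void *noop(void *arg) { return arg; }

/* Create a thread with a requested stack size. If the request is rejected
 * with EINVAL, retry once with default attributes (default stack size).
 * Returns 0 on success, otherwise an error number. */
static int create_with_fallback(pthread_t *tid, size_t stack_size,
                                void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0)
        err = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);

    if (err == EINVAL)
        err = pthread_create(tid, NULL, fn, arg); /* fall back to defaults */
    return err;
}
```

The trade-off is that the fallback thread gets the (usually much larger) default stack, which is fine for correctness but gives up the memory saving the small stack was meant to provide.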
Believed to be fixed by #2879 |
I maintain sage for NixOS. sage is a software package that combines various math software such as gap, R, maxima etc. (many of which use openblas). Because of an unrelated issue, I've recently taken to adding openblas to LD_PRELOAD to make sure the right version of blas is loaded.
Since openblas 0.3.4, starting sage immediately fails with
`pthread_create: Invalid argument`
when started with openblas preloaded (and only then). More specifically, git bisect shows the issue was introduced by 0427277.
`LD_DEBUG=libs sage 2>&1 | grep 'calling init|pthread_create'`
gives the following result: http://sprunge.us/3itk1W
I'm not sure exactly which part of the sage startup is triggering this issue.