Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pthread_create: Invalid argument when starting sage with openblas preloaded #1936

Closed
timokau opened this issue Dec 26, 2018 · 28 comments
Closed

Comments

@timokau
Copy link

timokau commented Dec 26, 2018

I maintain sage for NixOS. sage is a software package that combines various math software such as gap, R, maxima etc (many of which use openblas). I've recently because of an unrelated issue taken to adding openblas to LD_PRELOAD to make sure the right version of blas is loaded.

Since openblas 0.3.4, starting sage immediately fails with pthread_create: Invalid argument when started with openblas preloaded (and only then). More specifically, this git-bisect shows issue was introduced 0427277.

LD_DEBUG=libs sage 2>&1 | grep 'calling init|pthread_create' gives the following result: http://sprunge.us/3itk1W

I'm not sure what part of the sage startup exactly is triggering this issue.

@brada4
Copy link
Contributor

brada4 commented Dec 27, 2018

Does it still fail with
? setting OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1
? without LD_PRELOAD

At the point OpenBLAS is loaded 4 times there is no python module loading OpenMP (libgomp)

@timokau
Copy link
Author

timokau commented Dec 27, 2018 via email

@brada4
Copy link
Contributor

brada4 commented Dec 27, 2018

Is that a question?

Just an observation,
how big is your system cpu (like nproc)?

What does LD_DEBUG show without LD_preload down to the point numpy gets imported? BLAS is not in default imports of python, it comes only with extra modules normally.

Can you try strace in same conditions, that would show also pthread_create calls along the lines?

LD_PRELOAD is somewhere around lines of last chance try, like overriding unknowingly linked netlib BLAS in some PIP install, not a typical use case, at least library is not very resistant to dlopen() in fast sequence.

@timokau
Copy link
Author

timokau commented Dec 27, 2018

how big is your system cpu (like nproc)

4 cores (Intel i5-3470)

What does LD_DEBUG show without LD_preload down to the point numpy gets imported

(Just importing numpy in regular python doesn't cause the issue)
I'm now using LD_DEBUG with sage -c 'print(42)' instead of just sage, which creates a little bit shorter logs (because stuff like readline isn't needed). Here is the log with preload and here without preload.

Can you try strace in same conditions, that would show also pthread_create calls along the lines?

Am I using strace wrong? I thought it only shows syscalls. strace ... | grep 'pthread\|openblas' shows:

openat(AT_FDCWD, "/nix/store/n9f0nvslmkkmgxyjy73y0sgr0v48zhyx-openblas-0.3.4/lib/libopenblas.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/nix/store/m2xy46v7cczsc06n4c2l08riwkfs6skx-sane-config/lib/sane/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/opengl-driver/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/xdsjx0gba4id3yyqxv66bxnm2sqixkjj-glibc-2.27/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/nix/store/n9f0nvslmkkmgxyjy73y0sgr0v48zhyx-openblas-0.3.4/lib/libopenblas.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/nix/store/m2xy46v7cczsc06n4c2l08riwkfs6skx-sane-config/lib/sane/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/opengl-driver/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/w30zkhdjf6qp4ppx9xzffsh9dic31aq8-python-2.7.15/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/xdsjx0gba4id3yyqxv66bxnm2sqixkjj-glibc-2.27/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
write(2, "pthread_create: Invalid argument"..., 33pthread_create: Invalid argument                                               

@martin-frbg
Copy link
Collaborator

I wonder why it gets (pre)loaded four times, probably running out of space for the thread-local array. Also did you build OpenBLAS with or without OpenMP ?

@timokau
Copy link
Author

timokau commented Dec 27, 2018

I wonder why it gets (pre)loaded four times

I'd guess that it gets loaded once for every math library sage uses, but I don't know exactly how dynamic linking works

Also did you build OpenBLAS with or without OpenMP

With. Should I try it without?

@timokau
Copy link
Author

timokau commented Dec 27, 2018

Same issue when compiled without OpenMP.

@martin-frbg
Copy link
Collaborator

Can you find out where the pthread_create fails ? (I assume this is not a message from OpenBLAS itself, as the pthread_create call in blas_server.c should generate additional output on failure. It may be helpful to understand which thread attribute is conflicting with the use of a thread_local array, as I understand it a simple "out of stack space" condition would lead to EAGAIN rather than EINVAL).
A workaround if you absolutely need to use preloading could be to build OpenBLAS with USE_SIMPLE_THREADED_LEVEL3=1

@timokau
Copy link
Author

timokau commented Dec 27, 2018

Do you know an easy way to do that? ltrace and gdb both only work with elf binaries and LD_DEBUG doesn't give that information.

@martin-frbg
Copy link
Collaborator

Not sure I understand what the limitation is - is sage a shell script ? (And even if it is, can't you modify it to call gdb someprogram instead of someprogram for whatever it tries to execute and see if you can get a breakpoint and subsequent backtrace on the pthread_create that caused the error message to be printed ?) I simply do not think that this is OpenBLAS itself where pthread creation fails, but some other code invoked by sage (and I do not see how use of thread local storage in OpenBLAS would cause EINVAL on thread creation by that other code)

@timokau
Copy link
Author

timokau commented Dec 27, 2018

It is a pretty big python application, so no obvious place to modify someprogram to gdb someprogram.

@timokau
Copy link
Author

timokau commented Dec 27, 2018

Nevermind looks like debugging python is actually possible.

@timokau
Copy link
Author

timokau commented Dec 28, 2018

Turns out you can even ltrace a shell script if you just ltrace the shell instead (for some reason doesn't work with shebangs, but does work if you pass it to the shell explicitly).

$ ltrace -e 'pthread_create' bash result/bin/sage -c 'print(42)'
--- Called exec() ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- Called exec() ---
--- Called exec() ---
--- SIGCHLD (Child exited) ---
signals.so->pthread_create(0x7fff1ad71e08, 0x7fff1ad71e10, 0x7f7fa878a010, 0) = 0

signals.so belongs to cysignals which only has one occurrence of pthread_create.

@timokau
Copy link
Author

timokau commented Dec 28, 2018

@jdemeyer do you have any idea why that pthread_create may fail when openblas with 0427277 is already loaded?

@martin-frbg
Copy link
Collaborator

Seems to me that cysignals wants to prescribe a specific stack size (and alignment) for its thread that may be at odds with OpenBLAS' earlier request (now that 0427277 added thread-local storage to fix a nasty reentrancy bug).

timokau added a commit to timokau/nixpkgs that referenced this issue Dec 28, 2018
This reverts commit 4900bbe.

The issue that was supposed to fix is now fixed by lazy-loading rpy2 and
makign sure scipy is loaded before that.
That is not quite as nice, but preloading is now causing its own issues
with openblas 0.3.4:
OpenMathLib/OpenBLAS#1936
@jdemeyer
Copy link

jdemeyer commented Jan 4, 2019

@jdemeyer do you have any idea why that pthread_create may fail when openblas with 0427277 is already loaded?

Unfortunately not. I'm pretty sure that I could easily fix it once I know what the problem is.

@martin-frbg
Copy link
Collaborator

All I can say is that OpenBLAS threads are created with default attributes (NULL attr), and the only change in the implicated commit is that a local array of 1024 doubles was declared thread_local (where supported by the OS environment). Perhaps stepping into the pthread_create function in gdb would provide additional insight into the exact cause of the blanket EINVAL (if it was simply running out of memory or other resources I'd expect it to return EAGAIN).

@jdemeyer
Copy link

jdemeyer commented Jan 4, 2019

@timokau Do you know a way to reproduce this with vanilla Sage built from source (not on NixOS)?

@timokau
Copy link
Author

timokau commented Jan 4, 2019

@jdemeyer probably adding sage's openblas to LD_PRELOAD should do it. I'll test if it does, but it may take a while since I don't have a compiled vanilla sage laying around right now.

@jdemeyer
Copy link

jdemeyer commented Jan 4, 2019

Yes, confirmed in Sage with OpenBLAS 0.3.5

@jdemeyer
Copy link

Fixed the problem by using a larger thread stack for cysignals (64k was not enough but 128k works). I don't know why openblas suddenly needs such a large thread stack, especially given that PTHREAD_STACK_MIN is only 16k.

jdemeyer added a commit to sagemath/cysignals that referenced this issue Feb 12, 2019
@martin-frbg
Copy link
Collaborator

The variable that has been made thread_local by the change is an array of 1024 doubles (i.e. 8k), so I do not yet understand why it would have such a large effect.

@martin-frbg
Copy link
Collaborator

..and I wonder if it would make sense to call pthread_attr_getstack or pthread_attr_getstacksize after the
pthread_attr_init to determine the current default size before attempting to change it (or would this return PTHREAD_STACK_MIN regardless ?) ?

@jdemeyer
Copy link

The default stack (as determined by pthread_attr_getstacksize) is 8M, much larger than what I use in cysignals. But I only need to run a single tiny function by that thread, which surely needs less than 1k of stack.

@martin-frbg
Copy link
Collaborator

I must admit that it is not even clear to me why the stack size as requested by prior OpenBLAS threads has any bearing on the threads you attempt to set up in cysignals. EINVAL suggests that "it" specifically resents your attempt to request less than what the others already have. (And it seems to apply only in the special case of LD_PRELOAD). Perhaps it is somehow related to the fact that you try to set both size and location ?

@jdemeyer
Copy link

I have no idea... but it works now by using a larger stack.

@martin-frbg
Copy link
Collaborator

martin-frbg commented Feb 27, 2019

Some more digging turned up an unresolved glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=11787 that was opened nine years ago, and a workaround comitted in the ruby project https://github.com/ruby/ruby/blob/19d692920d2d207c3aa891fc79aa5a93c17f84c6/thread_pthread.c#L1649
EDIT: rust-lang/rust#6233 is probably a better example than the ruby one.

@martin-frbg
Copy link
Collaborator

Believed to be fixed by #2879

kliem pushed a commit to sagemath/memory_allocator that referenced this issue May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants