zmq::thread_t::applySchedulingParameters() crash with musl-libc. #3162

Closed
ilue opened this issue Jun 12, 2018 · 10 comments

Comments

@ilue
Contributor

ilue commented Jun 12, 2018

Issue description

zmq::thread_t::applySchedulingParameters() calls several pthread functions using the descriptor member, which may be uninitialized if the new thread starts before pthread_create returns (musl-libc's pthread_create sets the descriptor after starting the thread, while glibc just works because the descriptor is set before the thread starts). Maybe use pthread_self() instead for compatibility?
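The race described above can be illustrated with a minimal sketch. It is not libzmq's actual code: `worker_t`, `worker_routine` and `run_worker` are hypothetical names chosen for the example. The point is only that the child thread asks for its own handle with `pthread_self()` instead of reading the field that `pthread_create()` fills in on the parent side.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>

// Hypothetical stand-in for a thread wrapper; not libzmq's thread_t.
struct worker_t
{
    pthread_t descriptor; // written by pthread_create() in the parent thread
    int rc;               // result of pthread_getschedparam() in the child
    int policy;           // scheduling policy reported for the child
};

static void *worker_routine (void *arg)
{
    worker_t *self = static_cast<worker_t *> (arg);
    sched_param param;

    // Unsafe variant (the pattern reported in this issue): reading
    // self->descriptor here races with pthread_create() storing it, so the
    // child may see an uninitialized handle.
    //
    // Safer variant: pthread_self() always refers to the calling thread and
    // does not depend on what the parent has written so far.
    self->rc = pthread_getschedparam (pthread_self (), &self->policy, &param);
    return nullptr;
}

// Starts one worker, waits for it, and returns 0 on success or the first
// non-zero pthread error code encountered.
int run_worker (worker_t *w)
{
    assert (w != nullptr);
    int rc = pthread_create (&w->descriptor, nullptr, worker_routine, w);
    if (rc != 0)
        return rc;
    rc = pthread_join (w->descriptor, nullptr);
    if (rc != 0)
        return rc;
    return w->rc;
}
```

Calling `run_worker` on a zero-initialized `worker_t` from `main` exercises the safe path; the parent only touches `descriptor` after `pthread_create()` has returned, so that access is not part of the race.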

Environment

  • libzmq version (commit hash if unreleased): 4.2.5
  • OS: linux (with musl enabled gcc)
  • libzmq configure arguments: --build=x86_64-linux-gnu --host=x86_64-unknown-linux-musl --enable-static --disable-shared --disable-libunwind --without-docs CXXFLAGS='-static' LDFLAGS='--static'

Minimal test code / Steps to reproduce the issue

#include <zmq.h>

int main(int argc, char* argv[])
{
    void* ctx = zmq_ctx_new();
    void* s = zmq_socket(ctx, ZMQ_PUB);
    zmq_close(s);
    zmq_ctx_term(ctx);
    return 0;
}
compile with:
x86_64-unknown-linux-musl-g++ -static -Wall -O3 -g -I ~/zeromq-4.2.5/include test.cpp -L ~/zeromq-4.2.5/src/.libs -lzmq

What's the actual result? (include assertion message & call stack if applicable)

SIGSEGV.

#0  0x000000000050e4e5 in a_cas (s=-2147483647, t=0, p=0xd0)
    at /home/ilue/crosstool-ng/.build/x86_64-unknown-linux-musl/src/musl/arch/x86_64/atomic_arch.h:4
#1  __lock (l=l@entry=0xd0) at /home/ilue/crosstool-ng/.build/x86_64-unknown-linux-musl/src/musl/src/thread/__lock.c:23
#2  0x000000000050f38b in pthread_getschedparam (t=0x0, policy=0x7ffff7ff9d7c, param=0x7ffff7ff9d40)
    at /home/ilue/crosstool-ng/.build/x86_64-unknown-linux-musl/src/musl/src/thread/pthread_getschedparam.c:6
#3  0x0000000000436f68 in zmq::thread_t::applySchedulingParameters() ()
#4  0x0000000000436d6c in thread_routine ()
#5  0x000000000050ee87 in start (p=0x7ffff7ff9ee8)
    at /home/ilue/crosstool-ng/.build/x86_64-unknown-linux-musl/src/musl/src/thread/pthread_create.c:150
#6  0x000000000050fb73 in __clone ()
    at /home/ilue/crosstool-ng/.build/x86_64-unknown-linux-musl/src/musl/src/thread/x86_64/clone.s:21
#7  0x0000000000000001 in ?? ()
#8  0x00007ffff7ff9ed8 in ?? ()
#9  0x0000000000000000 in ?? ()

What's the expected result?

Exit normally.

@ilue ilue changed the title Use of pthread_t before pthread_create return is unsafe. zmq::thread_t::applySchedulingParameters() crash with musl-libc. Jun 17, 2018
@bluca
Member

bluca commented Jun 22, 2018

Are there pre-built toolchains with musl? Simply linking statically against musl with a glibc-based gcc/libstdc++ does not reproduce the problem, and I don't have time to re-bootstrap from scratch.

@ilue
Contributor Author

ilue commented Jun 23, 2018

x86_64-unknown-linux-musl.tar.xz
Should run on any x86_64 linux machine.

@bluca
Member

bluca commented Jun 23, 2018

I can't reproduce the problem, both the example program and all the tests run just fine. This is on Debian Sid as a host.

@ilue
Contributor Author

ilue commented Jun 24, 2018

I got SIGSEGV while running the example program under GDB.

@bluca
Member

bluca commented Jun 25, 2018

Ah yes I can reproduce it with gdb - strange. Feel free to send a PR to fix it.

@bluca bluca closed this as completed in e22cd67 Jun 26, 2018
bluca added a commit that referenced this issue Jun 26, 2018
@drennalls

I just ran into this same issue with a zerorpc client under Alpine. It's pretty easy to reproduce by repeatedly running the equivalent of the code above (in my case via the zerorpc Python wrapper). Any idea when this fix will make it into an official release?

@bluca
Member

bluca commented Sep 28, 2018

Soon-ish

MohammadAlTurany pushed a commit to FairRootGroup/libzmq that referenced this issue Jan 28, 2019
@Darlelet

Darlelet commented Feb 2, 2022

Hi,

I got this trace on a buildroot linux x64 system using musl 1.2.2 toolchain (http://www.musl-libc.org/releases/musl-1.2.2.tar.gz) with zeromq 4.3.4 and 5.4.31 kernel:

$> ./testbinary
"No such process (src/thread.309)"

src/thread.c (lines 308-309)

308:   int rc = pthread_getschedparam (pthread_self (), &policy, &param);
309:    posix_assert (rc);

posix_assert is triggered because pthread_getschedparam returns ESRCH:
Either pthread_self() returns garbage data, or pthread_getschedparam doesn't see the thread yet. It occurs 100% of the time.

So the workaround implemented in b0d9a5a no longer works on systems using the latest musl.

My guess is that we're now facing a MUSL related bug.

Where should we inform MUSL developers about this possible race between pthread_create in main thread and pthread_getschedparam using pthread_self() in secondary thread?
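For a report to the musl developers, the behaviour described in this comment could be isolated from libzmq with a small probe along the following lines. This is only a sketch: `probe_routine` and `count_getschedparam_failures` are hypothetical names, and the code has not been checked against the musl 1.2.2 setup mentioned above. It starts short-lived threads that call `pthread_getschedparam (pthread_self (), ...)` and reports the error code instead of asserting.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Runs inside the new thread: query our own scheduling parameters and store
// the return code in the int that the parent passed in through arg.
static void *probe_routine (void *arg)
{
    int *rc_out = static_cast<int *> (arg);
    int policy = 0;
    sched_param param;
    *rc_out = pthread_getschedparam (pthread_self (), &policy, &param);
    return nullptr;
}

// Hypothetical helper: starts `iterations` threads one after another and
// returns how many iterations failed, either because the thread could not be
// created or because pthread_getschedparam() returned a non-zero code.
int count_getschedparam_failures (int iterations)
{
    assert (iterations > 0);
    int failures = 0;
    for (int i = 0; i < iterations; ++i) {
        int rc = 0;
        pthread_t t;
        if (pthread_create (&t, nullptr, probe_routine, &rc) != 0) {
            ++failures;
            continue;
        }
        pthread_join (t, nullptr);
        if (rc != 0) {
            // Print the error text, e.g. the "No such process" seen above.
            std::fprintf (stderr, "iteration %d: %s\n", i, std::strerror (rc));
            ++failures;
        }
    }
    return failures;
}
```

Building this once against each musl version and comparing the failure counts would show whether the ESRCH result depends on libzmq at all.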

@Darlelet

Darlelet commented Feb 2, 2022

Tried using the exact same environment, but with MUSL 1.1.24 instead of 1.2.2 -> working fine, no crash and zeromq behaves correctly

With this I'm pretty confident that the regression has been introduced in 1.2.X MUSL series.

@Darlelet

Darlelet commented Mar 1, 2022

Does anyone know how to reach out to the musl developers?
