
zmq_ctx_term throws Assertion failed: pfd.revents & POLLIN (src/signaler.cpp:226) #1307

Closed
metadings opened this issue Jan 16, 2015 · 19 comments

Comments

@metadings
Member

When I terminate my context, the program aborts with SIGABRT: Assertion failed: pfd.revents & POLLIN (src/signaler.cpp:226).

Thread 3 (Thread 0x7f6150b23700 (LWP 3391)):
#0  0x00007f61557f6ee9 in __libc_waitpid (pid=pid@entry=3394, stat_loc=stat_loc@entry=0x7f6150b21c0c, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00000000004b4009 in mono_handle_native_sigsegv (signal=<optimized out>, ctx=<optimized out>) at mini-exceptions.c:2323
#2  <signal handler called>
#3  0x00007f6155457bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4  0x00007f615545afc8 in __GI_abort () at abort.c:89
#5  0x00007f6152061259 in zmq::zmq_abort (errmsg_=errmsg_@entry=0x7f61520c8940 "pfd.revents & POLLIN") at src/err.cpp:74
#6  0x00007f615207f18e in zmq::signaler_t::wait (this=this@entry=0x286b398, timeout_=timeout_@entry=-1) at src/signaler.cpp:226
#7  0x00007f6152064bb0 in zmq::mailbox_t::recv (this=this@entry=0x286b338, cmd_=cmd_@entry=0x7f6150b22a50, timeout_=timeout_@entry=-1) at src/mailbox.cpp:70
#8  0x00007f6152055c3c in zmq::ctx_t::terminate (this=this@entry=0x286b2a0) at src/ctx.cpp:157
#9  0x00007f6152095b38 in zmq_ctx_term (ctx_=0x286b2a0) at src/zmq.cpp:155
#10 0x00000000410d8500 in ?? ()
#11 0x00007f6144002640 in ?? ()
#12 0x00007f6150b22e00 in ?? ()
...

Sometimes I also get "pure virtual method called" followed by "terminate called without an active exception":

Thread 2 (Thread 0x7fdb11922700 (LWP 3647)):
#0  0x00007fdb1679dee9 in __libc_waitpid (pid=pid@entry=3648, stat_loc=stat_loc@entry=0x7fdb11920a0c, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00000000004b4009 in mono_handle_native_sigsegv (signal=<optimized out>, ctx=<optimized out>) at mini-exceptions.c:2323
#2  <signal handler called>
#3  0x00007fdb163febb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4  0x00007fdb16401fc8 in __GI_abort () at abort.c:89
#5  0x00007fdb12d9c6b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fdb12d9a836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fdb12d9a863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fdb12d9b33f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007fdb13064b92 in read (value_=0x7fdb11921860, this=0x2259f08) at src/ypipe.hpp:156
#10 zmq::mailbox_t::recv (this=this@entry=0x2259f08, cmd_=cmd_@entry=0x7fdb11921860, timeout_=timeout_@entry=-1) at src/mailbox.cpp:62
#11 0x00007fdb13055c3c in zmq::ctx_t::terminate (this=this@entry=0x2259e70) at src/ctx.cpp:157
#12 0x00007fdb13095b38 in zmq_ctx_term (ctx_=0x2259e70) at src/zmq.cpp:155
#13 0x0000000040356ca0 in ?? ()
#14 0x00007fdb000026a0 in ?? ()
#15 0x00007fdb11921c60 in ?? ()
#16 0x0000000000000000 in ?? ()

Any thoughts...?

@rodgert
Contributor

rodgert commented Jan 16, 2015

I'm assuming this is Linux?

signaler.cpp:226 is:

zmq_assert (pfd.revents & POLLIN);

But Linux may, in addition to the requested events, also signal POLLERR, POLLHUP, and POLLNVAL, so I think strictly asserting on POLLIN here may be the wrong thing to do.
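
The behaviour described here can be reproduced without libzmq. The following is a minimal sketch, assuming a POSIX system; the helper name `revents_of_closed_fd` is made up for illustration and is not libzmq code. It polls a descriptor that has already been closed: `poll` still returns 1, but `revents` carries `POLLNVAL` rather than `POLLIN`, which is the state in which an assertion on `POLLIN` would fire.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not libzmq code): poll a descriptor after closing it
   and return the revents that poll() reports. Returns 0 if pipe() fails. */
short revents_of_closed_fd(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    /* Close both ends so fds[0] no longer refers to an open descriptor. */
    close(fds[0]);
    close(fds[1]);

    struct pollfd pfd;
    pfd.fd = fds[0];
    pfd.events = POLLIN;   /* only POLLIN is requested ... */
    pfd.revents = 0;

    /* ... but poll() may still report POLLNVAL, POLLERR or POLLHUP. */
    int rc = poll(&pfd, 1, 0);
    assert(rc == 1);       /* one descriptor has non-zero revents */

    return pfd.revents;
}
```

Calling `revents_of_closed_fd()` and testing the result against `POLLNVAL` and `POLLIN` shows that a return value of 1 from `poll` does not by itself imply that the descriptor is readable.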


@metadings
Member Author

Hm okay, I'll try

@rodgert
Contributor

rodgert commented Jan 16, 2015

I don't think there's really anything to try here. This is a possible bug in the signaler code (is assert() really what we want?).


@metadings
Member Author

Well no ;-)

@metadings
Member Author

I'm getting this error just in the Interrupt example.

If you run zmq.ctx_term, the program is (blocking) waiting for CTRL+C to terminate.

I believe that zmq_ctx_term should terminate currently running sockets, just like others in zmq_proxy, zmq_proxy_steerable and zmq_socket_monitor...

Now this is the error (when calling context.Terminate() from the Console.CancelKeyPress thread):

Terminating, you have pressed CTRL+C.
Assertion failed: pfd.revents & POLLIN (src/signaler.cpp:226)
Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) ZeroMQ.lib.zmq.zmq_ctx_term (intptr) <0xffffffff>
  at ZeroMQ.ZContext.Terminate () <0x00065>
  at ZeroMQ.Test.Program/<Interrupt>c__AnonStorey4.<>m__0 (object,System.ConsoleCancelEventArgs) <0x0002f>
  at System.Console.DoConsoleCancelEvent () <0x000e6>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

    /usr/bin/cli() [0x4accac]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fe0dac49340]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7fe0da8a9cc9]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7fe0da8ad0d8]
    /home/metadings/Documents/zguide/examples/C#/bin/Debug/amd64/libzmq.so(+0x21eca) [0x7fe0d743eeca]
    /home/metadings/Documents/zguide/examples/C#/bin/Debug/amd64/libzmq.so(+0x4ad99) [0x7fe0d7467d99]
    /home/metadings/Documents/zguide/examples/C#/bin/Debug/amd64/libzmq.so(+0x2751f) [0x7fe0d744451f]
    /home/metadings/Documents/zguide/examples/C#/bin/Debug/amd64/libzmq.so(+0x10363) [0x7fe0d742d363]
    /home/metadings/Documents/zguide/examples/C#/bin/Debug/amd64/libzmq.so(zmq_ctx_term+0x54) [0x7fe0d74851e3]
    [0x40f3a500]

Debug info from gdb:

[New LWP 5716]
[New LWP 5715]
[New LWP 5712]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
  Id   Target Id         Frame 
  4    Thread 0x7fe0d7fff700 (LWP 5712) "Finalizer" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
  3    Thread 0x7fe0d9500700 (LWP 5715) "Threadpool moni" __clock_nanosleep (clock_id=1, flags=1, req=0x7fe0d94ffd20, rem=0xffffffffffffffff) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:49
  2    Thread 0x7fe0d5f00700 (LWP 5716) "Threadpool work" 0x00007fe0dac48ee9 in __libc_waitpid (pid=5717, stat_loc=0x7fe0d5efe94c, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
* 1    Thread 0x7fe0db7667c0 (LWP 5710) "cli" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

Thread 4 (Thread 0x7fe0d7fff700 (LWP 5712)):
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x0000000000619238 in mono_sem_wait ()
#2  0x000000000059d02d in ?? ()
#3  0x0000000000582484 in ?? ()
#4  0x000000000061e0b6 in ?? ()
#5  0x00007fe0dac41182 in start_thread (arg=0x7fe0d7fff700) at pthread_create.c:312
#6  0x00007fe0da96e00d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7fe0d9500700 (LWP 5715)):
#0  __clock_nanosleep (clock_id=1, flags=1, req=0x7fe0d94ffd20, rem=0xffffffffffffffff) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:49
#1  0x000000000060c9c0 in ?? ()
#2  0x000000000058582d in ?? ()
#3  0x0000000000582484 in ?? ()
#4  0x000000000061e0b6 in ?? ()
#5  0x00007fe0dac41182 in start_thread (arg=0x7fe0d9500700) at pthread_create.c:312
#6  0x00007fe0da96e00d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7fe0d5f00700 (LWP 5716)):
#0  0x00007fe0dac48ee9 in __libc_waitpid (pid=5717, stat_loc=0x7fe0d5efe94c, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00000000004acd39 in ?? ()
#2  <signal handler called>
#3  0x00007fe0da8a9cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4  0x00007fe0da8ad0d8 in __GI_abort () at abort.c:89
#5  0x00007fe0d743eeca in zmq::zmq_abort (errmsg_=0x7fe0d74b96e9 "pfd.revents & POLLIN") at src/err.cpp:74
#6  0x00007fe0d7467d99 in zmq::signaler_t::wait (this=0x10f94a8, timeout_=-1) at src/signaler.cpp:226
#7  0x00007fe0d744451f in zmq::mailbox_t::recv (this=0x10f9448, cmd_=0x7fe0d5eff840, timeout_=-1) at src/mailbox.cpp:70
#8  0x00007fe0d742d363 in zmq::ctx_t::terminate (this=0x10f93b0) at src/ctx.cpp:158
#9  0x00007fe0d74851e3 in zmq_ctx_term (ctx_=0x10f93b0) at src/zmq.cpp:157
#10 0x0000000040f3a500 in ?? ()
#11 0x00007fe0c00026b0 in ?? ()
#12 0x00007fe0d5effc60 in ?? ()
#13 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fe0db7667c0 (LWP 5710)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000005f840b in ?? ()
#2  0x000000000060b70b in ?? ()
#3  0x000000000060c058 in ?? ()
#4  0x0000000000583a38 in ?? ()
#5  0x0000000000583d4d in mono_thread_manage ()
#6  0x0000000000483092 in mono_main ()
#7  0x00007fe0da894ec5 in __libc_start_main (main=0x4229c0, argc=4, argv=0x7fffe92e5bf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffe92e5be8) at libc-start.c:287
#8  0x0000000000422c79 in ?? ()
#9  0x00007fffe92e5be8 in ?? ()
#10 0x000000000000001c in ?? ()
#11 0x0000000000000004 in ?? ()
#12 0x00007fffe92e7223 in ?? ()
#13 0x00007fffe92e7230 in ?? ()
#14 0x00007fffe92e7245 in ?? ()
#15 0x00007fffe92e724f in ?? ()
#16 0x0000000000000000 in ?? ()

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Aborted (core dumped)

on Windows:

Terminating, you have pressed CTRL+C.
Assertion failed: nbytes == sizeof (dummy) (..\..\..\..\src\signaler.cpp:290)

@hintjens
Member

The deadlock is (almost always) due to one or more open sockets, or sockets
with LINGER set to its default non-zero value. I'd tried to change libzmq
to set linger to zero by default; that did not get into master. So before
terminating you must set LINGER to zero on all sockets, and then close all
sockets, and then call ctx_term.

On Sat, Jan 24, 2015 at 1:12 AM, Uli Riehm notifications@github.com wrote:

I'm getting this error just for my Interrupt example (https://github.com/metadings/zguide/blob/master/examples/C%23/interrupt.cs).

You have to terminate the program with CTRL+C. If you run zmq.ctx_term,
the program is (blocking) waiting for CTRL+C.



@metadings
Member Author

Well... It's just the zmq_ctx_term example which does assert.

Don't get me wrong, but I don't care about LINGER values and running sockets... I just want to terminate, from another thread, from the same thread; I don't need that assert ;-)

The example doesn't make sense, if I'm just waiting for CTRL+C and then go out clean. I need to run zmq.ctx_term in the middle of a Receive/Send operation, which then should return -1, errno = ETERM.

Oh, I see zmq_ctx_shutdown now... is that what I'm looking for? YES, it works for now...

You now call zmq_ctx_shutdown first; blocked calls then return with ETERM, so you are able to close the socket and zmq_ctx_term the ZContext... See context.Shutdown() in the Interrupt example!

However, I don't know exactly, because I'm not "in the code"... but it's bad to get a SIGABRT.

@hintjens
Member

Can you get a C example that provokes this?

On Sat, Jan 24, 2015 at 3:48 PM, Uli Riehm notifications@github.com wrote:

Well... It's just the zmq_ctx_term example which does assert.

I don't care about LINGER values and running sockets. I just want to
terminate, from another thread, from the same thread; I don't need that
assert ;-)

The example doesn't make sense, if I'm just waiting for ESC/CTRL+C and
then go out, without calling zmq.ctx_term



@metadings
Member Author

Well I'm not a good C programmer...

// Shows how to handle Ctrl-C

#include <zmq.h>
#include <errno.h>   // errno, EAGAIN
#include <stdio.h>
#include <string.h>  // strerror
#include <signal.h>

// Signal handling
//
// Call s_catch_signals() in your application at startup, and then
// exit your main loop if s_interrupted is ever 1. Works especially
// well with zmq_poll.
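// volatile sig_atomic_t is the type that may safely be written from a handler
static volatile sig_atomic_t s_interrupted = 0;
static void s_signal_handler (int signal_value)
{
  (void) signal_value;
  s_interrupted = 1;
}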

#ifndef WIN32
static void s_catch_signals (void)
{
  struct sigaction action;
  action.sa_handler = s_signal_handler;
  action.sa_flags = 0;
  sigemptyset (&action.sa_mask);
  sigaction (SIGINT, &action, NULL);
  sigaction (SIGTERM, &action, NULL);
}
#endif

int main (void)  
{
  void *context = zmq_ctx_new ();
  void *socket = zmq_socket (context, ZMQ_REP);
  zmq_bind (socket, "tcp://*:5555");

#ifndef WIN32
  s_catch_signals ();
#else
  signal (SIGINT, s_signal_handler);
  signal (SIGTERM, s_signal_handler);
#endif

  while (1) 
  {
    if (s_interrupted) {
      s_interrupted = 0;
      printf ("W: interrupt received, shutting down server...\n");
      if (-1 == zmq_ctx_shutdown(context)) {
        errno = zmq_errno();

        printf ("E: (%d) %s\n", errno, strerror(errno));
        break;
      }
    }

    char buffer [255];
    int rc = zmq_recv (socket, buffer, 255, ZMQ_DONTWAIT);
    if (rc == -1) {
      errno = zmq_errno();

      if (errno == EAGAIN) {
        continue;
      }
      if (errno == ETERM) {
        printf ("I: Terminated!\n");
        break;
      }

      printf ("E: (%d) %s\n", errno, strerror(errno));
      break;
    }

    rc = zmq_send(socket, "Hello", 5, 0);
    if (rc == -1) {
      errno = zmq_errno();

      if (errno == ETERM) {
        printf ("I: Terminated!\n");
        break;
      }

      printf ("E: (%i) %s\n", errno, strerror(errno));
      break;
    }
  }
  zmq_close (socket);
  zmq_ctx_destroy (context);
  return 0;
}

Ah, and you have to replace zmq_ctx_shutdown with zmq_ctx_term to (possibly) get the abort... I'm removing this from my disk. Please try the C# edition... (Install mono-complete, monodevelop, git clone --depth=2 https://github.com/metadings/zguide, open the project, setup the reference (add ZeroMQ.dll), build the project and then run ./bin/Debug/ZGuideExamples.exe Interrupt.)

@rodgert
Contributor

rodgert commented Jan 26, 2015

By "works on Windows" do you mean exhibits the assert?

On Sun, Jan 25, 2015 at 2:16 PM, Uli Riehm notifications@github.com wrote:

Well I'm not a good C programmer... What I did now works on Windows, not
on Linux.


@metadings
Member Author

No, I didn't get the signalling right: on Windows it worked (with errno = zmq_errno()), but on Linux I couldn't get it running.

Please take the hwserver example, make it DONTWAIT and try to catch the SIGINT. When you get a SIGINT, you would usually call ctx_shutdown; the assert fires if you call ctx_term instead.
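
The signal-catching part of that suggestion can be sketched on its own, without libzmq. This is a minimal illustration assuming a POSIX system; the names `catch_signals`, `on_signal`, `interrupted` and `poll_loop` are made up for this sketch, and the loop body only marks where a non-blocking receive would go.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Written by the handler, read by the loop. */
static volatile sig_atomic_t interrupted = 0;

static void on_signal(int signal_value)
{
    (void) signal_value;
    interrupted = 1;
}

/* Install the handler for SIGINT and SIGTERM. Returns 0 on success. */
int catch_signals(void)
{
    struct sigaction action;
    action.sa_handler = on_signal;
    action.sa_flags = 0;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGINT, &action, NULL) != 0)
        return -1;
    return sigaction(SIGTERM, &action, NULL);
}

/* Run at most max_iterations of a non-blocking loop and return how many
   iterations completed before the interrupt flag was seen. */
int poll_loop(int max_iterations)
{
    int done = 0;
    while (done < max_iterations && !interrupted) {
        /* A real server would attempt a non-blocking receive here and
           go round again when nothing is available. */
        done++;
    }
    return done;
}
```

Once the flag is set the loop exits before doing any further work, which is the point at which a server would shut the context down and close its sockets.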

@metadings
Member Author

I believe you shouldn't call zmq_ctx_term if someone has already called zmq_ctx_shutdown...

I am now setting _contextPtr to IntPtr.Zero in ZContext.Shutdown, to avoid the zmq_ctx_term.

@mb78

mb78 commented Jul 16, 2015

Hi, I got the same crash in signaler.cpp:226 with 4.1.0. This patch seems to fix it:

diff --git a/src/signaler.cpp b/src/signaler.cpp
index 25667bf..ba5b288 100644
--- a/src/signaler.cpp
+++ b/src/signaler.cpp
@@ -223,6 +223,11 @@ int zmq::signaler_t::wait (int timeout_)
     }
 #endif
     zmq_assert (rc == 1);
+       if (pfd.revents & POLLNVAL)
+       {
+               errno=EINTR;
+               return -1;
+       }
     zmq_assert (pfd.revents & POLLIN);
     return 0;
diff --git a/src/mailbox.cpp b/src/mailbox.cpp
index bd140a4..da50cf0 100644
--- a/src/mailbox.cpp
+++ b/src/mailbox.cpp
@@ -67,14 +67,18 @@ int zmq::mailbox_t::recv (command_t *cmd_, int timeout_)
     }

     //  Wait for signal from the command sender.
-    const int rc = signaler.wait (timeout_);
+    int rc = signaler.wait (timeout_);
     if (rc == -1) {
         errno_assert (errno == EAGAIN || errno == EBADF);
         return -1;
     }

     //  Receive the signal.
-    signaler.recv ();
+    rc=signaler.recv ();
+    if (rc == -1) {
+        errno_assert (errno == EINTR);
+        return -1;
+    }

     //  Switch into active state.
     active = true;
diff --git a/src/signaler.cpp b/src/signaler.cpp
index ba5b288..a9609df 100644
--- a/src/signaler.cpp
+++ b/src/signaler.cpp
@@ -265,12 +265,17 @@ int zmq::signaler_t::wait (int timeout_)
 #endif
 }

-void zmq::signaler_t::recv ()
+int zmq::signaler_t::recv ()
 {
     //  Attempt to read a signal.
 #if defined ZMQ_HAVE_EVENTFD
     uint64_t dummy;
     ssize_t sz = read (r, &dummy, sizeof (dummy));
+    if (sz==0 || (sz==-1 && errno==EINVAL))
+    {
+        errno=EINTR;
+        return -1;
+    }
     errno_assert (sz == sizeof (dummy));

     //  If we accidentally grabbed the next signal along with the current
@@ -279,7 +284,7 @@ void zmq::signaler_t::recv ()
         const uint64_t inc = 1;
         ssize_t sz2 = write (w, &inc, sizeof (inc));
         errno_assert (sz2 == sizeof (inc));
-        return;
+        return 0;
     }

     zmq_assert (dummy == 1);
@@ -295,6 +300,7 @@ void zmq::signaler_t::recv ()
     zmq_assert (nbytes == sizeof (dummy));
     zmq_assert (dummy == 0);
 #endif
+    return 0;
 }

 #ifdef HAVE_FORK
diff --git a/src/signaler.hpp b/src/signaler.hpp
index b66f0ae..54a271d 100644
--- a/src/signaler.hpp
+++ b/src/signaler.hpp
@@ -44,7 +44,7 @@ namespace zmq
         fd_t get_fd () const;
         void send ();
         int wait (int timeout_);
-        void recv ();
+        int recv ();

 #ifdef HAVE_FORK
         // close the file descriptors in a forked child process so that they
--
1.8.4.GIT

The race condition in ctx destroy (presumably also in *_term) when sending to a socket from this context can be provoked by loading the CPU to the maximum. I could reproduce it several times.
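
The idea behind the patch, returning an error from the wait instead of asserting when the descriptor has become invalid, can be illustrated in isolation. The following is a simplified sketch assuming a POSIX system; the `signaler` struct and its functions are hypothetical stand-ins built on a pipe, not the libzmq `signaler_t` class, and the choice of `EIO` for other unexpected events is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical, simplified signaler built on a pipe (not libzmq code). */
typedef struct {
    int r;  /* read end  */
    int w;  /* write end */
} signaler;

int signaler_init(signaler *s)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    s->r = fds[0];
    s->w = fds[1];
    return 0;
}

/* Post one signal by writing a single byte. */
int signaler_send(signaler *s)
{
    unsigned char dummy = 0;
    return write(s->w, &dummy, 1) == 1 ? 0 : -1;
}

/* Wait for a signal. Returns 0 when readable; otherwise -1 with errno set
   to EAGAIN on timeout or EINTR when the descriptor is no longer valid. */
int signaler_wait(signaler *s, int timeout_ms)
{
    struct pollfd pfd = { .fd = s->r, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;                  /* errno already set by poll() */
    if (rc == 0) {
        errno = EAGAIN;
        return -1;
    }
    if (pfd.revents & POLLNVAL) {   /* closed underneath us: report, don't abort */
        errno = EINTR;
        return -1;
    }
    if (!(pfd.revents & POLLIN)) {  /* POLLERR or POLLHUP without data */
        errno = EIO;
        return -1;
    }
    return 0;
}

/* Consume one signal. Returns 0 on success, -1 otherwise. */
int signaler_recv(signaler *s)
{
    unsigned char dummy;
    ssize_t n = read(s->r, &dummy, 1);
    return n == 1 ? 0 : -1;
}
```

With this shape the caller decides what an invalid descriptor means during shutdown, rather than the process aborting inside the wait.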

@hitstergtd
Member

Was this fixed in the end?

@sourcedelica

I ran into this in our codebase where it was closing sockets where it should have been terminating the context instead.

@stale

stale bot commented Nov 4, 2018

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

@stale stale bot added the stale label Nov 4, 2018
@stale stale bot closed this as completed Dec 31, 2018
@themightyoarfish

was autoclosed, but still occurs?

@Kaju-Bubanja

Can this be reopened? It's still occurring.

@Falital

Falital commented Jun 11, 2024

As the documentation states, ZMQ_LINGER should be set to a non-infinite value so that this does not happen; it is up to the user of the library to decide which value. I personally set it to half of ZMQ_RCVTIMEO. See the warning in https://libzmq.readthedocs.io/en/zeromq4-x/zmq_ctx_term.html for more info.
