Hang or assert during zmq_ctx_term after zmq_socket_monitor was used #1279
Hello, can you try with the latest version? |
Same with ZMQ from master (4.2?). |
This was discovered in a test case for https://github.com/zeromq/azmq. It works fine on Linux and OS X, but fails on Windows as Andrey describes. On Wednesday, December 17, 2014, KAPP Arnaud notifications@github.com
|
It looks like the problem is tightly tied to the order in which the sockets are closed. It must be either:
|
I'm able to use that test program to trigger a hang on OS X. The stack trace is:
The OS X version I'm running on is:
Using zmq 4.0.5 |
@somdoron - thoughts? I know it is quite old, but I have managed to reproduce this issue on OS X. |
In my case, it is a reproducible hang, every time, not every 1 in 10 executions. Every execution. |
After further investigation - it seems that if the second call to |
I think the problem is that the internal monitor socket is accessed from multiple threads. tcp_listener_t.close accesses the monitor socket from the background I/O thread while the monitor socket is being closed from the user thread. This is an internal libzmq problem which will not be easy to solve. I suggest stopping the monitoring before disposing of any socket, by calling zmq_socket_monitor with NULL as the address. It might solve the problem, but I'm not sure. |
Some more thoughts: the monitored socket must be closed before the monitor socket, because sending the events is a blocking operation, which is why it sometimes blocks (no socket on the other side). We can also stop the monitoring by calling zmq_socket_monitor with NULL as the address, then close the monitor socket and then the socket (the order of the last two is not important if you stop the monitoring). The last option is to make the code non-blocking (just adding the flag), here: https://github.com/zeromq/libzmq/blob/master/src/socket_base.cpp#L1732 |
@somdoron As to your response on the internal sockets, that is an interesting thought and would require further investigation. I think it would be good to commit this test case into the repository, with the hope that we can fix it in the near future. |
I will post a DTrace for this hang, hopefully it can shed some more light. |
I bumped into the same problem (assert or hang at context termination), but in a slightly different scenario: I have a single-threaded C++ (zmq.hpp) message broker using a monitor for diagnostic logs. In unit tests, testClients use their own zmq context, and at test termination the destruction of the production code's context asserts or hangs. Minimal code example that demonstrates the effect 8 out of 10 times (very similar to the monitor example):
Here's the assert I get:
Here's the backtrace in case of a hang:
If I modify the test program to use a single context (like the monitor example on the manpage), it runs without any issue. Platform: Ubuntu 16.04 64bit. I also checked the code in the first comment (it closes the client monitor twice!), and it hangs 10 out of 10 times. If I change the close order of the first two to server_mon, client_mon or to client_mon, server_mon, the hang disappears and the app properly exits after 1 second. |
As a workaround I'm using the following destruction order in the code:
This seems to be stable: no assert or hang for a few hundred test iterations. |
Is this still an issue? I am seeing the same issue when I use ZmqMonitor. |
I think there was a fix recently, but I'm writing from my phone so I can't look it up. Are you building from the libzmq master branch? |
I re-ran the example code I previously attached, and the issue is not reproducible any more. The test program always exits correctly. The issue it triggered is fixed. Env: Ubuntu 16.04 64bit, libzmq 4.2.2 stable release |
Great, thanks for confirming, closing now |
Thanks for the prompt reply. |
@hoditohod - if you have time, any chance you could please turn that repro code into a unit test? It would be very handy to avoid regressions in the future. |
If a monitor is used and sockets are closed in a specific order, ZMQ hangs or fails with an assertion during zmq_ctx_term. The bug appears rarely, about 1 in 10 executions. Please see more details in the code comments.
Code demonstrating the problem:
Platform: Windows 7 64bit (real) and Windows Server 2008 R2 64bit (VM)
Compiler: MS Visual Studio 2013
ZMQ: 4.0.5, built from source with the same compiler
Code built in Release-x86 configuration.
Assertion: mailbox.cpp line 82 (function zmq::mailbox_t::recv)
Call stack:
Cannot reproduce the issue on the OS X platform; it looks Windows-specific.
UPD: information about assert and call stack added.