test_immediate_3/test_reconnect_inproc do not terminate with POLL poller under Windows #3107
Comments
The main thread is in:
The I/O thread is in:
|
Apparently this has nothing to do with ZMQ_IMMEDIATE. It also happens without the ZMQ_IMMEDIATE option, when trying to connect another ZMQ_DEALER peer after closing the first. |
This is demonstrated by https://github.com/sigiesec/libzmq/tree/fix-win-poll-deadlock (CI job https://ci.appveyor.com/project/sigiesec/libzmq/build/build-547/job/ha0n591d1w67dtlg). I have no idea at the moment how this is affected by the polling implementation. |
To avoid investigating nonsense, could someone please check whether the test case in https://github.com/sigiesec/libzmq/blob/aa3ed7e9da24ff214919b6a58465ec26265ecd25/tests/test_spec_dealer.cpp#L202 is correct and is expected to work? It happens to work on all builds other than Windows poll, but this might be a coincidence. |
I instrumented libzmq to output all commands processed by a socket (and some debug output from the test case). With POLLER=select, I get:
With POLLER=poll, it blocks after:
|
Test case looks valid to me. What happens if you call zmq_unbind before closing the socket? What if instead of binding, it connects? |
@bluca Did not try this yet, but will later. But I found another difference, which sounds relevant. In the good run with POLLER=select, the following call stack is reached:
tcp_connecter_t::out_event() line 147 is never reached with POLLER=poll. It is this branch: Line 147 in cbd52fe
The pending connect had failed with error 10061 (WSAECONNREFUSED) before. |
Ok, some more research showed that this is far more severe than I thought. If the information here is correct and I understand it correctly, this is impossible to fix under Windows, as WSAPoll simply does not report connection failures:
Can someone please double-check this? While this seems to affect only a few existing test cases, a user cannot foresee when it might happen. It is probably only relevant for the I/O thread (i.e. poll.cpp), since a socket_poller/zmq_poller_poll would never poll for the result of a connect attempt. The conclusion would be to change the build scripts to disallow the use of poll/WSAPoll for the I/O thread on Windows completely. select would then be the only poller_t implementation on Windows for now. If my assumption above is correct, we could still support poll for zmq_poll/zmq_poller_*/socket_poller_t. |
E.g. Python declined to support poll on Windows for similar reasons: https://bugs.python.org/issue16507 |
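For anyone who wants to double-check, here is a minimal sketch of the pattern an I/O thread relies on: start a non-blocking connect to a port nobody listens on, wait with poll(), then read SO_ERROR. This is not libzmq code; the function names `pick_closed_port` and `probe_refused_connect` are made up for illustration, and the sketch is POSIX-only. On POSIX the poll() call wakes up and SO_ERROR yields ECONNREFUSED (the counterpart of 10061/WSAECONNREFUSED). The claim discussed above is that WSAPoll never delivers that wakeup, so the equivalent wait on Windows would just time out.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: bind to an ephemeral loopback port, note its number,
// and release it again, so that (barring a race) nothing listens on it.
static int pick_closed_port ()
{
    int fd = socket (AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
    addr.sin_port = 0; // let the OS choose a free port
    socklen_t len = sizeof addr;
    int port = -1;
    if (bind (fd, reinterpret_cast<sockaddr *> (&addr), sizeof addr) == 0
        && getsockname (fd, reinterpret_cast<sockaddr *> (&addr), &len) == 0)
        port = ntohs (addr.sin_port);
    close (fd);
    return port;
}

// Hypothetical probe: returns the errno-style result of a non-blocking
// connect to a closed loopback port, 0 if the connect unexpectedly
// succeeded, or -1 if poll() timed out or a setup step failed.
int probe_refused_connect (int timeout_ms)
{
    int port = pick_closed_port ();
    if (port < 0)
        return -1;
    int fd = socket (AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl (fd, F_SETFL, fcntl (fd, F_GETFL, 0) | O_NONBLOCK);

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
    addr.sin_port = htons (static_cast<uint16_t> (port));

    int result = -1;
    if (connect (fd, reinterpret_cast<sockaddr *> (&addr), sizeof addr) == 0) {
        result = 0; // connected immediately (not expected here)
    } else if (errno != EINPROGRESS) {
        result = errno; // failure reported synchronously
    } else {
        // The connect is pending: wait until the socket reports an outcome.
        pollfd pfd = {fd, POLLOUT, 0};
        if (poll (&pfd, 1, timeout_ms) > 0) {
            int err = 0;
            socklen_t errlen = sizeof err;
            if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == 0)
                result = err; // deferred result of the connect attempt
        }
    }
    close (fd);
    return result;
}
```

A return value of -1 from `probe_refused_connect` after the timeout would correspond to the blocking behaviour seen with POLLER=poll on Windows: the failure happened, but the poller never reported it.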
Yikes! I have no idea about WSAPoll but if that's the case, then I agree that we should remove the option for the I/O thread. |
Solution: added test case, reproduces the likely cause for zeromq#3107
Issue description
test_immediate test case test_immediate_3 does not terminate with POLL poller under Windows
Environment
Minimal test code / Steps to reproduce the issue
What's the actual result? (include assertion message & call stack if applicable)
Blocks forever.
What's the expected result?
Terminates without error.