Problem: UDP engine aborts on networking-related errors from socket syscalls #2862
Comments
I guess the UDP engine needs the same hardening the TCP one got a few months ago, against network errors. That's a setsockopt failing. PRs welcome.
I am not much into C++. :( Any condition I could check and break the loop?
Hi, I had to add
so that the radio socket finds a way for multicast traffic. Now the core dump is no longer happening. :) Thanks.
Great, happy you found a workaround and thanks for sharing it. I'll reopen and retitle the issue, as the UDP implementation should be hardened anyway before it can be declared stable.
I am able to reproduce the crash with v4.2.3 also. But the scenario is different.
And the root cause for the crash is
I'd like to look into that but I'm not sure how to report the error to the caller given that the out_event method returns What's the correct way of dealing with this error? I think returning an error in zmq_send would make the most sense but given the threading going on it's not obvious to me how that would be done. Maybe the error should be saved and returned on subsequent calls?
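To make the "save the error and return it on a subsequent call" idea concrete, here is a minimal sketch. It assumes a single atomic slot shared between the I/O thread and the application thread. The class `deferred_error_t` and its methods `record` and `take` are hypothetical names used for illustration; they are not part of libzmq, and nothing here reflects how the library actually handles this.

```cpp
#include <atomic>

// Hypothetical helper, not part of libzmq: keeps the most recent errno
// recorded by an I/O thread until an application thread collects it.
class deferred_error_t
{
  public:
    // Called from the I/O thread when a send fails. Overwrites any
    // earlier error that has not been collected yet.
    void record (int err)
    {
        _pending.store (err, std::memory_order_release);
    }

    // Called from the application thread. Returns the stored errno
    // (0 if none) and clears the slot in one atomic step.
    int take ()
    {
        return _pending.exchange (0, std::memory_order_acq_rel);
    }

  private:
    std::atomic<int> _pending{0};
};
```

With something like this, a send call on the application side could check `take ()` before queuing a new message and fail with the stored errno. The trade-off is that the error is reported late and only the last one is kept.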
With TCP, on recoverable/temporary errors the I/O thread engine simply tries again later. Can the UDP engine do that too?
I'd have to look into that. In the case of UDP I wonder if it makes a lot of sense though, given the best effort nature of the protocol (especially in multicast). After all even if the kernel manages to send the packet you never have any guarantee that it'll ever reach its destination. What happens if the messages pile up with TCP, I assume eventually they're simply dropped? In my case the error returned by the
With TCP I think the messages will fill the queue, and what happens depends on the HWM settings at that point |
Now that I think about it even the calls to bind() and other syscalls in
The way to report status on the handshake and related statuses is via socket monitor events, if they happen in the I/O thread |
Ah yeah that would work, do you think it could be used to handle UDP send errors as well or is it inappropriate?
IMHO that would be way too much traffic, and as you said UDP is unreliable by nature
We upgraded the package and I can still see the crash when zmq_send fails. :(
Yeah I hit that again. For the moment I still have an ugly hack where I comment out the assert in zmq::udp_engine_t::out_event to ignore the failure. Obviously it's not great... I'd be interested in implementing a cleaner solution but I'm still unsure what to do. I tried taking inspiration from the TCP code but (as mentioned in previous comments) I don't really think it makes sense to retry when sendto fails. That being said, completely ignoring the error and not reporting it to the sender also seems like a poor idea.
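One middle ground between asserting and silently ignoring the failure is to classify the errno and count the datagrams that get dropped. The sketch below shows that idea in isolation. The names `send_action_t`, `send_stats_t`, `classify_send_error` and `on_send_failure` are hypothetical, and the grouping of errno values is an assumption made for illustration; it is not the list libzmq uses.

```cpp
#include <cerrno>
#include <cstddef>

// Hypothetical outcome of a failed sendto(), for illustration only.
enum class send_action_t
{
    retry, // transient condition, keep the datagram and try again later
    drop,  // network condition, discard the datagram and carry on
    fatal  // anything else, likely a programming error
};

// Hypothetical counters that could be exposed to the application.
struct send_stats_t
{
    std::size_t dropped = 0;
};

// Assumed classification, NOT libzmq's actual error list.
// if-chains are used because EAGAIN and EWOULDBLOCK may be equal,
// which would produce duplicate case labels in a switch.
inline send_action_t classify_send_error (int err)
{
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK
        || err == ENOBUFS)
        return send_action_t::retry;
    if (err == ENETDOWN || err == ENETUNREACH || err == EHOSTUNREACH)
        return send_action_t::drop;
    return send_action_t::fatal;
}

// Classifies the error and records a drop when the datagram is discarded.
inline send_action_t on_send_failure (int err, send_stats_t &stats)
{
    const send_action_t action = classify_send_error (err);
    if (action == send_action_t::drop)
        ++stats.dropped;
    return action;
}
```

An engine using this would only abort on `fatal`, so an interface losing its address would increase `dropped` instead of taking the whole process down.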
…eromq#2862 (remove unused function)
…eromq#2862 (revert changes in error list in zmq::assert_success_or_recoverable)
Description
I have a dish socket bound to a multicast address, and the dish socket receives messages in a while loop.
It works fine when the Ethernet interface is up.
I did
sudo ifconfig enp7s0 0.0.0.0
and observed the following:
I am using this dish socket in an application in which the IP address of the interface often becomes 0.0.0.0. Is there any way I could exit the while loop safely without core dumping the entire application?
Environment