avoid asserting on getifaddrs failure #2051
Comments
Sounds reasonable, as long as it doesn't block forever. Would you send a PR to fix it (master first, and then backport to 4.1 if necessary)?
OK I will put something together. Thanks for the follow up.
Thanks!
garlick added a commit to garlick/libzmq that referenced this issue Jul 20, 2016
getifaddrs() can fail transiently with ECONNREFUSED on Linux. This has been observed with Linux 3.10 when multiple processes call zmq::tcp_address_t::resolve_nic_name() simultaneously. Before asserting in this case, make 10 attempts, with exponential backoff, given by (1 msec * 2^i), where i is the attempt number. Fixes zeromq#2051
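The retry scheme described in that commit message could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `retry_on_econnrefused` and `getifaddrs_with_retry` are hypothetical, and the probe function is passed in as a parameter only so the loop can be exercised without a real netlink failure.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>

#include <ifaddrs.h>

// Hypothetical helper, not the libzmq code: call `probe` up to max_attempts
// times, sleeping base * 2^i after failed attempt i, but only while the
// failure is ECONNREFUSED. Returns the last return code from `probe`.
template <typename Probe>
int retry_on_econnrefused (ifaddrs **out, int max_attempts,
                           std::chrono::milliseconds base, Probe probe)
{
    assert (max_attempts > 0 && max_attempts < 31); // keeps 1 << i in range
    int rc = -1;
    for (int i = 0; i < max_attempts; i++) {
        rc = probe (out);
        // Stop on success, or on any error other than the transient one.
        if (rc == 0 || errno != ECONNREFUSED)
            break;
        // No point sleeping after the final attempt.
        if (i + 1 < max_attempts)
            std::this_thread::sleep_for (base * (1 << i));
    }
    return rc;
}

// Convenience wrapper using the values from the commit message:
// 10 attempts, 1 msec base delay.
inline int getifaddrs_with_retry (ifaddrs **out)
{
    return retry_on_econnrefused (
        out, 10, std::chrono::milliseconds (1),
        [] (ifaddrs **p) { return ::getifaddrs (p); });
}
```

With these values the sketch sleeps at most 1 + 2 + ... + 256 = 511 msec in total, so it cannot block forever. If every attempt fails, the caller still gets a nonzero return code and can decide whether to assert. On success the caller owns the list and releases it with `freeifaddrs()`.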
We've noticed the following assertion in zeromq 4.1.4 on a rhel 7.2 system (kernel 3.10):
Here's a backtrace from a core file
which I believe is this assertion in src/tcp_address.cpp (reference is to master not 4.1.4)
Apparently getifaddrs can fail. Since it communicates with the kernel over a netlink socket, I suppose it might run out of something when abused. Although I wouldn't say we're abusing it: we're merely starting a dozen or so copies of the same zeromq-based program at the same time.
Perhaps a backoff-retry would be appropriate here instead of an assertion?