-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TCP fallback with multiple nameservers [Bug #8285] #50
Conversation
Under the following conditions the exception `Resolv::DNS::Requester::RequestError: host/port don't match` is raised: - Multiple nameservers are configured for Resolv::DNS - A nameserver falls back from UDP to TCP - TCP request hits Resolv::DNS timeout - Resolv::DNS retries the next nameserver More details here https://bugs.ruby-lang.org/issues/8285 Co-authored-by: Julian Mehnle <julian@mehnle.net>
6686517
to
7d524df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the TCP requester is connected to the nameserver's address and port and cannot be used to send requests to the other nameservers. This patch solves the problem by making a TCP requester for each nameserver and memoizing it. I think this is the correct approach. (I just noticed that the memoization key should include the nameserver's port, too).
This patch also tracks the necessity of TCP fallback for each candidate FQDN and always tries UDP before TCP. Unlike 11 years ago when the original patch was written and we had many unreliable TCP nameservers and middleboxes, the use of TCP is recommended today when it is possible without additional handshake overhead. I think it's better to follow this guideline:
Stub resolvers and recursive resolvers MAY elect to send either TCP or UDP queries depending on local operational reasons.
TCP MAY be used before sending any UDP queries.
If the resolver already has an open TCP connection to the server, it SHOULD reuse this connection.
Another drawback of this patch is that it makes a UDP requester for each nameserver. A UDP requester is unconnected when multiple nameservers are configured (Requester::UnconnectedUDP
), so we only need a single instance of it. The patched version creates up to (2 × #nameservers) UDP sockets when both IPv4 and IPv6 resolvers are configured. I think this needs fixing before merging the patch.
My proposal: https://github.com/ruby/resolv/compare/master...hanazuki:ruby-resolv:reuse-tcp-conn?expand=1
[RFC7766] Section 5 recommends stub resolvers to reuse open TCP connections to a nameserver. [RFC7766]: https://datatracker.ietf.org/doc/html/rfc7766
@hanazuki These are valid points. Thank you for the detailed explanation. I've added your commit to this pull request. Thanks for looking into it. |
@hanazuki any chance this fix can be merged? |
test_dns: Fix failure on Windows 1. Switch to #with_udp_and_tcp helper method for EACCES retries on Windows; the given UDP socket is unnecessary though. 2. Using 127.0.0.1 should be fine, as it must give different host:port pair. 3. On Windows, 5 retries of bind(2) appears still flaky, doubling it for sure. follow-up: #50 follow-up: 6252914
(ruby/resolv#58) test_dns: Fix failure on Windows 1. Switch to #with_udp_and_tcp helper method for EACCES retries on Windows; the given UDP socket is unnecessary though. 2. Using 127.0.0.1 should be fine, as it must give different host:port pair. 3. On Windows, 5 retries of bind(2) appears still flaky, doubling it for sure. follow-up: ruby/resolv#50 follow-up: ruby/resolv@6252914 ruby/resolv@0a0d57e369
The listening TCP socket is closed by `with_udp_and_tcp` helper, but the connected socket is leaking. ``` Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ruby 3248055 chkbuild 12u IPv4 112546322 0t0 TCP localhost:50888->localhost:40112 (CLOSE_WAIT) ``` Fixup: ruby#50
The listening TCP socket is closed by `with_udp_and_tcp` helper, but the connected socket is leaking. ``` Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ruby 3248055 chkbuild 12u IPv4 112546322 0t0 TCP localhost:50888->localhost:40112 (CLOSE_WAIT) ``` Fixup: ruby#50
The listening TCP socket is closed by `with_udp_and_tcp` helper, but the connected socket is leaking. ``` Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ruby 3248055 chkbuild 12u IPv4 112546322 0t0 TCP localhost:50888->localhost:40112 (CLOSE_WAIT) ``` For the purpose of the test case to simulate a timeout over TCP transport, we have to delay closing this socket until the end the test case. Fixup: ruby#50
The listening TCP socket is closed by `with_udp_and_tcp` helper, but the connected socket is leaking. ``` Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ruby 3248055 chkbuild 12u IPv4 112546322 0t0 TCP localhost:50888->localhost:40112 (CLOSE_WAIT) ``` For the purpose of the test case to simulate a timeout over TCP transport, we have to delay closing this socket until the end the test case. Fixup: ruby/resolv#50 ruby/resolv@236c38bdb1
Originally reported here https://bugs.ruby-lang.org/issues/8285, but the patch has never been finalized.
We stumbled on the same issue under the following conditions:
Resolv::DNS
timeoutResolv::DNS
retries the next nameserverExpected result:
Resolv::DNS
successfully falls back to the next nameserver, using TCP request.Actual result:
Resolv::DNS::Requester::RequestError: host/port don't match
is raised.This pull request: