Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TCP fallback with multiple nameservers [Bug #8285] #50

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

opti
Copy link
Contributor

@opti opti commented Apr 24, 2024

Originally reported here https://bugs.ruby-lang.org/issues/8285, but the patch has never been finalized.

We stumbled on the same issue under the following conditions:

  • Multiple nameservers are configured for Resolv::DNS
  • A nameserver falls back from UDP to TCP requester due to the message size
  • TCP request hits Resolv::DNS timeout
  • Resolv::DNS retries the next nameserver

Expected result:

Resolv::DNS successfully falls back to the next nameserver, using TCP request.

Actual result:

Resolv::DNS::Requester::RequestError: host/port don't match is raised.

This pull request:

@opti opti changed the title Fix TCP fallback with multiple nameservers [Bug 8285] Fix TCP fallback with multiple nameservers [Bug #8285] Apr 24, 2024
@opti
Copy link
Contributor Author

opti commented May 3, 2024

@sorah, @nobu any chance this can be reviewed?

Under the following conditions the exception
`Resolv::DNS::Requester::RequestError: host/port don't match` is raised:
- Multiple nameservers are configured for Resolv::DNS
- A nameserver falls back from UDP to TCP
- TCP request hits Resolv::DNS timeout
- Resolv::DNS retries the next nameserver

More details here https://bugs.ruby-lang.org/issues/8285

Co-authored-by: Julian Mehnle <julian@mehnle.net>
@opti opti force-pushed the bug-8285-tcp-fallback branch from 6686517 to 7d524df Compare May 29, 2024 20:20
Copy link
Contributor

@hanazuki hanazuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the TCP requester is connected to the nameserver's address and port and cannot be used to send requests to the other nameservers. This patch solves the problem by making a TCP requester for each nameserver and memoizing it. I think this is the correct approach. (I just noticed that the memoization key should include the nameserver's port, too).

This patch also tracks the necessity of TCP fallback for each candidate FQDN and always tries UDP before TCP. Unlike 11 years ago when the original patch was written and we had many unreliable TCP nameservers and middleboxes, the use of TCP is recommended today when it is possible without additional handshake overhead. I think it's better to follow this guideline:

RFC 7766 §5

Stub resolvers and recursive resolvers MAY elect to send either TCP or UDP queries depending on local operational reasons.
TCP MAY be used before sending any UDP queries.
If the resolver already has an open TCP connection to the server, it SHOULD reuse this connection.

Another drawback of this patch is that it makes a UDP requester for each nameserver. A UDP requester is unconnected when multiple nameservers are configured (Requester::UnconnectedUDP), so we only need a single instance of it. The patched version creates up to (2 × #nameservers) UDP sockets when both IPv4 and IPv6 resolvers are configured. I think this needs fixing before merging the patch.

My proposal: https://github.com/ruby/resolv/compare/master...hanazuki:ruby-resolv:reuse-tcp-conn?expand=1

[RFC7766] Section 5 recommends stub resolvers to reuse open TCP
connections to a nameserver.

[RFC7766]: https://datatracker.ietf.org/doc/html/rfc7766
@opti
Copy link
Contributor Author

opti commented Jun 3, 2024

@hanazuki These are valid points. Thank you for the detailed explanation. I've added your commit to this pull request. Thanks for looking into it.

@opti opti requested a review from hanazuki June 4, 2024 16:29
@opti
Copy link
Contributor Author

opti commented Jul 10, 2024

@hanazuki any chance this fix can be merged?

@hanazuki
Copy link
Contributor

+1 to merge. I'm not the maintainer and cannot merge or approve PRs.

What do you think, @sorah @nobu?

@opti
Copy link
Contributor Author

opti commented Aug 13, 2024

Any thoughts ☝️ @sorah, @nobu, @hsbt?

@sorah sorah merged commit 6252914 into ruby:master Aug 13, 2024
sorah added a commit that referenced this pull request Aug 13, 2024
sorah added a commit that referenced this pull request Aug 13, 2024
this should be fine, as it must give different host:port pair.

- follow-up: #50
- follow-up: 6252914
sorah added a commit that referenced this pull request Aug 13, 2024
test_dns: Fix failure on Windows

1. Switch to #with_udp_and_tcp helper method for EACCES retries on Windows;
   the given UDP socket is unnecessary though.
2. Using 127.0.0.1 should be fine, as it must give different host:port pair.
3. On Windows, 5 retries of bind(2) appears still flaky, doubling it for sure.

follow-up: #50
follow-up: 6252914
matzbot pushed a commit to ruby/ruby that referenced this pull request Aug 13, 2024
(ruby/resolv#58)

test_dns: Fix failure on Windows

1. Switch to #with_udp_and_tcp helper method for EACCES retries on Windows;
   the given UDP socket is unnecessary though.
2. Using 127.0.0.1 should be fine, as it must give different host:port pair.
3. On Windows, 5 retries of bind(2) appears still flaky, doubling it for sure.

follow-up: ruby/resolv#50
follow-up: ruby/resolv@6252914

ruby/resolv@0a0d57e369
hanazuki added a commit to hanazuki/ruby-resolv that referenced this pull request Aug 31, 2024
The listening TCP socket is closed by `with_udp_and_tcp` helper, but
the connected socket is leaking.

```
Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888>
COMMAND     PID     USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
ruby    3248055 chkbuild   12u  IPv4 112546322      0t0  TCP localhost:50888->localhost:40112 (CLOSE_WAIT)
```

Fixup: ruby#50
hanazuki added a commit to hanazuki/ruby-resolv that referenced this pull request Aug 31, 2024
The listening TCP socket is closed by `with_udp_and_tcp` helper, but
the connected socket is leaking.

```
Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888>
COMMAND     PID     USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
ruby    3248055 chkbuild   12u  IPv4 112546322      0t0  TCP localhost:50888->localhost:40112 (CLOSE_WAIT)
```

Fixup: ruby#50
@hanazuki hanazuki mentioned this pull request Aug 31, 2024
hanazuki added a commit to hanazuki/ruby-resolv that referenced this pull request Sep 1, 2024
The listening TCP socket is closed by `with_udp_and_tcp` helper, but
the connected socket is leaking.

```
Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888>
COMMAND     PID     USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
ruby    3248055 chkbuild   12u  IPv4 112546322      0t0  TCP localhost:50888->localhost:40112 (CLOSE_WAIT)
```

For the purpose of the test case to simulate a timeout over TCP
transport, we have to delay closing this socket until the end the test
case.

Fixup: ruby#50
matzbot pushed a commit to ruby/ruby that referenced this pull request Sep 10, 2024
The listening TCP socket is closed by `with_udp_and_tcp` helper, but
the connected socket is leaking.

```
Leaked file descriptor: TestResolvDNS#test_multiple_servers_with_timeout_and_truncated_tcp_fallback: 12 : #<TCPSocket:fd 12, AF_INET, 127.0.0.1, 50888>
COMMAND     PID     USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
ruby    3248055 chkbuild   12u  IPv4 112546322      0t0  TCP localhost:50888->localhost:40112 (CLOSE_WAIT)
```

For the purpose of the test case to simulate a timeout over TCP
transport, we have to delay closing this socket until the end the test
case.

Fixup: ruby/resolv#50

ruby/resolv@236c38bdb1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants