Embedded swarm DNS does not fail over to secondary properly on RHEL7 #2663
We did some more digging. In the container, this /etc/resolv.conf doesn't work:
This /etc/resolv.conf works:
Looks like this isn't the first time RHEL has had issues with the rotate option: https://bugzilla.redhat.com/show_bug.cgi?id=841787 so it looks like there may be a bug in RHEL7 in Docker when
Edit: The reason the second one was working is that the /etc/resolv.conf syntax for timeout was wrong (timeout=2 should be timeout:2), so it was reverting to the default timeout of 5 seconds.
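To illustrate why timeout=2 silently behaves like the default, here is a minimal Java sketch of how an options line can be read. It is a simplified model, not the actual resolver code: the class name ResolvOptions, the method parseTimeout, and the sample option lines are assumptions made up for this example. The only facts taken from the thread are the timeout:2 syntax and the default of 5 seconds.

```java
/** Hypothetical helper that models how a resolver might read "timeout" from an options line. */
final class ResolvOptions {
    /** Default timeout in seconds, as stated in this thread. */
    static final int DEFAULT_TIMEOUT_SECONDS = 5;

    /** Returns the timeout from an "options ..." line, or the default when no valid value is found. */
    static int parseTimeout(String optionsLine) {
        int timeout = DEFAULT_TIMEOUT_SECONDS;
        for (String token : optionsLine.trim().split("\\s+")) {
            if (token.startsWith("timeout:")) {
                try {
                    int value = Integer.parseInt(token.substring("timeout:".length()));
                    if (value > 0) {
                        timeout = value;
                    }
                } catch (NumberFormatException ignored) {
                    // Malformed number: keep the current value.
                }
            }
            // Any other token, including "timeout=2", is not recognised and is skipped.
        }
        return timeout;
    }

    public static void main(String[] args) {
        System.out.println(parseTimeout("options rotate timeout:2")); // prints 2
        System.out.println(parseTimeout("options rotate timeout=2")); // prints 5
    }
}
```

With the equals sign the token is simply skipped, so the effective timeout stays at 5 seconds, which matches what was observed above.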
Reproduced using a clean image of RHEL7. With the primary dropping traffic, the key difference between working and not working was the timeout.
Works:
So not sure what kind of weird race condition is happening.
Code from here has mostly moved to moby/moby (see #2665), and that would probably be a better place to report this as well. However what is default
Thank you @olljanat, will crosspost this issue there. The default timeout is set to 5 seconds.
OS: RHEL7
Docker Version: 20.10.17
Problem: When primary DNS server is down, embedded DNS server returns timeout even though secondary is available
Reproduction (on a RHEL7 host; I got a trial subscription to get RHEL 7: https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.9/x86_64/packages):
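The DNSLookup program used for the test is not shown in this thread. A minimal stand-in, assuming it simply resolves one hostname through the JVM's default resolver, might look like the sketch below. The lookup helper, the default hostname example.com, and the timing output are assumptions for illustration only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

/** Hypothetical stand-in for the DNSLookup test program mentioned in this issue. */
final class DNSLookup {
    /** Resolves a host name; returns an empty Optional when resolution fails. */
    static Optional<String> lookup(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Host name to resolve; "example.com" is only a placeholder default.
        String host = args.length > 0 ? args[0] : "example.com";
        long start = System.nanoTime();
        Optional<String> address = lookup(host);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + " -> " + address.orElse("lookup failed") + " (" + elapsedMs + " ms)");
    }
}
```

Printing the elapsed time makes it easier to see whether a failure lines up with the resolver timeout.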
This works as expected. However, if we simulate a failure of the primary DNS with iptables, the results are not what we would expect.
Drop traffic to primary DNS (eg 10.10.10.10)
Re-run java DNSLookup in the container and, intermittently but the majority of the time, we get
The debug logs show that we get an I/O timeout to the primary (replaced with 10.10.10.10). It then tries and succeeds in getting a result from the secondary (replaced with 10.10.10.20), but continues to try both the primary and the secondary with the search domain appended, which means that the successful response is never returned to the underlying container.
So when it got a valid return from secondary DNS (lines 8 and 9), it should have stopped and things would have worked
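The behaviour we expected can be sketched in Java as follows. This is an illustration of the expectation only, not the embedded DNS server's actual code. The FailoverSketch class, the Upstream interface, and the 192.0.2.1 answer are hypothetical, and a timeout is modelled as an empty result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the expected failover: stop at the first upstream that answers. */
final class FailoverSketch {
    /** Hypothetical upstream query: returns an answer, or empty to model a timeout. */
    interface Upstream {
        Optional<String> query(String server, String name);
    }

    /** Tries each server in order and returns the first successful answer. */
    static Optional<String> resolve(List<String> servers, String name, Upstream upstream) {
        for (String server : servers) {
            Optional<String> answer = upstream.query(server, name);
            if (answer.isPresent()) {
                return answer; // A valid answer ends the search; no further queries are sent.
            }
        }
        return Optional.empty(); // Every upstream timed out.
    }

    public static void main(String[] args) {
        // Simulate the primary (10.10.10.10) dropping traffic and the secondary answering.
        Upstream fake = (server, name) ->
                server.equals("10.10.10.10") ? Optional.<String>empty() : Optional.of("192.0.2.1");
        List<String> servers = Arrays.asList("10.10.10.10", "10.10.10.20");
        System.out.println(resolve(servers, "example.com", fake)); // prints Optional[192.0.2.1]
    }
}
```

In the failing case described above, the equivalent of the return inside the loop never seems to happen: the lookup keeps going with the search domain appended instead of handing back the answer from the secondary.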
We know that replacing 127.0.0.11 (Docker embedded DNS) with the nameservers from the host's /etc/resolv.conf works, but ideally we would like to find a way forward that allows us to keep using Docker embedded DNS.
Edit: It does work from time to time. This is the result of a working scenario: