-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
salt-minion getaddrinfo in dns_check() never gets updated nameservers because of glibc caching #21397
Comments
@tjstansell, thanks for the info. |
I have tested the following patch, which does seem to work on my CentOS 7 test box, but I'm unsure if this would be sufficient for all other platforms. I wrapped it in a try/except block so theoretically it should work everywhere. If folks think this is good, I can submit an actual PR. Since the only thing that uses dns_check() is resolve_dns() in minion.py when the minion is trying to resolve the master's IP, it seems fine to simply call res_init() every time (assuming it's available). diff --git a/salt/utils/__init__.py b/salt/utils/__init__.py
index 3b89774..c646200 100644
--- a/salt/utils/__init__.py
+++ b/salt/utils/__init__.py
@@ -96,6 +96,14 @@ try:
except ImportError:
HAS_SETPROCTITLE = False
+try:
+ import ctypes
+ libc = ctypes.cdll.LoadLibrary('libc.so.6')
+ res_init = libc.__res_init
+ HAS_RESINIT = True
+except:
+ HAS_RESINIT = False
+
# Import salt libs
import salt._compat
import salt.exitcodes
@@ -518,6 +526,9 @@ def dns_check(addr, safe=False, ipv6=False):
'''
error = False
try:
+ # issue #21397: force glibc to re-read resolv.conf
+ if HAS_RESINIT:
+ res_init()
hostnames = socket.getaddrinfo(
addr, None, socket.AF_UNSPEC, socket.SOCK_STREAM
) |
@tjstansell, this is an interesting approach, and may be similar to an eventual solution. Would it be better to locally read resolv.conf? I don't feel qualified to make a decision about this, though perhaps @cachedout or @whiteinge have better ideas. |
If you plan to continue using |
@tjstansell, thanks for bringing your research and experience to this issue. |
What do you think about this, assuming there's some way to locate the name of the system library? whether or not that needs to be hand crafted into salt itself may not be an issue. diff --git a/salt/utils/__init__.py b/salt/utils/__init__.py
index 0cd14e9..da07105 100644
--- a/salt/utils/__init__.py
+++ b/salt/utils/__init__.py
@@ -541,6 +541,20 @@ def ip_bracket(addr):
return addr
+def reload_dns():
+ '''
+ cause the system to reload DNS info
+ '''
+ libc_name = figure_out_libc_name()
+ try:
+ import ctypes
+ libc = ctypes.cdll.LoadLibrary(libc_name)
+ libc.__res_init()
+ return True
+ except ImportError:
+ return False
+
+
def dns_check(addr, safe=False, ipv6=False):
'''
Return the ip resolved by dns, but do not exit on failure, only raise an
@@ -548,6 +562,7 @@ def dns_check(addr, safe=False, ipv6=False):
'''
error = False
try:
+ reload_dns()
hostnames = socket.getaddrinfo(
addr, None, socket.AF_UNSPEC, socket.SOCK_STREAM
) |
In that case, it would try to import ctypes and load the library every time it calls reload_dns(), right? Wouldn't it be better to simply do that once at startup rather than every time? That said, I'm not sure how to determine the correct library name. The example shown on https://docs.python.org/2/library/ctypes.html uses libc.so.6, which is why I chose it. It will either work or it won't. Additionally, in your example, I don't think it should be |
That's true, but we also try to keep |
So after more searching, I found we can use the following to find the correct library name:
|
Of course, that makes sense, thanks @tjstansell. I suggest you submit a pull request with the changes you've proposed. |
fix #21397 - force glibc to re-read resolv.conf
There are many places in Salt where socket.getaddrinfo is used. All these places need to call res_init() for this fix to work? Or can we have all these places call saltutil.dns_check? Currently I am running into this same issue with network.connect module. During a recent highstate, /etc/resolv.conf was changed. Now network.connect can not resolve some local hostnames. A new nameserver was added to /etc/resolv.conf so this doesn't work until I restart the minion. cloudyhead-2:salt rvora$ grep -r getaddrinfo .
./grains/core.py: info = socket.getaddrinfo(hostname()['fqdn'], None, socket.AF_INET)
./grains/core.py: info = socket.getaddrinfo(hostname()['fqdn'], None, socket.AF_INET6)
./modules/dnsutil.py: addresses = [sock[4][0] for sock in socket.getaddrinfo(host, None, socket.AF_INET, 0, socket.SOCK_RAW)]
./modules/dnsutil.py: addresses = [sock[4][0] for sock in socket.getaddrinfo(host, None, socket.AF_INET6, 0, socket.SOCK_RAW)]
./modules/network.py: _address) = socket.getaddrinfo(address, port, __family, 0, __proto)[0]
./modules/win_network.py: _address) = socket.getaddrinfo(address, port, __family, 0, __proto)[0]
./utils/__init__.py: hostnames = socket.getaddrinfo(
./utils/network.py: family, socktype, proto, canonname, sockaddr = socket.getaddrinfo(
./utils/network.py: # try socket.getaddrinfo
./utils/network.py: addrinfo = socket.getaddrinfo(
./utils/network.py: # try socket.getaddrinfo
./utils/network.py: addrinfo = socket.getaddrinfo(
./utils/verify.py: hostnames = socket.getaddrinfo( |
@rvora, I think the appropriate solution is to at least call |
Thanks for your quick response. Thinking a bit more about this, since dns_check does not accept all the parameters getaddrinfo accepts, I think you kind of have to stick with using getaddrinfo for most of these situations. May be this is a good compromise. utils/__init__.py:
def resinit():
if HAS_RESINIT:
res_init()
def dns_check(.....):
resinit()
.... continue .... modules/network.py:
def connectI(....):
salt.utils.resinit()
..... continue .... |
@rvora, that seems like a good idea to me, you are welcome to create a pull request against 2015.5, since this would be a bug fix, thanks. |
There can be a race condition when the minion starts where it might start before the network is fully initialized (perhaps DHCP is slow or was down). In these cases, I've seen salt-minion stuck querying 127.0.0.1 for DNS. Even after the network does come online, it never picks up the changes because glibc caches the resolv.conf forever. Supposedly, debian/ubuntu and opensuse actually patch glibc to stat /etc/resolv.conf for changes (I have not verified this, just saw a reference online somewhere), but this doesn't happen on redhat. Without a fix in glibc, the current solution is to restart the salt-minion if the resolv.conf changes. However, salt could also trigger a res_init() to force glibc to reload the resolv.conf configuration.
I've tested the following by copying the dns_check() and hacking on it a bit. With the
res_init()
in there, it does pick up changes to resolv.conf:I tested by changing /etc/resolv.conf to point to a bogus nameserver, then during the loop, changing it back and it shows it able to correctly resolve the address.
Obviously, this isn't the correct patch (It's probably not portable and likely not running res_init() in the right places), but it is a workable solution to a very annoying problem of my minion spitting these out and never being able to fix itself (because it's looking for a nameserver at 127.0.0.1)
This might become more important if #15122 and/or #10032 is addressed.
I'm running 2014.7.2, fyi.
The text was updated successfully, but these errors were encountered: