
DNSMasq cache has low cache hit rate for some reason #160

Closed
felipejfc opened this issue Oct 26, 2017 · 11 comments

Comments

@felipejfc

Hi, I'm running a k8s cluster in production with ~60 nodes and 870 pods.

Currently, 40 of these pods are kube-dns pods, each with a CPU request of 150m. (Yes, I had to scale it a lot.)

I'm trying to figure out why I need so many of them. One thing I did today was look at the Prometheus metrics exposed by the sidecar container, and I observed the following:

[screenshot: cache hit metrics from the kube-dns sidecar]

The ratio of hits to total requests is quite low (around 21%). Any hints on why? Looking at the dnsmasq requests, it seems that some of the addresses it resolves are never being cached for some reason.

Just for reference, this is the number of requests that kube-dns is receiving per minute (I guess):

[screenshot: kube-dns requests per minute]

@johnbelamaric
Member

A few questions:

  • Do you do a lot of queries to external resources (domains)?
  • Do you use FQDNs for your internal queries - for example "foo.namespace.svc.cluster.local" instead of just "foo"? If not, do you do a lot of cross-namespace lookups ("foo.ns2" from "ns1")?
  • Is negative caching (i.e., caching of NXDOMAIN) disabled in dnsmasq?

The way the client pod's resolv.conf is set up, these things can lead to a much higher query load.
See kubernetes/kubernetes#33554 for the engrossing details.

You could roll out CoreDNS (https://coredns.io) - see kubernetes/community#1100 for some details on numbers there (as a maintainer of CoreDNS I am biased, of course). We also have an autopath plugin in CoreDNS that could make a difference, depending on your answers to the questions above. There are some weird edge cases with it, though (I am still working on a blog post to make those clear; I believe they are very unlikely cases for the most part).
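
For context, a pod's /etc/resolv.conf in a cluster like this typically looks something like the following (the nameserver IP and the node-level search domain here are illustrative; the exact search list depends on the pod's namespace and your nodes). With ndots:5, any name containing fewer than five dots is tried against every search domain before being tried as-is, which is where the extra queries come from:

nameserver 100.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
options ndots:5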

@felipejfc
Author

Hi @johnbelamaric

  • I do have a lot of queries to external resources. One thing I realised is that some of these queries are being cached (I can see [cached]... in the dnsmasq logs) and others are not.
  • Usually I use only service.namespace instead of service.namespace.svc.cluster.local
  • I guess not - these are my dnsmasq args:
      - args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=64000
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/in6.arpa/127.0.0.1#10053
        image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.6
        name: dnsmasq

@johnbelamaric
Member

Ok, if it were disabled you would see -N or --no-negcache, so in fact it is enabled. But it may be that the TTL isn't being set by kube-dns so then it's not caching those. See https://linux.die.net/man/8/dnsmasq - you can try setting --neg-ttl.
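
For example, based on your args block above, that would just mean adding one flag to the dnsmasq arguments (a sketch; the 30-second value is arbitrary, tune it to how often things change):

      - args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=64000
        - --neg-ttl=30          # give negative answers without SOA TTL info a default 30s TTL so they can be cached
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/in6.arpa/127.0.0.1#10053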

Using service.namespace means the first query will always be NXDOMAIN from the search path, then the second query will work. So without negative caching that doubles the query load (but is necessary of course if things aren't in the same namespace).

@johnbelamaric
Member

Even with negative caching it doubles the load - but those queries hit dnsmasq, not kube-dns. This is what the autopath CoreDNS plugin is designed to fix - it figures out the search path and returns a CNAME instead of NXDOMAIN for the first query.
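
As a rough sketch, a Corefile enabling autopath together with the kubernetes plugin could look like this (autopath needs the kubernetes plugin running with "pods verified"; the zones, ports and upstream proxy are illustrative, not a drop-in config for your cluster):

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified
    }
    autopath @kubernetes
    prometheus :9153
    cache 30
    proxy . /etc/resolv.conf
    log
}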

@johnbelamaric
Member

Also, caching the external responses will help, but only in the sense that the last of the 5 or so queries will hit that cache. For example, google.com is only the 5th query in my cluster - all the other queries go out before it tries google.com as an FQDN:

dnstools# host -v google.com
Trying "google.com.default.svc.cluster.local"
Trying "google.com.svc.cluster.local"
Trying "google.com.cluster.local"
Trying "google.com.us-east-2.compute.internal"
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41452
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		60	IN	A	172.217.8.14

Received 44 bytes from 100.64.0.10#53 in 1 ms
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57226
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.			IN	AAAA

;; ANSWER SECTION:
google.com.		7	IN	AAAA	2607:f8b0:4004:802::200e

Received 56 bytes from 100.64.0.10#53 in 1 ms
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49632
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.			IN	MX

;; ANSWER SECTION:
google.com.		60	IN	MX	20 alt1.aspmx.l.google.com.
google.com.		60	IN	MX	30 alt2.aspmx.l.google.com.
google.com.		60	IN	MX	40 alt3.aspmx.l.google.com.
google.com.		60	IN	MX	50 alt4.aspmx.l.google.com.
google.com.		60	IN	MX	10 aspmx.l.google.com.

Received 136 bytes from 100.64.0.10#53 in 1 ms
dnstools#

@felipejfc
Author

I see, setting neg-ttl helped a bit (the cache hit rate is about 0.35-0.4 now).

Is it better to always use the FQDN (service.ns.svc.cluster.local)?

Are there any other configuration changes I can make to optimize dnsmasq?

thanks @johnbelamaric

@felipejfc
Author

felipejfc commented Oct 27, 2017

Doing some queries, it seems that it's not good to use the FQDN:

# host -v podium.podium.svc.cluster.local
Trying "podium.podium.svc.cluster.local.battletanks.svc.cluster.local"
Trying "podium.podium.svc.cluster.local.svc.cluster.local"
Trying "podium.podium.svc.cluster.local.cluster.local"
Trying "podium.podium.svc.cluster.local.ec2.internal"
Trying "podium.podium.svc.cluster.local"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30371
# host -v podium.podium
Trying "podium.podium.battletanks.svc.cluster.local"
Trying "podium.podium.svc.cluster.local"

wtf...

Edit: this seems to be the better way:

# host -v podium.podium.svc.cluster.local.
Trying "podium.podium.svc.cluster.local"

@bowei
Member

bowei commented Oct 27, 2017

An FQDN MUST end with a ".", e.g. foo.bar.com.

Anything else may be subject to search path expansion (look at the man page for /etc/resolv.conf and search for "ndots").
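
If you can't add trailing dots everywhere, another knob (in Kubernetes versions that support pod dnsConfig) is lowering ndots for a pod so that multi-label names skip the search-path expansion. A rough sketch, not specific to this cluster; with ndots:1 in-cluster names like service.namespace are tried against the upstream as absolute names first, so prefer full FQDNs for cluster lookups:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: busybox
  dnsConfig:
    options:
    - name: ndots
      value: "1"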

@felipejfc
Author

FYI, using only FQDNs made my cache hit rate much higher. I've also allocated more memory for the dnsmasq pod and set the cache size to 64k; doing that, I was able to go from 40 kube-dns pods to only 10 (I think I could go even lower).

@klausenbusk

> But it may be that the TTL isn't being set by kube-dns so then it's not caching those. See https://linux.die.net/man/8/dnsmasq - you can try setting --neg-ttl.

I just tried specifying --neg-ttl=30 to reduce latency, but it seems like dnsmasq can't cache NXDOMAIN responses from upstream servers. The same behaviour is also seen here: https://serverfault.com/questions/827207/dnsmasq-not-caching-for-non-public-dns-servers

@discordianfish

I also observe this. My cache hit rate is ~20%, even though I always resolve the same, rarely changing names. I actually had --no-negcache set, but removing it didn't affect the cache hit ratio; neither did increasing the cache limit nor setting --neg-ttl=30.
