
DNSMasq cache has low cache hit rate for some reason #160

Closed
felipejfc opened this issue Oct 26, 2017 · 11 comments

Comments

@felipejfc

Hi, I'm running a k8s cluster in production with ~60 nodes and 870 pods.

Currently, 40 of these pods are kube-dns pods, each with a CPU request of 150m. (Yes, I had to scale it a lot.)

I'm trying to figure out why I need so many of them. One thing I did today was look at the Prometheus metrics exposed by the sidecar container, and I observed the following:

[screenshot: cache hit metrics from the kube-dns sidecar]

The ratio of hits to total requests is quite low (around 21%). Any hints on why? Looking at the dnsmasq requests, it seems that some of the addresses it resolves are never being cached for some reason.

Just for reference, this is the number of requests that kube-dns is receiving per minute (I guess):

[screenshot: kube-dns requests per minute]

@johnbelamaric
Member

A few questions:

  • Do you do a lot of queries to external resources (domains)?
  • Do you use FQDNs for your internal queries - for example "foo.namespace.svc.cluster.local" instead of just "foo"? If not, do you do a lot of cross-namespace lookups ("foo.ns2" from "ns1")?
  • Is negative caching (i.e., caching of NXDOMAIN) disabled in dnsmasq?

The way the client pod's resolv.conf is set up, these things can lead to a much higher query load.
See kubernetes/kubernetes#33554 for the engrossing details.

You could roll out CoreDNS (https://coredns.io) - see kubernetes/community#1100 for some details on numbers there (as a maintainer of CoreDNS I am biased, of course). We also have an autopath plugin in CoreDNS that could make a difference, depending on your answers to the questions above. There are some weird edge cases with it, though (I am still working on a blog post to make those clear; I believe they are very unlikely cases for the most part).
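
For context, a pod's /etc/resolv.conf in a cluster like this typically looks something like the following (the nameserver IP and the node-level search domain here are illustrative; the exact search list depends on the pod's namespace and your nodes). With ndots:5, any name containing fewer than five dots is tried against every search domain before being tried as-is, which is where the extra queries come from:

nameserver 100.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
options ndots:5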

@felipejfc
Author

Hi @johnbelamaric

  • I do have a lot of queries to external resources. One thing I realised is that some of these queries are being cached (I can see [cached]... in the dnsmasq logs) and others are not.
  • Usually I use only service.namespace instead of service.namespace.svc.cluster.local
  • I guess not - these are my dnsmasq args:
      - args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=64000
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/in6.arpa/127.0.0.1#10053
        image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.6
        name: dnsmasq

@johnbelamaric
Member

Ok, if it were disabled you would see -N or --no-negcache, so in fact it is enabled. But it may be that the TTL isn't being set by kube-dns so then it's not caching those. See https://linux.die.net/man/8/dnsmasq - you can try setting --neg-ttl.
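
For example, based on your args block above, that would just mean adding one flag to the dnsmasq arguments (a sketch; the 30-second value is arbitrary, tune it to how often things change):

      - args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=64000
        - --neg-ttl=30          # give negative answers without SOA TTL info a default 30s TTL so they can be cached
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/in6.arpa/127.0.0.1#10053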

Using service.namespace means the first query will always be NXDOMAIN from the search path, then the second query will work. So without negative caching that doubles the query load (but is necessary of course if things aren't in the same namespace).

@johnbelamaric
Member

Even with negative caching it doubles the load - but those queries hit dnsmasq, not kube-dns. This is what the autopath CoreDNS plugin is designed to fix - it figures out the search path and returns a CNAME instead of NXDOMAIN for the first query.
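
As a rough sketch, a Corefile enabling autopath together with the kubernetes plugin could look like this (autopath needs the kubernetes plugin running with "pods verified"; the zones, ports and upstream proxy are illustrative, not a drop-in config for your cluster):

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified
    }
    autopath @kubernetes
    prometheus :9153
    cache 30
    proxy . /etc/resolv.conf
    log
}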

@johnbelamaric
Member

Also, caching the external responses will help, but only in the sense that the last of the 5 or so queries will hit that cache. For example, google.com is only the 5th query in my cluster - all the other queries go out before it tries google.com as an FQDN:

dnstools# host -v google.com
Trying "google.com.default.svc.cluster.local"
Trying "google.com.svc.cluster.local"
Trying "google.com.cluster.local"
Trying "google.com.us-east-2.compute.internal"
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41452
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		60	IN	A	172.217.8.14

Received 44 bytes from 100.64.0.10#53 in 1 ms
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57226
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.			IN	AAAA

;; ANSWER SECTION:
google.com.		7	IN	AAAA	2607:f8b0:4004:802::200e

Received 56 bytes from 100.64.0.10#53 in 1 ms
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49632
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.			IN	MX

;; ANSWER SECTION:
google.com.		60	IN	MX	20 alt1.aspmx.l.google.com.
google.com.		60	IN	MX	30 alt2.aspmx.l.google.com.
google.com.		60	IN	MX	40 alt3.aspmx.l.google.com.
google.com.		60	IN	MX	50 alt4.aspmx.l.google.com.
google.com.		60	IN	MX	10 aspmx.l.google.com.

Received 136 bytes from 100.64.0.10#53 in 1 ms
dnstools#

@felipejfc
Author

I see, setting neg-ttl helped a bit (the cache hit rate is about 0.35-0.4 now).

Is it better to always use the FQDN (service.ns.svc.cluster.local)?

Are there any other configuration changes I can make to optimize dnsmasq?

thanks @johnbelamaric

@felipejfc
Author

felipejfc commented Oct 27, 2017

Doing some queries, it seems that it's not good to use the FQDN:

# host -v podium.podium.svc.cluster.local
Trying "podium.podium.svc.cluster.local.battletanks.svc.cluster.local"
Trying "podium.podium.svc.cluster.local.svc.cluster.local"
Trying "podium.podium.svc.cluster.local.cluster.local"
Trying "podium.podium.svc.cluster.local.ec2.internal"
Trying "podium.podium.svc.cluster.local"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30371
# host -v podium.podium
Trying "podium.podium.battletanks.svc.cluster.local"
Trying "podium.podium.svc.cluster.local"

wtf...

Edit: this seems to be the better way:

# host -v podium.podium.svc.cluster.local.
Trying "podium.podium.svc.cluster.local"

@bowei
Member

bowei commented Oct 27, 2017

An FQDN MUST end with a ".", e.g. foo.bar.com.

Anything else may be subject to search path expansion (look at the man page for /etc/resolv.conf and search for "ndots").
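
If you can't add trailing dots everywhere, another knob (in Kubernetes versions that support pod dnsConfig) is lowering ndots for a pod so that multi-label names skip the search-path expansion. A rough sketch, not specific to this cluster; with ndots:1 in-cluster names like service.namespace are tried against the upstream as absolute names first, so prefer full FQDNs for cluster lookups:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: busybox
  dnsConfig:
    options:
    - name: ndots
      value: "1"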

@felipejfc
Author

FYI, using only FQDNs made my cache hit rate much higher. I've also allocated more memory for the dnsmasq pod and set the cache size to 64k; doing that, I was able to go from 40 kube-dns pods to only 10 (I think I could go even lower).

@klausenbusk

> But it may be that the TTL isn't being set by kube-dns so then it's not caching those. See https://linux.die.net/man/8/dnsmasq - you can try setting --neg-ttl.

I just tried specifying --neg-ttl=30 to reduce latency, but it seems like dnsmasq can't cache NXDOMAIN responses from upstream servers. The same behaviour is also seen here: https://serverfault.com/questions/827207/dnsmasq-not-caching-for-non-public-dns-servers

@discordianfish

I also observe this. My cache hit rate is ~20%, even though I always resolve the same, rarely changing names. I actually had --no-negcache set, but removing it didn't affect the cache hit ratio; neither did increasing the cache limit nor setting --neg-ttl=30.
