1.7.1 with hostdns and forwardKubeDNSToHost doesn't resolve anything #8698

Closed
evanrich opened this issue May 3, 2024 · 16 comments · Fixed by #8737

Comments

@evanrich

evanrich commented May 3, 2024

Bug Report

Description

This is on a cluster that has been upgraded (1.6.x->1.7.x), not fresh

after applying the following patch:

machine:
  features:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: true

nothing seems to resolve DNS, either inside the cluster or externally.

get dnsupstream:

talosctl -n 192.168.5.10,192.168.5.11,192.168.5.12,192.168.5.15 get dnsupstream
NODE           NAMESPACE   TYPE          ID            VERSION   HEALTHY   ADDRESS
192.168.5.10   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53
192.168.5.11   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53
192.168.5.12   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53
192.168.5.15   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53

resolv.conf

 talosctl -n 192.168.5.10 read /system/resolved/resolv.conf
nameserver 10.96.0.9
talosctl -n 192.168.5.10 read /etc/resolv.conf
nameserver 127.0.0.53
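
A note on reading these two files: the CoreDNS timeouts in the logs of this report target 10.96.0.9:53, which matches the nameserver listed in /system/resolved/resolv.conf rather than the one in /etc/resolv.conf. The sketch below is only an illustration of pulling the `nameserver` entries out of resolv.conf-style text so the two files can be compared; the function name `nameservers` is hypothetical and not part of Talos or CoreDNS.

```python
def nameservers(resolv_conf_text):
    """Return the addresses from `nameserver` lines of resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


# The two files as shown in this report.
system_resolved = "nameserver 10.96.0.9\n"
etc_resolv = "nameserver 127.0.0.53\n"

print(nameservers(system_resolved))
print(nameservers(etc_resolv))
```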

resolvers

 talosctl -n 192.168.5.10 get resolvers
NODE           NAMESPACE   TYPE             ID          VERSION   RESOLVERS
192.168.5.10   network     ResolverStatus   resolvers   2         ["192.168.5.1"]

CoreDNS was restarted twice after applying the patch.

Logs

[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.2.33:48571->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37010 - 44799 "AAAA IN radarr.media.svc. udp 34 false 512" - - 0 2.001171487s
[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.0.228:41133->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37010 - 44353 "A IN radarr.media.svc. udp 34 false 512" - - 0 2.001187098s
[ERROR] plugin/errors: 2 radarr.media.svc. A: read udp 10.244.0.228:49164->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.30:55153 - 65462 "AAAA IN sonarr.media.svc. udp 34 false 512" - - 0 2.001133409s
[INFO] 10.244.0.30:55153 - 65275 "A IN sonarr.media.svc. udp 34 false 512" - - 0 2.001014136s
[ERROR] plugin/errors: 2 sonarr.media.svc. A: read udp 10.244.2.33:38186->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 sonarr.media.svc. AAAA: read udp 10.244.2.33:57661->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.16:57244 - 50161 "AAAA IN api.allegion.yonomi.cloud. udp 43 false 512" - - 0 2.001230715s
[ERROR] plugin/errors: 2 api.allegion.yonomi.cloud. AAAA: read udp 10.244.0.228:33430->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.16:57244 - 49553 "A IN api.allegion.yonomi.cloud. udp 43 false 512" - - 0 2.001237302s
[ERROR] plugin/errors: 2 api.allegion.yonomi.cloud. A: read udp 10.244.0.228:47070->10.96.0.9:53: i/o timeout
[INFO] 10.244.1.242:47829 - 1031 "AAAA IN api.doppler.com. udp 44 false 1232" - - 0 2.001031405s
[INFO] 10.244.1.242:50138 - 44842 "A IN api.doppler.com. udp 44 false 1232" - - 0 2.001066446s
[ERROR] plugin/errors: 2 api.doppler.com. AAAA: read udp 10.244.0.228:52401->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 api.doppler.com. A: read udp 10.244.0.228:48637->10.96.0.9:53: i/o timeout
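
To see at a glance which upstream the timeouts above point at, the error lines can be tallied by destination address. This is a rough sketch that assumes the log line shape shown in this report; the regular expression and the helper name `count_timeouts_by_upstream` are illustrative, not anything provided by CoreDNS.

```python
import re
from collections import Counter

# Matches CoreDNS "errors" plugin lines that report a UDP read timeout, e.g.
# [ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.2.33:48571->10.96.0.9:53: i/o timeout
TIMEOUT_RE = re.compile(
    r"\[ERROR\] plugin/errors: \d+ (?P<name>\S+) (?P<qtype>\S+): "
    r"read udp (?P<src>[\d.]+):\d+->(?P<dst>[\d.]+:\d+): i/o timeout"
)


def count_timeouts_by_upstream(lines):
    """Count timeout errors per upstream (destination ip:port)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("dst")] += 1
    return counts


sample = [
    "[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.2.33:48571->10.96.0.9:53: i/o timeout",
    '[INFO] 10.244.0.142:37010 - 44799 "AAAA IN radarr.media.svc. udp 34 false 512" - - 0 2.001171487s',
    "[ERROR] plugin/errors: 2 api.doppler.com. A: read udp 10.244.0.228:48637->10.96.0.9:53: i/o timeout",
]
print(count_timeouts_by_upstream(sample))
```

In this report every timeout has the same destination, 10.96.0.9:53, regardless of which CoreDNS pod sent the query.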

Environment

  • Talos version: 1.7.1
  • Kubernetes version: 1.30.0
  • Platform: dell 5060/7060 nodes

Reverting the patch (both options set to false) fixes DNS again.

FWIW, here's my coredns configmap:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    log . {
        class error
    }
    prometheus :9153

    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
@evanrich
Author

evanrich commented May 3, 2024

coredns graphs go through the roof as well
image

@DmitriyMV
Member

Greetings! Can you provide talosctl -n 192.168.5.10,192.168.5.11,192.168.5.12,192.168.5.15 logs dns-resolve-cache output?

@evanrich
Author

evanrich commented May 5, 2024

Greetings! Can you provide talosctl -n 192.168.5.10,192.168.5.11,192.168.5.12,192.168.5.15 logs dns-resolve-cache output?

sure! with

machine:
  features:
    hostDNS:
      enabled: true
      resolveMemberNames: true
      forwardKubeDNSToHost: false

I get ~13k lines; here are the last few:

192.168.5.12: 2024-05-05T19:15:00.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 27405\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:00.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 27405\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:20.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 30173\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:20.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 30173\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:40.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26124\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:40.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26124\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:00.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44019\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:00.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44019\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:20.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26814\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:20.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26814\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:40.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44389\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:40.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44389\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:00.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59770\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:00.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59770\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:20.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20152\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:20.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20152\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:40.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 43480\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:40.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 43480\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}

with

machine:
  features:
    hostDNS:
      enabled: true
      resolveMemberNames: true
      forwardKubeDNSToHost: true

I get

192.168.5.10: 2024-05-05T19:20:52.012Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 13650\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:20:52.012Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 45522\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:20:52.013Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NXDOMAIN, id: 13650\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t A\n\n;; AUTHORITY SECTION:\n.\t1800\tIN\tSOA\ta.root-servers.net. nstld.verisign-grs.com. 2024050501 1800 900 604800 86400\n"}
192.168.5.10: 2024-05-05T19:20:52.013Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NXDOMAIN, id: 45522\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\n.\t1800\tIN\tSOA\ta.root-servers.net. nstld.verisign-grs.com. 2024050501 1800 900 604800 86400\n"}
192.168.5.10: 2024-05-05T19:21:01.928Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61250\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:01.928Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61250\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:02.682Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61066\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:02.682Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37810\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59884\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 35319\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61066\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\ndomain.io.\t1710\tIN\tSOA\trose.ns.cloudflare.com. dns.cloudflare.com. 2340201800 10000 2400 604800 1800\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37810\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\nradarr.domain.io.\t270\tIN\tA\t10.10.5.30\n"}
192.168.5.10: 2024-05-05T19:21:02.686Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 35319\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\nsonarr.domain.io.\t5\tIN\tA\t10.10.5.30\n"}
192.168.5.10: 2024-05-05T19:21:02.686Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59884\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\ndomain.io.\t1710\tIN\tSOA\trose.ns.cloudflare.com. dns.cloudflare.com. 2340201800 10000 2400 604800 1800\n"}
192.168.5.10: 2024-05-05T19:21:12.216Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37590\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;.\tIN\t NS\n"}
192.168.5.10: 2024-05-05T19:21:12.217Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37590\n;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;.\tIN\t NS\n\n;; ANSWER SECTION:\n.\t3600\tIN\tNS\ta.root-servers.net.\n.\t3600\tIN\tNS\tb.root-servers.net.\n.\t3600\tIN\tNS\tc.root-servers.net.\n.\t3600\tIN\tNS\td.root-servers.net.\n.\t3600\tIN\tNS\te.root-servers.net.\n.\t3600\tIN\tNS\tf.root-servers.net.\n.\t3600\tIN\tNS\tg.root-servers.net.\n.\t3600\tIN\tNS\th.root-servers.net.\n.\t3600\tIN\tNS\ti.root-servers.net.\n.\t3600\tIN\tNS\tj.root-servers.net.\n.\t3600\tIN\tNS\tk.root-servers.net.\n.\t3600\tIN\tNS\tl.root-servers.net.\n.\t3600\tIN\tNS\tm.root-servers.net.\n"}
192.168.5.10: 2024-05-05T19:21:19.651Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20589\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:19.651Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26931\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:19.652Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20589\n;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\ns3.domain.io.\t296\tIN\tA\t104.21.30.117\ns3.domain.io.\t296\tIN\tA\t172.67.172.226\n"}
192.168.5.10: 2024-05-05T19:21:19.652Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26931\n;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t AAAA\n\n;; ANSWER SECTION:\ns3.domain.io.\t296\tIN\tAAAA\t2606:4700:3035::6815:1e75\ns3.domain.io.\t296\tIN\tAAAA\t2606:4700:3037::ac43:ace2\n"}
192.168.5.10: 2024-05-05T19:21:21.928Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37418\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:21.928Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37418\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:30.483Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 41265\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:30.483Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 3690\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:30.484Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 41265\n;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t A\n\n;; ANSWER SECTION:\nplex.tv.\t30\tIN\tA\t34.243.94.189\nplex.tv.\t30\tIN\tA\t34.241.88.179\n"}
192.168.5.10: 2024-05-05T19:21:30.484Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 3690\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\nplex.tv.\t207\tIN\tSOA\tjeremy.ns.cloudflare.com. dns.cloudflare.com. 2340420772 10000 2400 604800 1800\n"}
192.168.5.10: 2024-05-05T19:21:41.927Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 38917\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:41.928Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 38917\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:45.777Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 29216\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:45.778Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 29216\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\nsonarr.domain.io.\t257\tIN\tA\t10.10.5.30\n"}
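
The dns-resolve-cache lines above carry a JSON payload whose `data` field is a dig-style dump, which makes them tedious to scan by eye. The following is a minimal sketch, assuming the line layout shown in this thread, that extracts the message kind, status, queried name, and record type from one line. The helper name `parse_resolve_cache_line` and both regular expressions are assumptions made for illustration; the sample line is an abbreviated copy of one of the NXDOMAIN responses above.

```python
import json
import re

STATUS_RE = re.compile(r"status: (\w+)")
QUESTION_RE = re.compile(r";; QUESTION SECTION:\n;(\S+)\tIN\t\s*(\w+)")


def parse_resolve_cache_line(line):
    """Return (kind, status, name, qtype) for one log line, or None if it does not match."""
    brace = line.find("{")
    if brace == -1:
        return None
    # The text before the JSON payload says whether this is a request or a response.
    kind = "response" if "dns response" in line[:brace] else "request"
    payload = json.loads(line[brace:])
    data = payload.get("data", "")
    status = STATUS_RE.search(data)
    question = QUESTION_RE.search(data)
    if not status or not question:
        return None
    return kind, status.group(1), question.group(1), question.group(2)


# Abbreviated from the log above (authority section omitted).
sample = r'192.168.5.10: 2024-05-05T19:20:52.013Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NXDOMAIN, id: 13650\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t A\n"}'
print(parse_resolve_cache_line(sample))
```

Filtering the parsed tuples by kind shows that the host resolver does produce responses here, even while CoreDNS reports timeouts for the same names.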

As soon as the patch is applied and coredns restarted, I start immediately seeing issues, for example in my homeassistant logs:

 (SyncWorker_14) [custom_components.radarr_upcoming_media.sensor] Host radarr.domain.io is not available
2024-05-05 12:21:37.684 WARNING (SyncWorker_3) [custom_components.sonarr_upcoming_media.sensor] Host sonarr.domain.io is not available
2024-05-05 12:22:07.685 WARNING (SyncWorker_4) [custom_components.radarr_upcoming_media.sensor] Host radarr.domain.io is not available
2024-05-05 12:22:07.687 WARNING (SyncWorker_50) [custom_components.sonarr_upcoming_media.sensor] Host sonarr.domain.io is not available
2024-05-05 12:22:37.690 WARNING (SyncWorker_46) [custom_components.radarr_upcoming_media.sensor] Host radarr.domain.io is not available
2024-05-05 12:22:37.693 WARNING (SyncWorker_15) [custom_components.sonarr_upcoming_media.sensor] Host sonarr.domain.io is not available

and from the coredns deployment logs itself:

[ERROR] plugin/errors: 2 ps.pndsn.com. AAAA: read udp 10.244.3.183:42921->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37176 - 46031 "A IN radarr.media.svc. udp 34 false 512" - - 0 2.000961528s
[INFO] 10.244.0.142:37176 - 46771 "AAAA IN radarr.media.svc. udp 34 false 512" - - 0 2.000981288s
[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.0.58:39689->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 radarr.media.svc. A: read udp 10.244.0.58:56654->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:34045 - 18857 "AAAA IN sonarr.media.svc. udp 34 false 512" - - 0 2.0010946020000002s
[ERROR] plugin/errors: 2 sonarr.media.svc. AAAA: read udp 10.244.3.183:57222->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:34045 - 18443 "A IN sonarr.media.svc. udp 34 false 512" - - 0 2.001069037s
[ERROR] plugin/errors: 2 sonarr.media.svc. A: read udp 10.244.3.183:60187->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.171:33221 - 59865 "A IN s3.domain.io. udp 33 false 512" - - 0 2.001200777s
[INFO] 10.244.0.171:33221 - 17887 "AAAA IN s3.domain.io. udp 33 false 512" - - 0 2.001220341s
[ERROR] plugin/errors: 2 s3.domain.io. A: read udp 10.244.3.183:42636->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 s3.domain.io. AAAA: read udp 10.244.3.183:51402->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38535 "AAAA IN ps.pndsn.com. udp 30 false 512" - - 0 2.00101046s
[ERROR] plugin/errors: 2 ps.pndsn.com. AAAA: read udp 10.244.3.183:46826->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38373 "A IN ps.pndsn.com. udp 30 false 512" - - 0 2.001172459s
[ERROR] plugin/errors: 2 ps.pndsn.com. A: read udp 10.244.3.183:49263->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.213:33246 - 976 "A IN github.com. udp 39 false 1232" - - 0 2.00063978s
[ERROR] plugin/errors: 2 github.com. A: read udp 10.244.3.183:57869->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.213:41944 - 26300 "AAAA IN github.com. udp 39 false 1232" - - 0 2.001602828s
[ERROR] plugin/errors: 2 github.com. AAAA: read udp 10.244.0.58:52988->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38535 "AAAA IN ps.pndsn.com. udp 30 false 512" - - 0 2.000211697s
[ERROR] plugin/errors: 2 ps.pndsn.com. AAAA: read udp 10.244.3.183:48895->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38373 "A IN ps.pndsn.com. udp 30 false 512" - - 0 2.000241596s
[ERROR] plugin/errors: 2 ps.pndsn.com. A: read udp 10.244.3.183:41034->10.96.0.9:53: i/o timeout

Changing forwardKubeDNSToHost from true back to false brings things back to normal. I can post my machine config if that helps, but there is nothing unusual in it. After restarting the CoreDNS deployment, the logs are clean again:

.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2

@chrxmvtik

Same issue for me, but unfortunately disabling hostDNS features doesn't resolve the issue.

I am using my own DNS servers; switching to public DNS servers didn't help.

It worked fine using version 1.6.7, failed to work from 1.7.0, keeps failing in 1.7.1.

@smira
Member

smira commented May 7, 2024

It worked fine using version 1.6.7, failed to work from 1.7.0, keeps failing in 1.7.1.

Let's not mix different issues in one ticket please.

@smira changed the title from "1.7.1 with hostdns doesn't resolve anything" to "1.7.1 with hostdns and forwardKubeDNSToHost doesn't resolve anything" May 7, 2024
@smira
Member

smira commented May 7, 2024

@evanrich what is the CNI you're using?

@evanrich
Author

evanrich commented May 7, 2024

@evanrich what is the CNI you're using?

Cilium v1.15.4

@MathiasPius

I'm seeing the same problem on Talos 1.7.1 (also upgraded from earlier versions), Kubernetes 1.29.1, Cilium 1.15.4.

I am using DHCP-discovered public DNS servers run by Hetzner.

Hubble (Cilium packet inspection) reports that the UDP requests from CoreDNS to the Talos DNS service IP (10.96.0.9 in my case) are delivered, but the response packets from 10.96.0.9 to CoreDNS pod are dropped with the reason TTL Exceeded.

@pau-campana

I have the same error. I'm using talos v1.7.1 and cilium v1.14.7

@chrxmvtik

I'm seeing the same problem on Talos 1.7.1 (also upgraded from earlier versions), Kubernetes 1.29.1, Cilium 1.15.4.

I am using DHCP-discovered public DNS servers run by Hetzner.

Hubble (Cilium packet inspection) reports that the UDP requests from CoreDNS to the Talos DNS service IP (10.96.0.9 in my case) are delivered, but the response packets from 10.96.0.9 to CoreDNS pod are dropped with the reason TTL Exceeded.

Check whether you are using bpf.masquerade. If you are, and you did not specify the CIDRs manually, you will get the above error with common private CIDRs.

Try setting the bpf.masquerade option to false and check whether that works.

@MathiasPius

MathiasPius commented May 14, 2024

I'm seeing the same problem on Talos 1.7.1 (also upgraded from earlier versions), Kubernetes 1.29.1, Cilium 1.15.4.
I am using DHCP-discovered public DNS servers run by Hetzner.
Hubble (Cilium packet inspection) reports that the UDP requests from CoreDNS to the Talos DNS service IP (10.96.0.9 in my case) are delivered, but the response packets from 10.96.0.9 to CoreDNS pod are dropped with the reason TTL Exceeded.

Check whether you are using bpf.masquerade. If you are, and you did not specify the CIDRs manually, you will get the above error with common private CIDRs.

Try setting the bpf.masquerade option to false and check whether that works.

Sounds very plausible. However, bpf masquerade is disabled for my use case, but I can see that iptables masquerade for ipv4 is enabled. I would assume disabling this would have the same effect?

Edit:
I disabled all masquerading:

$ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep Masquerading
Masquerading:            Disabled

But I'm still seeing the exact same issue. I am now seeing the issue with the public IP address of the DNS Server instead.

It seems to me that masquerading is a very likely culprit, but I'm not sure how exactly yet. Will keep digging.

DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 14, 2024
This PR fixes incorrect packet TTL if `forwardKubeDNSToHost` is enabled.

Credits go to Julian Wiedmann.
Closes siderolabs#8698.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 14, 2024
…or pods

This PR fixes incorrect packet TTL if `forwardKubeDNSToHost` is enabled.
It also enables by default the usage of our host DNS resolver as upstream for Kubernetes CoreDNS pods.

Credits go to Julian Wiedmann.
For siderolabs#8698.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 15, 2024
This PR fixes incorrect packet TTL if `forwardKubeDNSToHost` is enabled.

Credits go to Julian Wiedmann.
Closes siderolabs#8698.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
@smira
Member

smira commented May 15, 2024

The fix is coming; thanks for reporting it. It is indeed the TTL. It only affects the forwardKubeDNSToHost option, which is not enabled by default in Talos 1.7 (it is only enabled for Docker-based clusters).
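
For readers unfamiliar with why a wrong TTL produces the "TTL Exceeded" drops reported earlier in the thread: every forwarding hop decrements the IP TTL, and a packet whose TTL reaches zero is dropped instead of being forwarded. The toy model below illustrates that rule only. The thread does not state which TTL value the replies actually carried, so the values 1 and 64 and the hop counts are assumptions chosen for illustration, and `is_delivered` is a hypothetical helper, not Talos or Cilium code.

```python
def is_delivered(initial_ttl: int, forwarding_hops: int) -> bool:
    """Model IP forwarding: each hop decrements the TTL and drops the packet at zero."""
    ttl = initial_ttl
    for _ in range(forwarding_hops):
        ttl -= 1  # each forwarding hop decrements the TTL
        if ttl <= 0:
            # The packet cannot be forwarded any further ("TTL exceeded").
            return False
    return True


# A reply sent with a very low TTL is fine on the same host (no forwarding hop)...
print(is_delivered(initial_ttl=1, forwarding_hops=0))
# ...but is dropped as soon as it has to be forwarded once, e.g. towards a pod.
print(is_delivered(initial_ttl=1, forwarding_hops=1))
# A typical default TTL survives the same path.
print(is_delivered(initial_ttl=64, forwarding_hops=1))
```

This matches the observed asymmetry: requests reached 10.96.0.9, while the responses were dropped on the way back to the CoreDNS pod.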

@DmitriyMV
Member

Reopened until there is a 1.7 backport.

@DmitriyMV reopened this May 15, 2024
smira pushed a commit to smira/talos that referenced this issue May 17, 2024
This PR fixes incorrect packet TTL if `forwardKubeDNSToHost` is enabled.

Credits go to Julian Wiedmann.
Closes siderolabs#8698.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
(cherry picked from commit 53f5489)
@DmitriyMV
Member

Closed per #8758

@evanrich
Author

evanrich commented May 17, 2024

@DmitriyMV not sure if this is related, but after upgrading 1.7.1->1.7.2, while better, I now see other errors:

.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[INFO] 10.244.0.100:34471 - 62223 "AAAA IN registry.npmjs.org. udp 36 false 512" - - 0 5.00010545s
[ERROR] plugin/errors: 2 registry.npmjs.org. AAAA: dns: buffer size too small
[INFO] 10.244.0.23:42181 - 51958 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000088228s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55470 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000083312s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55229 "A IN api.ring.com. udp 30 false 512" - - 0 5.000134008s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55229 "A IN api.ring.com. udp 30 false 512" - - 0 5.000123943s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55470 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000255962s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55470 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000144994s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39446 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000402892s
[INFO] 10.244.0.23:40964 - 39215 "A IN api.ring.com. udp 30 false 512" - - 0 5.000479894s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39446 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000031736s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39215 "A IN api.ring.com. udp 30 false 512" - - 0 5.000118852s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39446 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000074868s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39215 "A IN api.ring.com. udp 30 false 512" - - 0 5.000111692s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size

this is based off the following config:

machine:
  features:
    hostDNS:
      enabled: true
      resolveMemberNames: true
      forwardKubeDNSToHost: true

The only thing I changed between 1.7.1 and 1.7.2 was setting forwardKubeDNSToHost to true.

@evanrich
Author

1.7.3 fixes the errors above

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 30, 2024