Wrong CNAME reply time calculation (too fast) #1232

r2ixuz · 2021-10-27T17:04:20Z

For example if I resolve the A record for "alive.github.com" it points to an CNAME record. Which is shown in the Pi-hole reply column as too fast: 13ms in this example where alone the RTT to Google DNS is for me at its best 24ms.

This bug is new since "Docker Tag 2021.10.1 Pi-hole v5.6 FTL v5.11 Web Interface v5.8".

DL6ER · 2021-10-27T18:25:39Z

13ms in this example where alone the RTT to Google DNS is for me at its best 24ms.

How did you measure the RTT to Google's DNS server? As ping does not utilize UDP but ICMP, there might be a difference. We've seen ISP caching some DNS stuff on their own so - while a ping would indeed go to the Google servers - the known DNS query may in fact be answered a lot closer to you.

Have you verified this against a few manual dig queries?

We can enable the PCAP tracing embedded in Pi-hole if you want to investigate with us further.

For this, create a new file /etc/dnsmasq.d/99-pcap.conf with the following content:

dumpfile=/tmp/record.pcap
dumpmask=0xFFFF

When you restart pihole-FTL, it will immediately start recording. You can either share your recording here or we discuss further steps so you can analyze your file offline and we can compare the measured time here to the time displayed in the Query Log.

r2ixuz · 2021-10-27T18:51:45Z

I know how the reply should be normally. Besides I use Smokeping to test multiple DNS dig probes against the upstreams I use.

Furthermore I did several dig test over the last couple of months and no one was faster than 20ms. And I can see this behaviour only since the last stable update as mentioned before. There are even sometimes responses under 10ms...

This behaviour occurs ONLY for CNAME response. A records does have normal repsonse times.

r2ixuz · 2021-10-27T19:04:47Z

<<>> DiG 9.16.15 <<>> acme-v02.api.letsencrypt.org @dns2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32383
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;acme-v02.api.letsencrypt.org.	IN	A

;; ANSWER SECTION:
acme-v02.api.letsencrypt.org. 7017 IN	CNAME	prod.api.letsencrypt.org.
prod.api.letsencrypt.org. 117	IN	CNAME	ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com. 117 IN A 172.65.32.248

;; Query time: 26 msec

r2ixuz · 2021-10-27T19:13:22Z

Please don't bother with the different server names "dns2" and "Google". Both are the same local DoH instance which points only to Google DNS. On Pi-hole I use a "Local DNS Record" which points "dns2" IP to "Google" so it is better understandable in the Query Log.

DL6ER · 2021-10-27T19:32:35Z

Long-lived (and cached) CNAMEs with short-lived (and already expired) children need to be forwarded partially to get the final reply. Pi-hole measures the reply time from receiving the query from the client until the cached record was found. Then we start again measuring the delay of the new query (for the short-lived query) to the upstream. Finally, both are added and give the overall delay. This seems to be broken somehow in your case.

If you could do the pcap-recording, we'd exactly see how many milliseconds there are between

query received from client
sub-query sent upstream
sub-query received from upstream
complete reply sent to the client

and this should reveal what we are looking at here.

r2ixuz · 2021-10-27T19:42:27Z

I think it has something todo with your last change in this stable release regarding CNAME deep inspection and my dnsmasq setting "all-servers". The other DNS upstream I use is normally way faster than the Google one. Maybe due to that it is getting mixed somehow.

How can I interpret this pcap file myself. For data privacy reasons I would not like to upload it, sorry.

yubiuser · 2021-10-27T19:43:16Z

How can I interpret this pcap file myself.

https://www.wireshark.org/

DL6ER · 2021-10-27T20:12:01Z

Another idea is adding DEBUG_QUERIES=true to /etc/pihole/pihole-FTL.conf (create if it does not exist). After a restart, query details will be logged in /var/log/pihole-FTL.log with millisecond-accurate timestamps (even if the 100µs digit is missing).

How can I interpret this pcap file myself.

https://www.wireshark.org/

You can also check out tcpdump if you prefer a terminal application. But it is a lot harder to use this one compared to the graphical tool suggested by @yubiuser

r2ixuz · 2021-10-27T20:13:16Z

Hm, I can see only "Standard query response" at Wireshark for one suspicious "too fast response" which comes from the usually faster other upstream. So the Google DNS does not appear with a response in Wireshark for that query even though the Pi-hole's Query Log states that Google DNS was fastest.

So now I understand where Pi-hole is getting this fast response time from. I still think it has something todo with the "all-servers" setting.

DL6ER · 2021-10-27T20:14:10Z

So now I understand where Pi-hole is getting this fast response time from. I still think it has something todo with the "all-servers" setting.

But why shouldn't this be correct?

r2ixuz · 2021-10-27T20:17:31Z

Because when using the "all-servers" setting over the past years Pi-hole was not able to show which one was fastest of my both upstreams. It showed always the same upstream.

Pi-hole is only to able to make difference since, let me guess, the last 2 or 3 stable releases.

DL6ER · 2021-10-27T20:18:45Z

Yes, you are right. This was a change in one of the more recent releases (but not this one). Case closed?

r2ixuz · 2021-10-27T20:20:48Z

Well no, because the shown server is still wrong for CNAMEs. Maybe I was not clear enough. I understand that the time is being shown is correct, but Pi-hole still shows the wrong upstream name/IP in its query log in such a case.

DL6ER · 2021-10-27T20:26:03Z

Okay, so this is a different problem.

It'll be helpful if you could isolate a CNAME where this went wrong in the pihole-FTL.log file (with debug logging enabled, see above) and provide it here. You are obviously free to obfuscate IP addresses due to privacy reasons, however, please mark somehow what was obfuscated.

r2ixuz · 2021-10-27T20:33:03Z

[2021-10-27 22:26:12.805 25051M] **** new UDP IPv6 query[A] query "acme-v02.api.letsencrypt.org" from eth0:fd00::2e91:abff:fe26:e911#36597 (ID 53, FTL 30840, src/dnsmasq/forward.c:1575)
[2021-10-27 22:26:12.805 25051M] acme-v02.api.letsencrypt.org is not known
[2021-10-27 22:26:12.805 25051M] DNS cache: fd00::2e91:abff:fe26:e911/acme-v02.api.letsencrypt.org is not blocked
[2021-10-27 22:26:12.805 25051M] **** got cache reply: acme-v02.api.letsencrypt.org is (CNAME) (ID 53, src/dnsmasq/rfc1035.c:1502)
[2021-10-27 22:26:12.805 25051M] Set reply to CNAME (3) in src/dnsmasq_interface.c:2039
[2021-10-27 22:26:12.805 25051M] **** forwarded acme-v02.api.letsencrypt.org to 192.168.1.251#53 (ID 53, src/dnsmasq/forward.c:520)
[2021-10-27 22:26:12.805 25051M] **** forwarded acme-v02.api.letsencrypt.org to 192.168.1.252#53 (ID 53, src/dnsmasq/forward.c:520)
[2021-10-27 22:26:12.816 25051M] FTL_CNAME called with: src = (null), dst = acme-v02.api.letsencrypt.org, id = 53
[2021-10-27 22:26:12.816 25051M] acme-v02.api.letsencrypt.org is known as not to be blocked
[2021-10-27 22:26:12.816 25051M] Query 53: CNAME acme-v02.api.letsencrypt.org
[2021-10-27 22:26:12.816 25051M] **** got upstream reply from 192.168.1.251#53: acme-v02.api.letsencrypt.org is (CNAME) (ID 53, src/dnsmasq/rfc1035.c:782)
[2021-10-27 22:26:12.816 25051M] FTL_CNAME called with: src = acme-v02.api.letsencrypt.org, dst = prod.api.letsencrypt.org, id = 53
[2021-10-27 22:26:12.817 25051M] prod.api.letsencrypt.org is not known
[2021-10-27 22:26:12.817 25051M] DNS cache: fd00::2e91:abff:fe26:e911/prod.api.letsencrypt.org is not blocked
[2021-10-27 22:26:12.817 25051M] Query 53: CNAME acme-v02.api.letsencrypt.org ---> prod.api.letsencrypt.org
[2021-10-27 22:26:12.817 25051M] **** got upstream reply from 192.168.1.251#53: prod.api.letsencrypt.org is (CNAME) (ID 53, src/dnsmasq/rfc1035.c:782)
[2021-10-27 22:26:12.817 25051M] FTL_CNAME called with: src = prod.api.letsencrypt.org, dst = ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com, id = 53
[2021-10-27 22:26:12.817 25051M] ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com is not known
[2021-10-27 22:26:12.817 25051M] DNS cache: fd00::2e91:abff:fe26:e911/ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com is not blocked
[2021-10-27 22:26:12.817 25051M] Query 53: CNAME prod.api.letsencrypt.org ---> ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com
[2021-10-27 22:26:12.817 25051M] **** got upstream reply from 192.168.1.251#53: ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com is 172.65.32.248 (ID 53, src/dnsmasq/rfc1035.c:893)

So here you can see: Clearly dns1/192.168.1.251/Cloudflare is answering all queries but the Query Log shows that dns2/192.168.1.252/Google answered fastest which is wrong.

DL6ER · 2021-10-27T20:52:38Z

Thanks for the log snippet, this revealed where the problem is: The check if another upstream replied than the last one we forwarded a query to is skipped when the reply type is already known. This is the case here due to

[2021-10-27 22:26:12.805 25051M] Set reply to CNAME (3) in src/dnsmasq_interface.c:2039

This should be fixed by #1235

I triggered building a special docker container you could use to check if the bugfix is sufficient for you (tag fix_cached_cname_upstream_dest, it needs some time to be built until it can be downloaded). The fix for the other bug is already contained therein.

r2ixuz · 2021-10-27T20:58:25Z

Thank you. I will test it tomorrow.

r2ixuz · 2021-10-28T13:44:16Z

This Docker Image fixed it. Great job, thanks!

PromoFaux · 2021-10-28T14:26:48Z

Reopening as this bug is not yet fixed. Please keep in mind that docker tag will not be updated

awaiting merge of #1235

r2ixuz · 2021-10-28T17:50:22Z

@DL6ER Is it possible that this fix is also needed for SERVFAIL responses when using "all-servers"? Seems to be the case too. Doesn't really bother me, just for your information.

DL6ER · 2021-10-28T18:01:18Z

It's always needed with all-servers.

r2ixuz · 2021-10-28T18:05:29Z

With the Docker image you provided for me I have the same "problem" for SERVFAIL responses like it was with CNAMES before your fix. So SERVFAIL responses have wrong response times (too fast) displayed in the Query Log. Your fix does not work there.

DL6ER · 2021-10-28T19:01:23Z

Thanks for testing and mentioning this. Upstream error handling is differently than successful reply handling and, in the former case, detecting which server was the earliest is missing so queries always show the list server that we sent the query to. I'll check this out over the next days and come back to you

DL6ER · 2021-10-31T09:14:17Z

@r2ixuz The custom docker tag fix_cached_cname_upstream_dest will be updated in about half an hour with another fix (#1237) which should also cover the SERVFAIL/REFUSED/... replies from multiple upstreams.

r2ixuz · 2021-10-31T10:22:36Z

Looks good. Your fix is working! Thanks.

DL6ER · 2021-12-22T20:33:47Z

The next version of FTL has been released. Please update and run

pihole checkout master

to get back on-track. The fix/feature branch you switched to will not receive any further updates.

Thanks for helping us to make Pi-hole better for us all!

If you have any issues, please either reopen this ticket or (preferably) create a new ticket describing the issues in further detail and only reference this ticket. This will help us to help you best.

yubiuser transferred this issue from pi-hole/web Oct 27, 2021

DL6ER added the Discussion label Oct 27, 2021

DL6ER mentioned this issue Oct 27, 2021

Tweak to determination of fastest upstream #1235

Merged

5 tasks

DL6ER added Bug Bugfix in progress and removed Discussion labels Oct 27, 2021

r2ixuz closed this as completed Oct 28, 2021

PromoFaux reopened this Oct 28, 2021

DL6ER mentioned this issue Oct 31, 2021

Analyze error report origins #1237

Merged

5 tasks

DL6ER added Fixed in next release and removed Bugfix in progress labels Dec 2, 2021

DL6ER closed this as completed Dec 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong CNAME reply time calculation (too fast) #1232

Wrong CNAME reply time calculation (too fast) #1232

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021 •

edited

Loading

r2ixuz commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

yubiuser commented Oct 27, 2021

DL6ER commented Oct 27, 2021 •

edited

Loading

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021 •

edited

Loading

r2ixuz commented Oct 27, 2021

r2ixuz commented Oct 28, 2021

PromoFaux commented Oct 28, 2021

r2ixuz commented Oct 28, 2021

DL6ER commented Oct 28, 2021

r2ixuz commented Oct 28, 2021

DL6ER commented Oct 28, 2021

DL6ER commented Oct 31, 2021

r2ixuz commented Oct 31, 2021

DL6ER commented Dec 22, 2021

Wrong CNAME reply time calculation (too fast) #1232

Wrong CNAME reply time calculation (too fast) #1232

Comments

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021 • edited Loading

r2ixuz commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

yubiuser commented Oct 27, 2021

DL6ER commented Oct 27, 2021 • edited Loading

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021

r2ixuz commented Oct 27, 2021

DL6ER commented Oct 27, 2021 • edited Loading

r2ixuz commented Oct 27, 2021

r2ixuz commented Oct 28, 2021

PromoFaux commented Oct 28, 2021

r2ixuz commented Oct 28, 2021

DL6ER commented Oct 28, 2021

r2ixuz commented Oct 28, 2021

DL6ER commented Oct 28, 2021

DL6ER commented Oct 31, 2021

r2ixuz commented Oct 31, 2021

DL6ER commented Dec 22, 2021

r2ixuz commented Oct 27, 2021 •

edited

Loading

DL6ER commented Oct 27, 2021 •

edited

Loading

DL6ER commented Oct 27, 2021 •

edited

Loading