Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Fix for negative TTLs in WeaveDNS #501

Merged
merged 1 commit into from
Apr 13, 2015
Merged

Conversation

inercia
Copy link
Contributor

@inercia inercia commented Mar 30, 2015

Fixes #494 by using a different lookups set depending on the value stored in the cache.

PS: these changes have been tested with this new WeaveDNS test tool.

@awh awh self-assigned this Mar 31, 2015
@errordeveloper
Copy link
Contributor

I have just tested this and can confirm that it fixes #494. However, I am somehow seeing much slower responses once in a while, could this by any chance introduce performance regression that could cause a few seconds of delay to some lookups?

@errordeveloper
Copy link
Contributor

I have just redone the same test from scratch with one machine with clean WeaveDNS log.

docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 0.54s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 20.02s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 10.52s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 20.03s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 10.52s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ docker logs weavedns
INFO: 2015/03/31 16:39:38.074011 [updater] Using Docker API on unix:///var/run/docker.sock: &[GitCommit=a8a31ef GoVersion=go1.4.1 KernelVersion=3.18.5-tinycore64 Os=linux Version=1.5.0 ApiVersion=1.17 Arch=amd64]
INFO: 2015/03/31 16:39:38.074128 Waiting for interface ethwe to come up
INFO: 2015/03/31 16:39:39.074608 Interface ethwe is up
DEBUG: 2015/03/31 16:39:39.074738 [dns] Initializing cache: 8192 entries
DEBUG: 2015/03/31 16:39:39.075258 Starting the signals handler
INFO: 2015/03/31 16:39:39.075308 Using mDNS on ethwe
DEBUG: 2015/03/31 16:39:39.087202 Listening for DNS on :53 (UDP)
DEBUG: 2015/03/31 16:39:39.087674 Listening for DNS on :53 (TCP)
INFO: 2015/03/31 16:39:40.253160 [http] Ignoring name 8c50061d39c0.., not in weave.local.
INFO: 2015/03/31 16:39:40.708730 [updater] Container 0e4269878a6643caafa4093b0c27ee5e46416921d651343c94a9def9f9e18e68 down. Removing records
INFO: 2015/03/31 16:46:52.567269 [http] Adding hola.weave.local. -> 10.20.20.2
DEBUG: 2015/03/31 16:50:00.507697 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:00.507780 [cache msgid 42669] addind in pending state
DEBUG: 2015/03/31 16:50:00.507802 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 16:50:00.508562 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 16:50:00.508631 [mdns msgid 13426] No local answer for mDNS query 1.42.17.172.in-addr.arpa.
INFO: 2015/03/31 16:50:01.010502 [dns msgid 42669] No results for type PTR query 1.42.17.172.in-addr.arpa. [caching no-local]
DEBUG: 2015/03/31 16:50:01.010597 [cache msgid 42669] replacing response in cache
DEBUG: 2015/03/31 16:50:01.010657 [dns msgid 42669] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
DEBUG: 2015/03/31 16:50:01.032689 [dns msgid 42669] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
WARNING: 2015/03/31 16:50:01.032721 [dns msgid 42669] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 16:50:01.037545 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 16:50:01.037610 [cache msgid 48229] addind in pending state
DEBUG: 2015/03/31 16:50:01.037631 [dns msgid 48229] Caching response for A-query for "hola.weave.local.": 10.20.20.2 [code:NOERROR]
DEBUG: 2015/03/31 16:50:01.037637 [cache msgid 48229] replacing response in cache
DEBUG: 2015/03/31 16:50:01.037834 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 16:50:01.037857 [dns msgid 54813] Unsuported query type AAAA
DEBUG: 2015/03/31 16:50:01.038852 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:01.038883 [cache msgid 18890] addind in pending state
DEBUG: 2015/03/31 16:50:01.038888 [zonedb] Looking for address: [10 20 20 2]
DEBUG: 2015/03/31 16:50:01.038924 [dns msgid 18890] Caching response for PTR-query for "2.20.20.10.in-addr.arpa.": hola.weave.local. [code:NOERROR]
DEBUG: 2015/03/31 16:50:01.038930 [cache msgid 18890] replacing response in cache
DEBUG: 2015/03/31 16:50:02.180216 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:02.180294 [dns msgid 17295] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:07.188403 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:07.188448 [dns msgid 17295] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:12.194123 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:12.194171 [dns msgid 17295] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:17.199326 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:17.199550 [dns msgid 17295] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:22.204698 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 16:50:22.204804 [dns msgid 4578] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 16:50:22.204927 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 16:50:22.204940 [dns msgid 58420] Unsuported query type AAAA
DEBUG: 2015/03/31 16:50:22.204949 [cache msgid 58420] replacing response in cache
DEBUG: 2015/03/31 16:50:22.206077 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:22.206144 [dns msgid 18852] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 16:50:24.476086 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:24.476206 [dns msgid 1392] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:29.486288 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:29.486357 [dns msgid 1392] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:34.486091 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:34.486184 [cache msgid 1392] expired: removing
DEBUG: 2015/03/31 16:50:34.486199 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 16:50:34.487271 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 16:50:34.487292 [mdns msgid 13998] No local answer for mDNS query 1.42.17.172.in-addr.arpa.
INFO: 2015/03/31 16:50:34.987018 [dns msgid 1392] No results for type PTR query 1.42.17.172.in-addr.arpa. [caching no-local]
DEBUG: 2015/03/31 16:50:34.987034 [dns msgid 1392] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
DEBUG: 2015/03/31 16:50:34.997103 [dns msgid 1392] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
WARNING: 2015/03/31 16:50:34.997133 [dns msgid 1392] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 16:50:34.999672 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 16:50:34.999743 [cache msgid 6099] expired: removing
DEBUG: 2015/03/31 16:50:34.999761 [dns msgid 6099] Caching response for A-query for "hola.weave.local.": 10.20.20.2 [code:NOERROR]
DEBUG: 2015/03/31 16:50:34.999879 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 16:50:34.999888 [dns msgid 11830] Unsuported query type AAAA
DEBUG: 2015/03/31 16:50:34.999893 [cache msgid 11830] replacing response in cache
DEBUG: 2015/03/31 16:50:35.000861 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:35.000916 [cache msgid 6034] expired: removing
DEBUG: 2015/03/31 16:50:35.000955 [zonedb] Looking for address: [10 20 20 2]
DEBUG: 2015/03/31 16:50:35.000955 [dns msgid 6034] Caching response for PTR-query for "2.20.20.10.in-addr.arpa.": hola.weave.local. [code:NOERROR]
DEBUG: 2015/03/31 16:50:36.077752 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:36.077827 [dns msgid 3401] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:41.086836 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:41.086872 [dns msgid 3401] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:46.089556 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:46.089631 [dns msgid 3401] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:51.099657 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:51.099723 [dns msgid 3401] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:50:56.106288 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 16:50:56.106433 [dns msgid 20265] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 16:50:56.107024 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 16:50:56.107433 [dns msgid 44553] Unsuported query type AAAA
DEBUG: 2015/03/31 16:50:56.107450 [cache msgid 44553] replacing response in cache
DEBUG: 2015/03/31 16:50:56.110639 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:56.110677 [dns msgid 52706] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 16:50:57.708052 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:50:57.708100 [dns msgid 36821] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:51:02.716056 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:51:02.716159 [dns msgid 36821] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 16:51:07.725711 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:51:07.725777 [cache msgid 36821] expired: removing
DEBUG: 2015/03/31 16:51:07.725780 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 16:51:07.726383 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 16:51:07.726383 [mdns msgid 12673] No local answer for mDNS query 1.42.17.172.in-addr.arpa.
INFO: 2015/03/31 16:51:08.227555 [dns msgid 36821] No results for type PTR query 1.42.17.172.in-addr.arpa. [caching no-local]
DEBUG: 2015/03/31 16:51:08.227618 [dns msgid 36821] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
DEBUG: 2015/03/31 16:51:08.229053 [dns msgid 36821] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
WARNING: 2015/03/31 16:51:08.229084 [dns msgid 36821] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 16:51:08.229920 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 16:51:08.229961 [cache msgid 55512] expired: removing
DEBUG: 2015/03/31 16:51:08.229976 [dns msgid 55512] Caching response for A-query for "hola.weave.local.": 10.20.20.2 [code:NOERROR]
DEBUG: 2015/03/31 16:51:08.230108 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 16:51:08.230184 [dns msgid 8668] Unsuported query type AAAA
DEBUG: 2015/03/31 16:51:08.230192 [cache msgid 8668] replacing response in cache
DEBUG: 2015/03/31 16:51:08.231114 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 16:51:08.231149 [cache msgid 35028] expired: removing
DEBUG: 2015/03/31 16:51:08.231160 [zonedb] Looking for address: [10 20 20 2]
DEBUG: 2015/03/31 16:51:08.231169 [dns msgid 35028] Caching response for PTR-query for "2.20.20.10.in-addr.arpa.": hola.weave.local. [code:NOERROR]
docker@dev-5:~$ 

@errordeveloper
Copy link
Contributor

To me it seems like reverse queries take place during regular lookup...

@errordeveloper
Copy link
Contributor

I can also confirm that repeating the same 5 quicker with the currently published version (b00be09) has no such delay.

docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 0.52s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 0.01s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ time nslookup hola.weave.local 172.17.42.1
Server:    172.17.42.1
Address 1: 172.17.42.1

Name:      hola.weave.local
Address 1: 10.20.20.2 hola.weave.local
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
docker@dev-5:~$ docker logs weavedns
INFO: 2015/03/31 17:09:10.972683 [updater] Using Docker API on unix:///var/run/docker.sock: &[Arch=amd64 GitCommit=a8a31ef GoVersion=go1.4.1 KernelVersion=3.18.5-tinycore64 Os=linux Version=1.5.0 ApiVersion=1.17]
INFO: 2015/03/31 17:09:10.972749 Waiting for interface ethwe to come up
INFO: 2015/03/31 17:09:11.973618 Interface ethwe is up
DEBUG: 2015/03/31 17:09:11.973796 [dns] Initializing cache: 8192 entries
DEBUG: 2015/03/31 17:09:11.974965 Starting the signals handler
INFO: 2015/03/31 17:09:11.975031 Using mDNS on ethwe
DEBUG: 2015/03/31 17:09:11.986159 Listening for DNS on :53 (UDP)
DEBUG: 2015/03/31 17:09:11.986679 Listening for DNS on :53 (TCP)
INFO: 2015/03/31 17:09:13.057163 [http] Ignoring name ef4e764a2589.., not in weave.local.
INFO: 2015/03/31 17:09:13.540422 [updater] Container fd440f94ef08c08bb2b4b40425620ed1eda4ddb2f87801a6a12d56ed2f9bbef2 down. Removing records
INFO: 2015/03/31 17:12:14.540876 [http] Adding hola.weave.local. -> 10.20.20.2
DEBUG: 2015/03/31 17:14:23.813466 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:23.813623 [cache msgid 35223] addind in pending state
DEBUG: 2015/03/31 17:14:23.813669 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 17:14:23.814310 [zonedb] Looking for address: [172 17 42 1]
DEBUG: 2015/03/31 17:14:23.814384 [mdns msgid 8971] No local answer for mDNS query 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:24.314802 [cache msgid 35223] replacing response in cache
DEBUG: 2015/03/31 17:14:24.314875 [dns msgid 35223] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
WARNING: 2015/03/31 17:14:24.335533 [dns msgid 35223] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:24.335497 [dns msgid 35223] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:24.336361 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 17:14:24.336750 [cache msgid 22151] addind in pending state
DEBUG: 2015/03/31 17:14:24.336781 [dns msgid 22151] Caching response for A-query for "hola.weave.local.": 10.20.20.2 [code:NOERROR]
DEBUG: 2015/03/31 17:14:24.336786 [cache msgid 22151] replacing response in cache
DEBUG: 2015/03/31 17:14:24.336641 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 17:14:24.336907 [dns msgid 10103] Unsuported query type AAAA
DEBUG: 2015/03/31 17:14:24.337541 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:24.337578 [cache msgid 50763] addind in pending state
DEBUG: 2015/03/31 17:14:24.337585 [zonedb] Looking for address: [10 20 20 2]
DEBUG: 2015/03/31 17:14:24.337593 [dns msgid 50763] Caching response for PTR-query for "2.20.20.10.in-addr.arpa.": hola.weave.local. [code:NOERROR]
DEBUG: 2015/03/31 17:14:24.337596 [cache msgid 50763] replacing response in cache
DEBUG: 2015/03/31 17:14:25.213045 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:25.213104 [dns msgid 47566] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 17:14:25.213119 [dns msgid 47566] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
WARNING: 2015/03/31 17:14:25.218018 [dns msgid 47566] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:25.217984 [dns msgid 47566] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:25.219154 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 17:14:25.219196 [dns msgid 34932] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:25.219547 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 17:14:25.219629 [dns msgid 23578] Unsuported query type AAAA
DEBUG: 2015/03/31 17:14:25.219688 [cache msgid 23578] replacing response in cache
DEBUG: 2015/03/31 17:14:25.220541 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:25.220579 [dns msgid 59588] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:26.532197 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:26.532243 [dns msgid 12790] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 17:14:26.532255 [dns msgid 12790] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
DEBUG: 2015/03/31 17:14:26.533484 [dns msgid 12790] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
WARNING: 2015/03/31 17:14:26.533511 [dns msgid 12790] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:26.534465 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 17:14:26.534528 [dns msgid 30157] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:26.534639 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 17:14:26.534744 [dns msgid 48548] Unsuported query type AAAA
DEBUG: 2015/03/31 17:14:26.534745 [cache msgid 48548] replacing response in cache
DEBUG: 2015/03/31 17:14:26.535946 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:26.535976 [dns msgid 39037] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:27.452020 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:27.452066 [dns msgid 4120] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 17:14:27.452079 [dns msgid 4120] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
DEBUG: 2015/03/31 17:14:27.453374 [dns msgid 4120] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
WARNING: 2015/03/31 17:14:27.453402 [dns msgid 4120] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:27.454322 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 17:14:27.454374 [dns msgid 23351] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:27.454486 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 17:14:27.454600 [dns msgid 43862] Unsuported query type AAAA
DEBUG: 2015/03/31 17:14:27.454610 [cache msgid 43862] replacing response in cache
DEBUG: 2015/03/31 17:14:27.455410 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:27.455444 [dns msgid 41306] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:28.333884 Reverse query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:28.333954 [dns msgid 16386] Cached 'no local replies' - skipping local lookup
DEBUG: 2015/03/31 17:14:28.333967 [dns msgid 16386] Fallback query: {Name:1.42.17.172.in-addr.arpa. Qtype:12 Qclass:1} [UDP, max:512 bytes]
DEBUG: 2015/03/31 17:14:28.335469 [dns msgid 16386] Failure reported by 10.0.2.3 for query 1.42.17.172.in-addr.arpa.
WARNING: 2015/03/31 17:14:28.335526 [dns msgid 16386] Failed lookup for external name 1.42.17.172.in-addr.arpa.
DEBUG: 2015/03/31 17:14:28.336971 Query: {Name:hola.weave.local. Qtype:1 Qclass:1}
DEBUG: 2015/03/31 17:14:28.336996 [dns msgid 53718] Returning reply from cache: NOERROR/1 answers
DEBUG: 2015/03/31 17:14:28.337238 Query: {Name:hola.weave.local. Qtype:28 Qclass:1}
DEBUG: 2015/03/31 17:14:28.337293 [dns msgid 26027] Unsuported query type AAAA
DEBUG: 2015/03/31 17:14:28.337306 [cache msgid 26027] replacing response in cache
DEBUG: 2015/03/31 17:14:28.338028 Reverse query: {Name:2.20.20.10.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/31 17:14:28.338124 [dns msgid 54909] Returning reply from cache: NOERROR/1 answers
docker@dev-5:~$ 

@inercia
Copy link
Contributor Author

inercia commented Mar 31, 2015

I will double check the patch, looking for the delay @errordeveloper noticed...

@inercia
Copy link
Contributor Author

inercia commented Apr 1, 2015

@errordeveloper I've been trying to find that delay but I have not been able to reproduce it. For a name registered in a peer like something.weave.local (with IP:10.0.0.9) and a WeaveDNS server running at 10.0.0.1, I run a command like this:

 $ i=0 ; while [ $i -lt 100 ] ; do dig @10.0.0.1  something.weave.local ; sleep 1 ; i=$(($i+1)) ; done | grep "Query time"
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 2 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 2 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 2 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 3 msec
;; Query time: 3 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 4 msec
;; Query time: 2 msec
;; Query time: 0 msec
;; Query time: 1 msec

and I do not see anything wrong in the logs. Could you repeat the tests with dig instead of nslookup? nslookup uses its own internal DNS client library, and that could lead to unexpected results...

Anyway, I have updated the PR with some cleanups in the code, specially in the cache...

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 33.12% when pulling cf870b7 on inercia:weave-225-fix-1 into e66225c on zettio:master.

@errordeveloper
Copy link
Contributor

Looks like it's more specific to busybox implementation of nslookup, which is quite weird.

@errordeveloper
Copy link
Contributor

I would be inclined to merge this and create a new issue for looking into what's up with nslookup in busybox, which looks like a it's go smaller surface area then the original problem this fixes; @awh @rade what do you think?

@rade
Copy link
Member

rade commented Apr 2, 2015

fine by me

@inercia
Copy link
Contributor Author

inercia commented Apr 2, 2015

I will double check a couple and things anyway, just in case I'm missing something...

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.29%) to 33.08% when pulling 9f14ae5 on inercia:weave-225-fix-1 into a68d444 on zettio:master.

@awh
Copy link
Contributor

awh commented Apr 2, 2015

@inercia what's the plan for the test tool - should it not live in the weave repo, particularly now we're making progress on CI?

@errordeveloper
Copy link
Contributor

@inercia should I test 9f14ae5 agains busybox nslookup?

@inercia
Copy link
Contributor Author

inercia commented Apr 2, 2015

@awh My plan was to integrate this tool in some CI job, but it was developed in a rush and I took a short path: I used Mininet for creating virtual hosts and, even when I was trying to avoid Docker so I could run it in a hosted CI service, I didn't realize that Mininet requires OpenVSwitch and OpenVSwitch can not be installed either. So we could either a) update the tool for launching WeaveDNS instances while controlling their binding ports and so, or b) move to a CI service that allows Docker instances and launch WeaveDNS in a container...

@errordeveloper If you could test the latest version, that would be great. Could you also look for possible patterns in the delays you observed? For example, if they happen every 30secs, they could be related to cache expiration...

@errordeveloper
Copy link
Contributor

@inercia my gut tells me it's just that version of nslookup doing an additional reverse lookup...

@awh
Copy link
Contributor

awh commented Apr 2, 2015

@inercia

There is a lot of change in this commit, the (vast) majority of which does not address #494! After a careful reading I discern the following logical changes:

  • Removal of cache wait functionality - incidentally, many file, function and inline comments still reference it
  • Modification of Cache.Put public API to take TTL, removing CacheNoLocalReplies special case handling in setReply
  • Change to deep copy semantics in Cache.getReply - is this fixing an as yet unreported TTL related bug in the cache? I would at least call this out in an explicit commit, to help @ilya and others if it's encountered in the wild
  • Make forward + reverse lookup authoritative - again, an unreported bug?
  • Move query type check after cache lookup - this is an optimisation unrelated to the defect
  • Short circuit negative local TTL if name is in zone - another unrelated optimisation
  • Ensure negative response cache expiries for forward lookups - this is the actual fix as far as I can tell, and is a one line change to pass the CacheNoLocalReplies flag so that the TTL doesn't default to math.MaxUint32

Apologies if I've misread, it was quite difficult to unpick everything! I would have broken these out as separate commits, each commented as per my bullets above, if for no other reason than reviewing it was extremely difficult; arguably most of it belongs in one or more separate PRs anyway.

Finally (and this is a general comment, to be addressed in the longer term perhaps), I'm not a great fan of passing a nil response + CacheNoLocalReplies into the cache as a way of caching negative responses - how about typing the cache entries to differentiate them?

@inercia inercia changed the title Fix for zettio/weave#494 Code cleanups in cache and fix for negative TTLs Apr 2, 2015
@inercia
Copy link
Contributor Author

inercia commented Apr 2, 2015

@awh Apologies if you found the PR a bit messy. The PR evolved from a simple fix to a general cleanup in the code, trying to remove some parts we do not plan to use in the near future, as well as some internal API improvements. The fix is limited to the fact of using different lookups sets depending on the value stored in the cache.

And I would consider enhancements those unreported bugs you mentioned. For example, it is not imperative to set the authoritative flag in our mDNS responses (in fact, we have never done that), but I think it would more appropriate.

I would rather not add typing to cache entries: I think that would involve introspection for every cache access and (I'm not an expert in Go's implementation but) that would probably have an unnecessary cost. I prefer the nil response (as the response is... hmmm... nil!) and a specific TTL.

Please let me know if you find particular issues you consider not appropriate for this PR, and please assign comments to specific code lines, so they disappear if the line is modified or disappears...

@errordeveloper
Copy link
Contributor

@inercia I have just tested 9f14ae5 and busybox nslookup is happy 👍

@errordeveloper
Copy link
Contributor

Would be great if we can pinpoint what exactly the problem was...

@inercia
Copy link
Contributor Author

inercia commented Apr 8, 2015

Thanks a lot for testing this, @errordeveloper. As you said, it would be nice to know what was exactly the problem with nslookup, but we can live with that... In the (near) future, I will improve our integration tests for DNS, so this kind of bugs should be easier to catch...

Anyway, I will split this PR in several smaller PRs, as @awh and @rade suggested...

@inercia
Copy link
Contributor Author

inercia commented Apr 9, 2015

@awh and @rade, I have splitted the PR in several commits for increased readability. Only the first commit is essential for fixing this bug. Would you want me to move the other commits to a different PR?

@rade
Copy link
Member

rade commented Apr 9, 2015

Would you want me to move the other commits to a different PR?

Yes, a single 2nd PR should do. For now at least.

@inercia
Copy link
Contributor Author

inercia commented Apr 9, 2015

Ok, I'll do that...

@inercia
Copy link
Contributor Author

inercia commented Apr 9, 2015

@rade & @awh, I've left only one commit in the PR. Please let me know what you guys think...

@inercia inercia changed the title Code cleanups in cache and fix for negative TTLs Fix for negative TTLs in WeaveDNS Apr 9, 2015
awh added a commit that referenced this pull request Apr 13, 2015
Fix for negative TTLs in WeaveDNS
@awh awh merged commit 29b3b91 into weaveworks:master Apr 13, 2015
@inercia inercia deleted the weave-225-fix-1 branch April 13, 2015 13:20
@rade rade modified the milestone: 0.10.0 Apr 18, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[dns] negative response cache doesn't expire
6 participants