[dns] constant delay on reverse lookups of non-weave IPs #473

stonemaster · 2015-03-19T13:08:58Z

Reverse DNS requests to Weave DNS are very slow compared to those made against a local bind server:

Weave DNS

# time host 192.168.2.1 172.17.42.1
Using domain server:
Name: 172.17.42.1
Address: 172.17.42.1#53
Aliases: 

Host 1.2.168.192.in-addr.arpa. not found: 3(NXDOMAIN)

real    0m0.508s
user    0m0.007s
sys     0m0.000s

LOCAL BIND SERVER

# time host 192.168.2.1 172.17.42.2
Using domain server:
Name: 172.17.42.2
Address: 172.17.42.2#53
Aliases: 

Host 1.2.168.192.in-addr.arpa. not found: 3(NXDOMAIN)

real    0m0.006s
user    0m0.006s
sys     0m0.000s

Error message of weavedns during query:

DEBUG: 2015/03/19 12:26:33.692228 Reverse query: {Name:1.2.168.192.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/19 12:26:33.692247 [zonedb] Looking for address: [192 168 2 1]
DEBUG: 2015/03/19 12:26:33.692601 [zonedb] Looking for address: [192 168 2 1]
DEBUG: 2015/03/19 12:26:33.692617 [mdns msgid 356] No local answer for mDNS query 1.2.168.192.in-addr.arpa.
DEBUG: 2015/03/19 12:26:34.192546 [dns msgid 64921] Fallback query: {Name:1.2.168.192.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/19 12:26:34.192875 [dns msgid 64921] Failure reported by 172.17.42.2 for query 1.2.168.192.in-addr.arpa.
WARNING: 2015/03/19 12:26:34.192888 [dns msgid 64921] Failed lookup for external name 1.2.168.192.in-addr.arpa.

The local bind server is the fallback DNS server for weavedns. Judging by the log timestamps, weavedns waits about 0.5 seconds before it queries that fallback server.
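As a rough check of the numbers above, the following Python sketch derives the PTR name that appears in the log from the queried IP and measures the gap between the "No local answer" line and the "Fallback query" line. The helper names `ptr_name` and `gap_seconds` are made up for illustration; the timestamps are copied from the debug log above.

```python
from datetime import datetime
import ipaddress

# Timestamp layout used by the weavedns debug log lines quoted above.
LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def ptr_name(ip: str) -> str:
    """Return the fully qualified reverse-lookup name for an IP address."""
    # reverse_pointer omits the trailing dot, so add it to match the log.
    return ipaddress.ip_address(ip).reverse_pointer + "."


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


print(ptr_name("192.168.2.1"))
# 1.2.168.192.in-addr.arpa.

# "No local answer for mDNS query" -> "Fallback query"
print(gap_seconds("2015/03/19 12:26:33.692617", "2015/03/19 12:26:34.192546"))
# 0.499929
```

The gap of roughly 0.5 s accounts for almost all of the 0.508 s wall-clock time reported by `time host`.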


bboreham · 2015-03-19T13:39:56Z

I believe this is a valid description of the current implementation.
Is it causing you a problem?

stonemaster · 2015-03-19T14:54:35Z

Unfortunately yes. In my setup I am running a service in a docker container (connected to weave/weave DNS) which does a reverse DNS request on the destination IP with every request. Because the IP is a local one, the reverse DNS query fails every time and adds a constant delay to every request. Disabling the service's behaviour is not an option. Bind answers failing reverse DNS requests without a noticeable delay.

awh · 2015-03-20T10:34:34Z

@stonemaster the delay occurs because weaveDNS does not currently cache negative local query results. Consequently, each time you ask it to reverse resolve a non-weave IP we incur a delay while we multicast to other peers looking for an answer; only when this times out do we fall back to external DNS. There is some ongoing work to add caching to weaveDNS which should mitigate this problem in future.
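The lookup order described here can be sketched in Python as follows. This is only an illustration of the flow, not weaveDNS code: the function and parameter names are hypothetical, the peer query and fallback are passed in as plain callables, and the 0.5 s timeout is an assumption taken from the gap observed in the debug log.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed value, based on the ~0.5 s gap seen in the debug log.
PEER_TIMEOUT_S = 0.5


def reverse_lookup(
    ip: str,
    local_zone: Dict[str, str],
    ask_peers: Callable[[str, float], Optional[str]],
    ask_fallback: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (name, source) for ip: local data first, then peers, then fallback."""
    name = local_zone.get(ip)
    if name is not None:
        return name, "local"

    # For a non-weave IP no peer answers, so this step costs the full timeout.
    name = ask_peers(ip, PEER_TIMEOUT_S)
    if name is not None:
        return name, "peer"

    # Only reached after the peer query has timed out.
    return ask_fallback(ip), "fallback"


# Stub peers and fallback that know nothing about 192.168.2.1.
result = reverse_lookup(
    "192.168.2.1",
    local_zone={},
    ask_peers=lambda ip, timeout: None,
    ask_fallback=lambda ip: None,
)
print(result)
# (None, 'fallback')
```

Because nothing remembers the negative result from the peers, every repeated query for the same non-weave IP pays the peer timeout again.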

stonemaster · 2015-03-23T07:38:55Z

weavedns is configured to resolve 10.0.0.0/16, while the IP address I want to resolve is outside that managed range.
When weavedns is asked to resolve domains or IP addresses that it does not actually manage, it should fall back to the external DNS immediately without asking other weavedns instances.

awh · 2015-03-23T12:28:57Z

Logically you are completely correct; unfortunately there are some characteristics of the implementation, driven by other important requirements, that make this less straightforward than might be expected:

Firstly, at no point is the user required to tell weave what subnet(s) are in use - they are implicit in the weave run <CIDR> and weave attach <CIDR> commands. Furthermore, it's possible to weave attach or weave run with new subnets at any peer at any time after the network is started. This is important for the user experience - we don't want users to have to specify application subnets up front.
The weave router and weaveDNS are distributed over a number of hosts. Consequently the knowledge of which subnets are managed by weave is distributed, and for any weaveDNS instance to answer the question 'is this IP address managed by weave' it necessarily has to consult with its peers. It is this consultation (and resulting timeout) that is causing the delay you are experiencing.

There are a couple of possible ways of addressing the problem:

Introduce caching of negative replies in the consultation, as mentioned earlier in this thread. This will mean that the delay is incurred infrequently (e.g. the first time an IP is reversed on a given weaveDNS peer, or after the negative reply cache entry TTL expires). We are working on this at the moment.
Pro-actively distribute information about new subnets as they appear on the weave network, so that all weaveDNS instances have an up-to-date picture of which IP ranges weave is managing. Getting this right without affecting scalability is complex, and therefore something for the longer term. We have tentative plans to replace the mDNS part of weaveDNS with something more scalable (probably a gossip protocol similar to that used by the weave router) - we can look at this approach as part of that work.

rade · 2015-03-23T13:33:15Z

@stonemaster

"weavedns is configured to resolve 10.0.0.0/16"

No, it isn't. The CIDR in weave launch-dns is for the subnet used by weaveDNS only. As the docs say, that CIDR is meant to specify "a subnet that is a) common to all weaveDNS containers, b) disjoint from the application subnets, and c) not in use on any of the hosts." So that CIDR tells us nothing about application container subnets.

stonemaster · 2015-03-23T13:46:13Z

@rade Thanks for the hint. You indeed made me aware of a configuration problem on my side.
@awh Thanks for the detailed explanation! Having a negative cache for reverse lookup would definitely help in my case.

inercia · 2015-03-25T12:36:31Z

@stonemaster The latest version should mitigate this to some degree. There is a new mechanism that performs negative caching for local responses. So you should experience that delay on the first query but, if the name does not belong to the local domain, WeaveDNS will remember that for 30 seconds and go straight to your fallback server...
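A minimal Python sketch of such a negative cache is shown below. It is not the weaveDNS implementation; the class name `NegativeCache` and its methods are hypothetical, and only the 30-second lifetime comes from the comment above. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict


class NegativeCache:
    """Remembers names that had no local answer, for a limited time."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires: Dict[str, float] = {}  # name -> expiry time

    def add(self, name: str) -> None:
        """Record that name had no local answer."""
        self._expires[name] = self._clock() + self._ttl_s

    def contains(self, name: str) -> bool:
        """Return True while the negative entry for name is still fresh."""
        expiry = self._expires.get(name)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry is stale: drop it so the next query asks the peers again.
            del self._expires[name]
            return False
        return True


# Fake clock so the example is deterministic.
now = [0.0]
cache = NegativeCache(ttl_s=30.0, clock=lambda: now[0])

cache.add("1.2.168.192.in-addr.arpa.")
print(cache.contains("1.2.168.192.in-addr.arpa."))  # True: skip peers, use fallback

now[0] = 31.0
print(cache.contains("1.2.168.192.in-addr.arpa."))  # False: entry has expired
```

With a cache like this in front of the peer query, only the first lookup in each 30-second window pays the timeout; the rest go directly to the fallback server.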

rade · 2015-04-02T05:32:18Z

resolved by #429.

rade added the feature label Mar 19, 2015

rade changed the title ~~Reverse DNS requests very slow compared to bind~~ [dns] constant delay on reverse lookups of non-weave IPs Mar 19, 2015

rade added bug and removed feature labels Apr 2, 2015

rade closed this as completed Apr 2, 2015

rade added this to the 0.10.0 milestone Apr 18, 2015

awh added the [component/dns] label Jun 2, 2015

awh mentioned this issue Jun 2, 2015

Replace mDNS peering with gossip #826

Closed
