This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

[dns] constant delay on reverse lookups of non-weave IPs #473

Closed
stonemaster opened this issue Mar 19, 2015 · 9 comments

Comments

@stonemaster

Reverse DNS requests to Weave DNS are very slow compared to those made against a local bind server:

Weave DNS

# time host 192.168.2.1 172.17.42.1
Using domain server:
Name: 172.17.42.1
Address: 172.17.42.1#53
Aliases: 

Host 1.2.168.192.in-addr.arpa. not found: 3(NXDOMAIN)

real    0m0.508s
user    0m0.007s
sys     0m0.000s

Local bind server

# time host 192.168.2.1 172.17.42.2
Using domain server:
Name: 172.17.42.2
Address: 172.17.42.2#53
Aliases: 

Host 1.2.168.192.in-addr.arpa. not found: 3(NXDOMAIN)

real    0m0.006s
user    0m0.006s
sys     0m0.000s

weavedns log output during the query:

DEBUG: 2015/03/19 12:26:33.692228 Reverse query: {Name:1.2.168.192.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/19 12:26:33.692247 [zonedb] Looking for address: [192 168 2 1]
DEBUG: 2015/03/19 12:26:33.692601 [zonedb] Looking for address: [192 168 2 1]
DEBUG: 2015/03/19 12:26:33.692617 [mdns msgid 356] No local answer for mDNS query 1.2.168.192.in-addr.arpa.
DEBUG: 2015/03/19 12:26:34.192546 [dns msgid 64921] Fallback query: {Name:1.2.168.192.in-addr.arpa. Qtype:12 Qclass:1}
DEBUG: 2015/03/19 12:26:34.192875 [dns msgid 64921] Failure reported by 172.17.42.2 for query 1.2.168.192.in-addr.arpa.
WARNING: 2015/03/19 12:26:34.192888 [dns msgid 64921] Failed lookup for external name 1.2.168.192.in-addr.arpa.

The local bind server is the fallback DNS server for weavedns. Judging by the timestamps, weavedns waits about 0.5 seconds before it queries that fallback server.
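For anyone wanting to reproduce the measurement outside of `host`, here is a minimal Python sketch (not part of the original report). It derives the PTR query name seen in the log and times one reverse lookup. The function names are illustrative, and `socket.gethostbyaddr` goes through the system resolver, so it only exercises weaveDNS if the container's resolver points at it.

```python
import ipaddress
import socket
import time


def ptr_name(ip: str) -> str:
    """Return the fully qualified reverse-lookup (PTR) name for an IP."""
    return ipaddress.ip_address(ip).reverse_pointer + "."


def timed_reverse_lookup(ip: str) -> float:
    """Time one reverse lookup via the system resolver, in seconds."""
    start = time.monotonic()
    try:
        socket.gethostbyaddr(ip)
    except OSError:
        # A failed lookup (e.g. NXDOMAIN) still counts toward the latency.
        pass
    return time.monotonic() - start


print(ptr_name("192.168.2.1"))  # 1.2.168.192.in-addr.arpa.
```

Calling `timed_reverse_lookup("192.168.2.1")` a few times in a row should make a constant per-query delay easy to spot.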

@bboreham
Contributor

I believe this is a valid description of the current implementation.
Is it causing you a problem?

@stonemaster
Author

Unfortunately yes. In my setup I am running a service in a Docker container (connected to weave/weaveDNS) which does a reverse DNS lookup on the destination IP with every request. Because the IP is a local one, the reverse DNS query fails every time and adds a constant delay to every request. Disabling the service's behaviour is not an option. Bind answers failing reverse DNS requests without a noticeable delay.

@rade added the feature label Mar 19, 2015
@rade changed the title from "Reverse DNS requests very slow compared to bind" to "[dns] constant delay on reverse lookups of non-weave IPs" Mar 19, 2015
@awh
Contributor

awh commented Mar 20, 2015

@stonemaster the delay arises because weaveDNS does not currently cache negative local query results. Consequently, each time you ask it to reverse-resolve a non-weave IP we incur a delay while we multicast to other peers looking for an answer; only when this times out do we fall back to external DNS. There is some ongoing work to add caching to weaveDNS, which should mitigate this problem in future.

@stonemaster
Author

weavedns is configured to resolve 10.0.0.0/16, while the IP address I want to resolve is outside that managed range.
When weavedns is asked to resolve domains or IP addresses it does not actually manage, it should fall back to the external DNS immediately without asking other weavedns instances.

@awh
Contributor

awh commented Mar 23, 2015

Logically you are completely correct; unfortunately there are some characteristics of the implementation, driven by other important requirements, that make this less straightforward than might be expected:

  • Firstly, at no point is the user required to tell weave what subnet(s) are in use - they are implicit in the weave run <CIDR> and weave attach <CIDR> commands. Furthermore, it's possible to weave attach or weave run with new subnets at any peer at any time after the network is started. This is important for the user experience - we don't want users to have to specify application subnets up front.
  • The weave router and weaveDNS are distributed over a number of hosts. Consequently the knowledge of which subnets are managed by weave is distributed, and for any weaveDNS instance to answer the question 'is this IP address managed by weave' it necessarily has to consult with its peers. It is this consultation (and resulting timeout) that is causing the delay you are experiencing.

There are a couple of possible ways of addressing the problem:

  • Introduce caching of negative replies in the consultation, as mentioned earlier in this thread. This will mean that the delay is incurred infrequently (e.g. the first time an IP is reversed on a given weaveDNS peer, or after the negative reply cache entry TTL expires). We are working on this at the moment.
  • Pro-actively distribute information about new subnets as they appear on the weave network, so that all weaveDNS instances have an up-to-date picture of which IP ranges weave is managing. Getting this right without affecting scalability is complex, and therefore something for the longer term. We have tentative plans to replace the mDNS part of weaveDNS with something more scalable (probably a gossip protocol similar to that used by the weave router) - we can look at this approach as part of that work.
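To illustrate the second option: if each instance held an up-to-date set of weave subnets, the "is this IP managed by weave" question could be answered locally. The sketch below is illustrative Python, not weaveDNS code; `known_subnets` and both function names are hypothetical, and `10.2.1.0/24` is a made-up application subnet.

```python
import ipaddress

# Hypothetical: application subnets this instance has learned from its peers.
known_subnets = {ipaddress.ip_network("10.2.1.0/24")}


def is_weave_managed(ip: str) -> bool:
    """True if the address falls inside any subnet known to this instance."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in known_subnets)


def route_reverse_query(ip: str) -> str:
    """Decide where a reverse query goes: weave peers or the fallback server."""
    return "peers" if is_weave_managed(ip) else "fallback"


print(route_reverse_query("192.168.2.1"))  # fallback
```

The difficult part described above is keeping `known_subnets` current across all peers, which this sketch does not attempt.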

@rade
Member

rade commented Mar 23, 2015

@stonemaster

weavedns is configured to resolve 10.0.0.0/16

No, it isn't. The CIDR in weave launch-dns is for the subnet used by weaveDNS only. As the docs say, that CIDR is meant to specify "a subnet that is a) common to all weaveDNS containers, b) disjoint from the application subnets, and c) not in use on any of the hosts". So that CIDR tells us nothing about application container subnets.
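Requirement (b) can be checked mechanically. A small Python sketch, assuming you already have your CIDRs as strings; the application subnets used here are invented for illustration and `overlapping` is a hypothetical helper, not a weave command.

```python
import ipaddress


def overlapping(dns_cidr: str, app_cidrs: list[str]) -> list[str]:
    """Return the application CIDRs that overlap the weaveDNS CIDR."""
    dns_net = ipaddress.ip_network(dns_cidr)
    return [c for c in app_cidrs if dns_net.overlaps(ipaddress.ip_network(c))]


# 10.0.0.0/16 is the weaveDNS CIDR from this thread; the others are made up.
print(overlapping("10.0.0.0/16", ["10.0.1.0/24", "10.1.0.0/24"]))
```

An empty result means the weaveDNS subnet is disjoint from every application subnet in the list.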

@stonemaster
Author

@rade Thanks for the hint. You indeed made me aware of a configuration problem on my side.
@awh Thanks for the detailed explanation! Having a negative cache for reverse lookup would definitely help in my case.

@inercia
Contributor

inercia commented Mar 25, 2015

@stonemaster The latest version should mitigate this to some degree. There is a new mechanism that performs negative caching for local responses. So you should still experience that delay on the first query but, if the name does not belong to the local domain, WeaveDNS will remember that for 30 seconds and go straight to your fallback server...
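The behaviour described here can be sketched roughly as follows. This is a simplified Python illustration of negative caching with a 30-second TTL, not the actual weaveDNS implementation; `NegativeCache`, `resolve`, `ask_peers` and `ask_fallback` are hypothetical names.

```python
import time


class NegativeCache:
    """Remembers names that had no local answer, for a limited time."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expiry = {}  # name -> time at which the entry expires

    def add(self, name):
        self._expiry[name] = self.clock() + self.ttl

    def contains(self, name):
        expires = self._expiry.get(name)
        if expires is None:
            return False
        if self.clock() >= expires:
            del self._expiry[name]  # entry is stale, forget it
            return False
        return True


def resolve(name, cache, ask_peers, ask_fallback):
    """Consult peers unless the name is negatively cached, then fall back."""
    if not cache.contains(name):
        answer = ask_peers(name)  # slow path: may wait for the peer timeout
        if answer is not None:
            return answer
        cache.add(name)  # remember the miss so later queries skip the peers
    return ask_fallback(name)
```

With this shape, only the first query for a given name (and the first one after the entry expires) pays the peer-consultation delay; the rest go straight to the fallback.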

@rade added bug and removed feature labels Apr 2, 2015
@rade
Member

rade commented Apr 2, 2015

resolved by #429.

@rade closed this as completed Apr 2, 2015
@rade added this to the 0.10.0 milestone Apr 18, 2015

5 participants