This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

WeaveDNS problems and suggestions #484

Closed
richtera opened this issue Mar 25, 2015 · 48 comments

@richtera

I was able to get a weave network set up, but I would like to use it within a higher-level Mesos and Marathon abstraction. In order to hook weavedns into the system I set the docker -d options to include --dns . This works well for .weave.local lookups, but it creates a loop within weavedns, which sees itself when the weavedns container launches. When weavedns launches it should probably ignore its own IP address when it appears as a nameserver in resolv.conf. It should also be able to take additional --dns command-line arguments so we can point it at external DNS servers. I was thinking that if docker is set up to automatically start new containers with --dns , then weave launch-dns --dns --dns should configure weavedns to forward anything outside the local domain to those remote DNS servers.
I like the weave command line, but ultimately there should be better integration with docker or Mesos so that weave and weavedns register things automatically. Using mesos-dns is not quite possible, since mesos-dns registers hostmachine:haproxyport, which is redirected to instance:port, whereas in this scenario we want instance:port registered in DNS. Ideally weavedns should also support SRV records for all the ports each container publishes.
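One way to break the loop described above would be for weavedns to drop its own address when it reads upstream servers out of resolv.conf. A minimal sketch of that idea in Python (the function name and behaviour are illustrative assumptions, not weave's actual code):

```python
# Hypothetical sketch: read nameserver lines from resolv.conf text and
# skip weavedns's own address, so queries are never forwarded back to it.
def upstream_nameservers(resolv_conf_text, own_ip):
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # keep only well-formed "nameserver <ip>" lines that aren't us
        if len(parts) == 2 and parts[0] == "nameserver" and parts[1] != own_ip:
            servers.append(parts[1])
    return servers

conf = """nameserver 172.17.254.1
nameserver 10.2.0.112
nameserver 10.2.0.133
"""
print(upstream_nameservers(conf, "172.17.254.1"))  # ['10.2.0.112', '10.2.0.133']
```

With a filter like this, a resolv.conf that points at weavedns itself no longer feeds queries back into it.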

@errordeveloper
Contributor

@richtera first of all, thank you very much for your feedback!

@errordeveloper
Contributor

So the Mesos/Marathon integration you are referring to has not been tested with weavedns just yet, so I'm not surprised it didn't quite work out of the box. I am going to look into this and will let you know if there is a simple way to fix it.

@inercia
Contributor

inercia commented Mar 25, 2015

@richtera There is a new -fallback argument in weavedns (available in the repo since yesterday) that might help you. Launching weavedns with something like -fallback 8.8.8.8:53 will ignore the resolv.conf values and just forward queries to 8.8.8.8:53. Let me know if this works for you.

@errordeveloper
Contributor

@richtera could you provide output of weave version?

@errordeveloper
Contributor

@inercia that could certainly help, although @richtera will have to build from source...

@richtera
Author

I am currently using 0.9
weave script 0.9.0
weave router 0.9.0
weave DNS 0.9.0
weave tools 0.9.0
I think the -fallback argument will solve my current dilemma. Let me build from source.

@richtera
Author

PLEASE IGNORE. This was due to an unrelated network issue.
Hmmm, getting this error when building the weaveexec image. NEVER MIND. It was using weave during the build of weave. :-(

fetch http://dl-4.alpinelinux.org/alpine/v3.1/main/x86_64/APKINDEX.tar.gz
ERROR: http://dl-4.alpinelinux.org/alpine/v3.1/main: IO ERROR
WARNING: Ignoring APKINDEX.689bb31a.tar.gz: No such file or directory
ERROR: unsatisfiable constraints:
  ethtool (missing):
    required by: world[ethtool]
  conntrack-tools (missing):
    required by: world[conntrack-tools]
  curl (missing):
    required by: world[curl]
  iptables (missing):
    required by: world[iptables]
  iproute2 (missing):
    required by: world[iproute2]

@rade
Member

rade commented Mar 25, 2015

I have just published new images to Docker Hub, so you can get the latest version by grabbing the latest weave script and running weave setup.

Note however that --fallback may well go away and be replaced with a more general mechanism.

@richtera
Author

PARTIALLY RESOLVED: the weave-dns container doesn't have access to the DNS servers on the hosting environment's private 10.2 network. Seems weird, but using 8.8.8.8 as the DNS does route correctly.

Seems close but not quite working.
Started the DNS using

sudo DOCKER_BRIDGE=weave weave launch-dns 172.255.254.1/24 -fallback 10.2.0.112:53 -fallback 10.2.0.133:53

The two IPs to DNS are working:

dig @10.2.0.112 www.microsoft.com
...
;; ANSWER SECTION:
www.microsoft.com.  3600    IN  CNAME   toggle.www.ms.akadns.net.
...

But dig against the new DNS doesn't seem to fall back correctly.

dig @172.255.254.1 something.weave.local

versus

dig @172.255.254.1 www.microsoft.com
...
; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> @172.255.254.1 www.microsoft.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 111
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.microsoft.com.     IN  A

;; Query time: 2001 msec
;; SERVER: 172.255.254.1#53(172.255.254.1)
;; WHEN: Wed Mar 25 15:09:45 UTC 2015
;; MSG SIZE  rcvd: 35

The weavedns's log just says

WARNING: 2015/03/25 15:09:45.191419 [dns msgid 111] Failed lookup for external name www.microsoft.com.

Unfortunately the weavedns logs don't report which fallback servers were passed in, so I can't prove it received them. But when I change the command line to say --forward (which is obviously wrong) it logs that it doesn't understand forward, so it does seem to be receiving and parsing the arguments.
But the weavedns /etc/resolv.conf is already:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.2.0.112
nameserver 10.2.0.133
search bridgeway-eng.com

in my case, so I have not even forced all containers to use weave-dns yet.

@inercia
Contributor

inercia commented Mar 25, 2015

@richtera I'm not sure, but it may have something to do with the way you are using the -fallback argument. First, it can only be provided once. Second, I think you have to use it with = and quotes, like -fallback="8.8.8.8:53". Please try again this way.

If it fails, could you try again with a safe fallback server, like Google's? Something like:

sudo DOCKER_BRIDGE=weave weave launch-dns 172.255.254.1/24 -fallback="8.8.8.8:53"

As @rade said, this will not be the preferred solution for this problem, but in the meantime...

@awh
Contributor

awh commented Mar 25, 2015

I've raised #487 to capture the more general mechanism suggested by @rade.

@richtera
Author

Changing the fallback to use = doesn't change anything. I am also pretty sure it's parsing things, because initially I tried "10.2.0.112:53,10.2.0.133:53" and it complained about the IP:PORT not being parsed correctly.

 Could not parse fallback host and porttoo many colons in address 10.2.0.112:53,10.2.0.133:53

So it's definitely parsing the string I pass, with and without the =. Since /etc/resolv.conf could in theory contain the fallback DNS servers, weavedns could just use them from there; but --fallback is still nice when the docker setup is configured to automatically point containers at the weave-dns on each machine, because then running weave-dns under docker would point weave-dns at itself.

Tried Google's 8.8.8.8:53 as well, with the same problems.
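The "too many colons" complaint follows from -fallback expecting a single host:port pair, so a comma-joined list of two servers contains one colon too many. A small Python re-creation of that parse, for illustration only (weave's actual parser is Go code, not this function):

```python
# Illustrative parser for a single "host:port" value, mimicking why a
# comma-joined server list is rejected: it splits into three colon-
# separated pieces instead of two.
def parse_fallback(value):
    parts = value.split(":")
    if len(parts) != 2:
        raise ValueError("too many (or too few) colons in address " + value)
    host, port = parts
    return host, int(port)

print(parse_fallback("8.8.8.8:53"))  # ('8.8.8.8', 53)
# parse_fallback("10.2.0.112:53,10.2.0.133:53") raises ValueError:
# the comma-joined string contains two colons, not one.
```

So listing multiple servers would need the flag to be repeated or a different separator; as noted above, the flag can only be given once.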

@richtera
Author

Something is not right though. I do see...

Error.Fatal("Could not parse fallback host and port", err)

when I put in garbage to the fallback option, but I don't see

Debug.Printf("DNS fallback at %s:%s", fallbackHost, fallbackPort)

in the log, and I don't get any other errors. How do I manually build the weavedns image and point weave at it?

@richtera
Author

The 8.8.8.8 does the trick. After a whole bunch of trying I figured out how to enable debugging, which told me that on one of my machines nothing was working and all fallback DNS requests were timing out.
I switched to another machine and tried both IPs; 8.8.8.8 works. It's now clear to me that the 10.2 IPs would only work if the weave-dns container had special routing into the host network, which it most likely doesn't, since it's in an address range by itself.

@richtera
Author

Startup on nodes configured with weave and weavedns is also going to be hard.
If you configure docker to use the weave network, it won't start up until the bridge is created.
The new version of weave requires docker to be running when executing create-bridge.
Chicken and egg, I guess...

@rade
Member

rade commented Mar 25, 2015

The new version of weave requires docker to be running when executing create-bridge.
Chicken and egg, I guess...

Use --local. That's what it's there for.

@richtera
Author

YES! That works.
Now the remaining piece...
Got a weave network with

sudo weave status
weave router git-066d8001dd6d
Encryption off
Our name is xx:xx:xx:xx:xx:xx (node2.bridgeway-eng.com)
Sniffing traffic on &{9 65535 ethwe xx:xx:xx:xx:xx:xx up|broadcast|multicast}
MACs:
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 19:00:17.257319035 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:56:43.089386543 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:58:46.050300206 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:58:55.614080792 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:56:43.153416815 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:57:07.501066164 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:58:55.826010178 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 19:00:16.721380261 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 19:00:16.817391515 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 19:00:40.320792022 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 19:00:40.540740238 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 18:58:55.960112805 +0000 UTC)
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx (2015-03-25 19:02:59.176899684 +0000 UTC)
Peers:
Peer xx:xx:xx:xx:xx:xx (node5.x.com) (v4) (UID 18219167174269992547)
   -> xx:xx:xx:xx:xx:xx (node2.x.com) [x.x.0.34:6783]
   -> xx:xx:xx:xx:xx:xx (node4.x.com) [x.x.0.137:6783]
Peer xx:xx:xx:xx:xx:xx (node2.x.com) (v13) (UID 259745694776927968)
   -> xx:xx:xx:xx:xx:xx (node4.x.com) [x.x.0.137:33151]
   -> xx:xx:xx:xx:xx:xx (node5.x.com) [x.x.0.141:35447]
Peer xx:xx:xx:xx:xx:xx (node4.x.com) (v9) (UID 17826224402228508180)
   -> xx:xx:xx:xx:xx:xx (node2.x.com) [x.x.0.34:6783]
   -> xx:xx:xx:xx:xx:xx (node5.x.com) [172.17.4.1:35696]
Routes:
unicast:
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx
broadcast:
xx:xx:xx:xx:xx:xx -> [xx:xx:xx:xx:xx:xx xx:xx:xx:xx:xx:xx]
Reconnects:

weave DNS git-066d8001dd6d
Local domain weave.local.
Listen address :53
mDNS interface &{13 65535 ethwe xx:xx:xx:xx:xx:xx up|broadcast|multicast}
Fallback DNS config &{[8.8.8.8] [] 53 0 0 0}
Zone database:

And.

sudo weave run 172.17.5.1/16 -i -t -h sample.weave.local ubuntu bash

leads to

INFO: 2015/03/25 19:07:09.972966 [updater] Container f5d495b9a42c68063c6364e7a79ae9bc69f9f30398249266d8091150940b77c5 down. Removing records
INFO: 2015/03/25 19:07:12.943574 [http] Adding sample.weave.local. -> 172.17.5.1
INFO: 2015/03/25 19:07:12.968938 [updater] Container ef54d0a5d89598f1893437d8c7d03516446dc1c70a442f2402405526ecb1b5b3 down. Removing records
9ce36d4e8a625c5e9cf98ac59e4af7cea02f0d6471899547bd31c5e637913508

and

.
.
.
xx:xx:xx:xx:xx:xx -> xx:xx:xx:xx:xx:xx
broadcast:
xx:xx:xx:xx:xx:xx -> []
xx:xx:xx:xx:xx:xx -> []
Reconnects:

weave DNS git-066d8001dd6d
Local domain weave.local.
Listen address :53
mDNS interface &{71 65535 ethwe xx:xx:xx:xx:xx:xx up|broadcast|multicast}
Fallback DNS config &{[8.8.8.8] [] 53 0 0 0}
Zone database:
9ce36d4e8a62 172.17.5.1 sample.weave.local.

But unfortunately this wasn't EASY :-).
But it's all working! Thanks for the help.

@richtera
Author

The mDNS multicast is causing me problems, though. I never see the records being forwarded to another machine. It's almost certainly a problem on my side, but I am not quite sure how to set up routing for the dedicated DNS partition of the network.

@rade
Member

rade commented Mar 25, 2015

you shouldn't have to do any network configuration to get weaveDNS to work, since it communicates over the weave network.

Make sure you have adhered to the instructions for weave launch-dns, in particular the requirement to give each weaveDNS instance "its own, unique, IP address, in a subnet that is a) common to all weaveDNS containers, b) disjoint from the application subnets, and c) not in use on any of the hosts."

If you still can't get things to work, check the logs of the local and remote weaveDNS containers (docker logs weavedns) when you perform a lookup.

@richtera
Author

My setup is as follows:
For Machine n = [0:4]

weave --local expose 172.17.n.1/16
weave --local create-bridge
weave launch [n=0 internal IP if n > 0]
weave launch-dns 172.17.254.n/24 -fallback 8.8.8.8:53
docker -d --bridge=weave --fixed-cidr=172.17.n.0/24
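This subnet layout can be checked against the launch-dns requirement quoted above. The DNS /24 is disjoint from each host's application /24, but it still falls inside the exposed 172.17.0.0/16 that the docker bridge covers, which is why the weavedns container ends up with a second interface in the same network. A quick check with Python's standard ipaddress module, using the concrete addresses from the setup above (host n = 0 shown):

```python
import ipaddress

exposed = ipaddress.ip_network("172.17.0.0/16")    # weave --local expose
apps0   = ipaddress.ip_network("172.17.0.0/24")    # --fixed-cidr on host 0
dns     = ipaddress.ip_network("172.17.254.0/24")  # weave launch-dns subnet

# The DNS subnet does not collide with the per-host application /24...
print(dns.overlaps(apps0))     # False
# ...but it sits inside the /16 the docker bridge routes, so a weavedns
# container started on that bridge gets a second interface in the same
# network.
print(dns.subnet_of(exposed))  # True
```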

@richtera
Author

Suddenly started working. I didn't change anything but it's now working.

@richtera
Author

The connection between the DNS servers seems to come and go. I placed each DNS server into the .254 subnet, and for a while they can reach each other and the machines can reach the DNS servers on the other machines. But after a while they no longer communicate; I think it's a routing problem where the route that was initially used is eventually forgotten.

@rade
Member

rade commented Mar 26, 2015

Hmm. Could be an ARP issue. How did you observe the (lack of) communication between weaveDNS servers?

@richtera
Author

Initially I can make DNS requests using dig from any machine to any of the DNS servers running on each of the hosts. Then after a while I can no longer do that.

@richtera
Author

Ok all this happens because of /etc/default/docker containing

DOCKER_OPTS="--bridge=weave --fixed-cidr=172.17.n.0/24 --dns 172.17.254.n"

for each host n. Without these settings the DNS works. I think there is an issue when weavedns runs in a container that is already in the CIDR network, so it adds a second interface into the same network.
Removing the setting makes mDNS work, but then the containers won't be in the right network by default.
Is there a way to have "weave run" automatically choose the next available IP address?

@richtera richtera reopened this Mar 26, 2015
@bboreham
Contributor

We are working on automatic allocation of IPs across the whole network, which is issue #22.

@richtera
Author

What's interesting is that using --fixed-cidr will cause docker to pick an available IP. Couldn't this be leveraged? Since each host machine could be given a different --fixed-cidr, all instances would automatically get a unique IP. The only remaining work would be to make sure weavedns can live in an environment where docker is set up with --fixed-cidr, and to have the weave command detect it.

@bboreham
Contributor

The --fixed-cidr technique requires that the administrator pre-allocate ranges of IPs to each host, and we wanted to make it easier to use and more flexible than that.

Also we would want to be able to allocate an IP for WeaveDNS in one subnet and IPs for other containers in another subnet, on the same machine.

@rade
Member

rade commented Mar 26, 2015

also, the CIDR determines the subnet, and containers can only communicate when on the same subnet. That conflicts with using the CIDR to carve out ranges per docker host.

@richtera
Author

Makes sense, although making fixed-cidr work as an immediate feature might allow people to jump in and build some clouds :). But I agree a DHCP-type solution would be better, and it could auto-populate the DNS as well. I am just kind of stuck; I am trying to deploy a proof-of-concept application group that includes Cassandra, Elasticsearch and Datomic, which all use a gossip-style approach, making them very unhappy with a port-redirecting haproxy solution.
@rade I am using a /16 subnet to expose weave and then a /24 for each docker CIDR. This allows all the containers to communicate correctly over weave. The only problem I am having is that weavedns doesn't work in this scenario, because it ends up with two interfaces into the CIDR network.

@rade
Member

rade commented Mar 26, 2015

The only problem I am having is that weavedns doesn't work in this scenario, because it ends up with two interfaces into the CIDR network.

Surely all weave application containers will end up with two interfaces, with overlapping CIDRs. That is bad.

@richtera
Author

Only if you start them with weave run. If you start them with docker run they will end up in the weave network with a unique IP. At least that's what I was seeing.

@rade
Member

rade commented Mar 26, 2015

Ah, I see. So presumably you also tell docker to use the weave bridge.

@rade
Member

rade commented Mar 26, 2015

but then I don't understand how you get the weave containers to have a /16 CIDR.

@rade
Member

rade commented Mar 26, 2015

ah, you don't. you rely on the default route.

This is well into uncharted territory.

@richtera
Author

Yeah, the config is a little further up in the thread.
For each cloud host n I do:

weave --local expose 172.17.n.1/16
weave --local create-bridge

Edit /etc/default/docker
Add

DOCKER_OPTS="--bridge=weave --fixed-cidr=172.17.n.0/24 --dns 172.17.254.n"

Then execute

sudo service docker restart
weave launch [n=0 internal IP if n > 0]
weave launch-dns 172.17.254.n/24 -fallback 8.8.8.8:53

So the only thing that's giving me a hard time here is weavedns.
It's listening on 172.17.254.n and on some random 172.17.n.x, because it's running inside a container which already gets a 172.17.n.x address.
At least this is how I understand the problem; I might be creating other problems by doing this.

@richtera
Author

With the new version of docker running under Ubuntu I am seeing dnsmasq also taking up port 53 in some places. I haven't narrowed down whether this is another variable causing additional problems, but just FYI.

@rade
Member

rade commented Apr 2, 2015

Is there anything left to investigate here that isn't covered by other issues?

@richtera
Author

richtera commented Apr 2, 2015

Is the port 53 thing covered anywhere? This might be a new docker feature which conflicts with the weaveDNS listener.

@rade
Member

rade commented Apr 2, 2015

The port won't conflict. weaveDNS runs in a container, not on the host.

@richtera
Author

richtera commented Apr 2, 2015

But that's where dnsmasq listens. As I understand it, the latest docker 1.6 has some kind of DNS forwarder instead of updating /etc/resolv.conf.

@rade
Member

rade commented Apr 2, 2015

Weavedns listens on port 53 of the weavedns container IP. It does not listen on port 53 of the host.

@rade
Member

rade commented Apr 2, 2015

btw, any pointers to info about this looming docker 1.6 dnsmasq introduction?

@richtera
Author

richtera commented Apr 2, 2015

I just pulled rc1 from docker.io and it prevented me from running weaveDNS on port 53, so I ran it on 52. I'm not sure whether this comes from an updated lxc release or from docker itself; I just saw a process called dnsmasq listening on port 53. So I don't have a lot of extra info, since I was tied up with other stuff for the last few days.

@rade
Member

rade commented Apr 2, 2015

I've just tried docker 1.6rc3 and encountered no problems.

@richtera
Author

richtera commented Apr 2, 2015

Ok, then I'll just close this. Maybe it's part of my setup, but I am just using plain Ubuntu boxes.

@richtera richtera closed this as completed Apr 2, 2015
@rade
Member

rade commented Apr 2, 2015

I do have a dnsmasq running, which is part of the stock Ubuntu setup. It doesn't conflict with weavedns for the reasons I explained above. I wonder whether in your weave experiments you somehow managed to launch the weavedns container in the host network namespace, or attempted to publish the container's port 53.

@richtera
Author

richtera commented Apr 2, 2015

I did have Marathon and Mesos running, which I suppose could have detected weaveDNS and decided to establish an haproxy entry, but normally only Mesos tasks are registered in haproxy. Other than that I didn't specify any networking options when launching weavedns. When I changed the listen port to 52 it worked normally.

@rade rade added this to the 0.10.0 milestone Apr 18, 2015