This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave 0.9.0 failing to startup properly on ubuntu 12.04 + Docker 1.5.0 #470

Closed
alex-sherwin opened this issue Mar 18, 2015 · 14 comments

@alex-sherwin

I have a vanilla install of Docker 1.5.0 and Weave 0.9.0 on an Ubuntu 12.04 machine, but weave fails to start up.

Output of uname -a:
Linux nj 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Output of ifconfig -a:

docker0   Link encap:Ethernet  HWaddr 56:84:7a:fe:97:99  
          inet addr:172.17.42.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::5484:7aff:fefe:9799/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:154 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:10256 (10.2 KB)  TX bytes:398 (398.0 B)

dummy0    Link encap:Ethernet  HWaddr d2:9d:19:68:c2:28  
          BROADCAST NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 90:b1:1c:4f:54:12  
          inet addr:192.168.150.30  Bcast:192.168.150.255  Mask:255.255.255.0
          inet6 addr: fe80::92b1:1cff:fe4f:5412/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1893168224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2228870280 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:982389139684 (982.3 GB)  TX bytes:969893591200 (969.8 GB)
          Interrupt:16 

eth1      Link encap:Ethernet  HWaddr 90:b1:1c:4f:54:13  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:17 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:47993009 errors:0 dropped:0 overruns:0 frame:0
          TX packets:47993009 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:15205837118 (15.2 GB)  TX bytes:15205837118 (15.2 GB)

weave     Link encap:Ethernet  HWaddr 7a:32:99:ee:3f:f7  
          UP BROADCAST MULTICAST  MTU:65535  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Docker startup has not been modified; it runs as the default /usr/bin/docker -d command.

Output of sudo sh -x /usr/local/bin/weave launch

sudo sh -x /usr/local/bin/weave launch
+ set -e
+ SCRIPT_VERSION=0.9.0
+ id -u
+ [ 0 = 0 ]
+ [ 1 -gt 0 ]
+ [ 0.9.0 = (unreleased version) ]
+ IMAGE_VERSION=0.9.0
+ IMAGE_VERSION=0.9.0
+ BASE_IMAGE=zettio/weave
+ BASE_DNS_IMAGE=zettio/weavedns
+ BASE_TOOLS_IMAGE=zettio/weavetools
+ IMAGE=zettio/weave:0.9.0
+ DNS_IMAGE=zettio/weavedns:0.9.0
+ TOOLS_IMAGE=zettio/weavetools:0.9.0
+ CONTAINER_NAME=weave
+ DNS_CONTAINER_NAME=weavedns
+ BRIDGE=weave
+ CONTAINER_IFNAME=ethwe
+ MTU=65535
+ PORT=6783
+ HTTP_PORT=6784
+ DNS_HTTP_PORT=6785
+ DOCKER_BRIDGE=docker0
+ PROCFS=/proc
+ COMMAND=launch
+ shift 1
+ uname -s -r
+ sed -n -e s|^\([^ ]*\) \([0-9][0-9]*\)\.\([0-9][0-9]*\).*|\1 \2 \3|p
+ read sys maj min
+ [ Linux != Linux ]
+ [ ( 3 -eq 3 -a 5 -ge 5 ) -o 3 -gt 3 ]
+ command_exists ip
+ command -v ip
+ ip netns list
+ docker -v
+ sed -n -e s|^Docker version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*|\1|p
+ DOCKER_VERSION=1.5.0
+ [ -z 1.5.0 ]
+ [ 1.5.0 = 1.3.0 ]
+ echo+  1.5.0
cut -d. -f 1
+ DOCKER_VERSION_MAJOR=1
+ echo 1.5.0
+ cut -d. -f 2
+ DOCKER_VERSION_MINOR=5
+ echo 1.5.0
+ cut -d. -f 3
+ DOCKER_VERSION_PATCH=0
+ check_not_running weave zettio/weave
+ docker inspect --format={{.State.Running}} {{.Config.Image}} weave
+ create_bridge
+ [ ! -d /sys/class/net/weave ]
+ [ !  = --without-ethtool ]
+ run_tool host ethtool -K weave tx off
+ TOOL_NET=host
+ TOOL_COMMAND=ethtool
+ shift 2
+ docker run --rm --privileged --net=host zettio/weavetools:0.9.0 /bin/ethtool -K weave tx off
+ ip link set dev weave up
+ cat /sys/class/net/weave/address
+ MACADDR=7a:32:99:ee:3f:f7
+ is_cidr 
+ echo 
+ grep -E ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[0-9]{1,2}$
+ [  = -password ]
+ docker run --privileged -d --name=weave -p 6783:6783/tcp -p 6783:6783/udp -e WEAVE_PASSWORD zettio/weave:0.9.0 -name 7a:32:99:ee:3f:f7 -iface ethwe
+ CONTAINER=b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe
+ with_container_netns b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe launch
+ CONTAINER=b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe
+ docker inspect --format={{.State.Pid}} b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe
+ CONTAINER_PID=27436
+ [ 27436 = 0 ]
+ [ 27436 = <no value> ]
+ NETNS=27436
+ [ ! -d /var/run/netns ]
+ rm -f /var/run/netns/27436
+ ln -s /proc/27436/ns/net /var/run/netns/27436
+ LOCAL_IFNAME=vethwepl27436
+ GUEST_IFNAME=vethwepg27436
+ IP_TMPOUT=/tmp/weave_ip_out_27368
+ IP_TMPERR=/tmp/weave_ip_err_27368
+ rm -f /tmp/weave_ip_out_27368 /tmp/weave_ip_err_27368
+ STATUS=0
+ shift 1
+ launch
+ STATUS=1
+ docker inspect --format={{.State.Pid}} b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe
+ [ 27436 != 27436 ]
+ echo Failure during network configuration for container b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe:
Failure during network configuration for container b3e1d9f9d390024fde0d28f81d85d1cf85b0b25a686d539e0f043564253f46fe:
+ cat /tmp/weave_ip_err_27368
+ ip netns exec 27436 ip link show eth0
+ connect_container_to_bridge
+ readlink /proc/27436/ns/net
+ readlink /proc/27368/ns/net
+ [  =  ]
+ echo Container is running in the host network namespace, and therefore cannot be
Container is running in the host network namespace, and therefore cannot be
+ echo connected to weave. Perhaps the container was started with --net=host.
connected to weave. Perhaps the container was started with --net=host.
+ return 1
+ rm -f /tmp/weave_ip_out_27368 /tmp/weave_ip_err_27368 /var/run/netns/27436
+ return 1

And output of docker logs weave

weave 2015/03/18 20:25:14.495707 Command line options: map[iface:ethwe name:7a:32:99:ee:3f:f7 wait:20]
weave 2015/03/18 20:25:14.495827 Command line peers: []
weave 2015/03/18 20:25:34.502371 Unable to find interface ethwe
@alex-sherwin
Author

Output of sudo weave launch followed by sudo ./docker-ns weave ifconfig -a

eth0      Link encap:Ethernet  HWaddr 02:42:ac:11:00:12  
          inet addr:172.17.0.18  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:12/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:168 (168.0 B)  TX bytes:168 (168.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

@rade
Member

rade commented Mar 18, 2015

So this looks like a kernel issue. Specifically, it appears that network namespaces are fake and all just end up pointing to the host network namespace. Our script does check for kernels >= 3.5, and the OP says he is running 3.5.0; I wonder whether that is in fact too old. @errordeveloper

@rade rade added the bug label Mar 18, 2015
@rade
Member

rade commented Mar 18, 2015

OP reports that upgrading to the backported 3.13 kernel resolved the problem. So now we just need to figure out what the oldest version is that works.

@rade
Member

rade commented Mar 29, 2015

@errordeveloper ping

@errordeveloper
Contributor

OK, I have Ubuntu 12.04 with the linux-image-3.5.0-54-generic kernel installed. I can see weave launch failing the same way, but I was able to set up namespaces with ip netns.

@rade
Member

rade commented Mar 30, 2015

I can see weave launch failing the same way, but I was able to set up namespaces with ip netns.

That is consistent with the report from the OP, i.e. the ip netns commands work but end up operating on a single global namespace, at least when it comes to manipulating the named namespaces we create with symlinks.
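
For readers following along, the symlink-based naming mentioned here can be sketched roughly as below. It mirrors the `rm -f` / `ln -s` steps visible in the trace from the original report, but it is only an illustration, not the weave script itself. The function name `expose_netns` and its optional directory arguments are hypothetical, added so the sketch can be tried on scratch directories without root.

```shell
#!/bin/sh
# Hypothetical helper (not part of weave): give a process's network
# namespace a name that `ip netns` can find by linking it under the
# netns directory, as the traced script does.
#   $1 = PID of a process inside the target namespace
#   $2 = procfs root (assumed default: /proc)
#   $3 = netns directory (assumed default: /var/run/netns)
expose_netns() {
    pid=$1
    procfs=${2:-/proc}
    netns_dir=${3:-/var/run/netns}
    mkdir -p "$netns_dir"
    rm -f "$netns_dir/$pid"
    ln -s "$procfs/$pid/ns/net" "$netns_dir/$pid"
}

# With the defaults and root privileges, one would then expect
#   ip netns exec "$pid" ip link show
# to run inside that namespace on a kernel where this works.
```

On the affected 3.5 kernel the link can be created without error, which fits the observation that the failure only shows up later.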

@rade
Copy link
Member

rade commented Mar 30, 2015

@errordeveloper btw, make sure you are running a recent docker.

@errordeveloper
Contributor

...ip netns commands work but end up operating on a single global namespace, at least when it comes to manipulating the named namespaces we create with symlinks.

I am not quite sure what exactly is going on here at this point.

But I am thinking the clue is here:

+ readlink /proc/27436/ns/net
+ readlink /proc/27368/ns/net
+ [  =  ]
+ echo Container is running in the host network namespace

The check for this was introduced in f3faf83, which came after I last tested for 3.5 compatibility.

The way the issue manifests itself is this:

root@vagrant-ubuntu-precise-64:~# ls -la /proc/5078/ns/net
-r-------- 1 root root 0 Mar 30 18:04 /proc/5078/ns/net
root@vagrant-ubuntu-precise-64:~# readlink /proc/5078/ns/net 
root@vagrant-ubuntu-precise-64:~# 

A quick test on boot2docker (kernel 3.18.5), by contrast, shows that those files are symlinks:

docker@test-1:~$ sudo ls -la /proc/1422/ns/net
lrwxrwxrwx    1 root     root             0 Mar 30 18:15 /proc/1422/ns/net -> net:[4026532188]
docker@test-1:~$ sudo readlink /proc/1422/ns/net
net:[4026532188]

So it has to do with the way that files in /proc/<pid>/ns/ are presented to userspace.

@errordeveloper btw, make sure you are running a recent docker.

Yes, I got 1.5.0.

@rade
Member

rade commented Mar 30, 2015

Ah, so on older kernels the /proc/<pid>/ns/net entry for a process's network namespace is not a symlink. Interesting. So is there any way on those old kernels to tell whether two processes use the same network namespace?

@errordeveloper
Contributor

Well, there are probably some pretty complex ways of determining that by observation... However, what we need to tell is whether it's the host's namespace or not, so we can look at eth0, for example.

# ip link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:ec:b4:23 brd ff:ff:ff:ff:ff:ff
# docker exec -ti 3fdb6b8ea1d5 ip link show dev eth0
37: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:11:00:08 brd ff:ff:ff:ff:ff:ff

However, the host might not have eth0. That is not the end of the world: we could just check whether the container lacks it as well, and so on. It can be done, I suppose, if folks need to run 3.5.

@rade
Member

rade commented Mar 30, 2015

We could suppress the check for older kernels. But we'd need to know in which kernel version the symlinks were introduced.

@dpw
Contributor

dpw commented Mar 30, 2015

We could suppress the check for older kernels. But we'd need to know in which kernel version the symlinks were introduced.

torvalds/linux@bf056bfa80596
torvalds/linux@98f842e675f96

Which went into the 3.7.0 release.

@rade
Member

rade commented Mar 30, 2015

torvalds/linux@98f842e675f96

"A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace." which confirms our suspicion that prior to that change the check we want to perform is hard/impossible.

@dpw
Contributor

dpw commented Mar 31, 2015

Which went into the 3.7.0 release.

@errordeveloper points out that I got this wrong - it's actually in 3.8.0.
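
As an illustration of the "suppress the check for older kernels" idea discussed above, here is a minimal sketch of a version gate in POSIX sh. It assumes the 3.8 threshold from this thread and a release string shaped like the `uname -r` output in the report (for example `3.5.0-23-generic`). The function name `kernel_at_least` is hypothetical and is not taken from the weave script.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel release string $1 is at
# least version $2.$3 (major.minor). Only the first two numeric
# components are compared; anything after the minor number is ignored.
kernel_at_least() {
    release=$1
    want_maj=$2
    want_min=$3
    maj=${release%%.*}       # "3.5.0-23-generic" -> "3"
    rest=${release#*.}       # -> "5.0-23-generic"
    min=${rest%%[!0-9]*}     # -> "5"
    min=${min:-0}
    [ "$maj" -gt "$want_maj" ] ||
        { [ "$maj" -eq "$want_maj" ] && [ "$min" -ge "$want_min" ]; }
}

# Example: decide whether the namespace comparison is meaningful here.
if kernel_at_least "$(uname -r)" 3 8; then
    echo "kernel is 3.8 or newer: ns entries should be symlinks"
else
    echo "kernel is older than 3.8: skip the namespace comparison"
fi
```

A version gate like this depends on knowing the exact release in which the behaviour changed, and distribution kernels may backport changes, so probing the file type directly may be the more robust choice.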

errordeveloper added a commit to errordeveloper/weave that referenced this issue Mar 31, 2015
rade added a commit that referenced this issue Mar 31, 2015
Check if `/proc/<pid>/ns/net` is a symlink

Fixes #470.
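
The commit itself is not reproduced in this thread. As a rough sketch of the idea in its title, the comparison can be guarded by a symlink test so that kernels which expose `/proc/<pid>/ns/net` as a plain file produce "cannot tell" rather than a false match between two empty strings. The function name `same_netns`, its exit-status convention, and the optional procfs-root argument are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical helper, not the code from the referenced commit.
# Compares the network namespaces of two processes.
#   $1, $2 = PIDs to compare
#   $3     = procfs root (assumed default: /proc)
# Exit status: 0 = same namespace, 1 = different namespaces,
#              2 = cannot tell (an ns entry is not a symlink).
same_netns() {
    a=$1
    b=$2
    procfs=${3:-/proc}
    if [ ! -L "$procfs/$a/ns/net" ] || [ ! -L "$procfs/$b/ns/net" ]; then
        return 2
    fi
    [ "$(readlink "$procfs/$a/ns/net")" = "$(readlink "$procfs/$b/ns/net")" ]
}

# Possible use in a launch script (CONTAINER_PID as in the trace above):
#   same_netns "$CONTAINER_PID" $$
#   case $? in
#       0) echo "container shares the host network namespace" ;;
#       2) echo "old kernel: cannot compare namespaces, continuing" ;;
#   esac
```

Treating the non-symlink case as its own outcome avoids the `[  =  ]` comparison seen in the original trace, where both `readlink` calls returned nothing.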
@rade rade modified the milestone: 0.10.0 Apr 18, 2015