[bug report] network p0 create failed when reboot #2618

Closed
elvizlai opened this issue Dec 26, 2018 · 11 comments
Labels
areas/network, kind/bug (This is bug report for project), priority/P1 (this is high priority that all maintainers should stop to handle this issue)

elvizlai commented Dec 26, 2018

Ⅰ. Issue Description

The pouch p0 network interface is missing after a reboot.

Ⅱ. Describe what happened

On a root VPC, some containers (not all) fail to start because the pouch p0 network interface is missing.

After a reboot, I MUST run systemctl restart pouch to recreate p0 and then start the containers manually with pouch start.

If any container is configured to start automatically (--restart always), p0 is not created.

example:

pouch run -td --restart always --net host alpine

I think p0 MUST be created before the vethXXXX interfaces.
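
The manual recovery described above could be scripted roughly as follows. This is only a sketch: it assumes the bridge appears under /sys/class/net whenever it exists, and the function names (interface_exists, recovery_commands, recover) are made up for illustration. Only systemctl restart pouch and pouch start come from this report.

```python
import os
import subprocess


def interface_exists(name, sysfs_root="/sys/class/net"):
    """Return True if a network interface entry exists under sysfs_root."""
    return os.path.exists(os.path.join(sysfs_root, name))


def recovery_commands(bridge, containers, sysfs_root="/sys/class/net"):
    """List the manual workaround commands, or [] if the bridge is present."""
    if interface_exists(bridge, sysfs_root):
        return []
    # Restart the daemon first so that the bridge is recreated,
    # then start each affected container by name.
    commands = [["systemctl", "restart", "pouch"]]
    commands += [["pouch", "start", name] for name in containers]
    return commands


def recover(bridge="p0", containers=(), dry_run=True):
    """Print the workaround commands, or execute them when dry_run is False."""
    commands = recovery_commands(bridge, containers)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run for the container used in the reproduction steps.
    recover(containers=["ikev2-vpn"])
```

With dry_run left at its default the script only prints what it would do, which makes it safe to try before running it after a real reboot.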

ifconfig

p0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.5.1  netmask 255.255.255.0  broadcast 192.168.5.255
        inet6 fe80::42:c0ff:fea8:501  prefixlen 64  scopeid 0x20<link>
        ether 02:42:c0:a8:05:01  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 516 (516.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethef648d6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::64bf:24ff:fecd:d276  prefixlen 64  scopeid 0x20<link>
        ether 66:bf:24:cd:d2:76  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 1032 (1.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. Run the following script:
pouch run -td \
    --restart=always \
    --privileged \
    --sysctl net.core.somaxconn=1024 \
    -v /lib/modules:/lib/modules \
    -e HOST_IP='x.y.z' \
    -e VPNUSER=jack \
    -e VPNPASS="opsAdmin" \
    -p 500:500/udp -p 4500:4500/udp \
    --name=ikev2-vpn \
    sdrzlyz/ikev2:5.7.1
reboot
pouch ps -a

The container is not started, although it is expected to be running.

Ⅴ. Anything else we need to know?

systemctl status pouch -l

● pouch.service - pouch
   Loaded: loaded (/usr/lib/systemd/system/pouch.service; enabled; vendor preset: disabled)
   Active: active (running) since 三 2018-12-26 17:30:04 CST; 25s ago
 Main PID: 2505 (pouchd)
    Tasks: 17
   Memory: 76.7M
   CGroup: /system.slice/pouch.service
           ├─2505 /usr/local/bin/pouchd
           └─2960 containerd --config /var/lib/pouch/containerd/state/pouch-containerd.toml --log-level info

12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.077139796+08:00" level=info msg="Removing stale endpoint 84c27e99 (1f76dc0ce9f8b2dd2d7be0a102e29d0e332228a409aba0f94bceba8c8efdd8a1)"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.089374272+08:00" level=info msg="Fixing inconsistent endpoint_cnt for network bridge. Expected=0, Actual=1"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.108381128+08:00" level=warning msg="recover container 84c27e996704fbfb5bc21c23e600d05380447488073e1a1007dbc48cbf4d380b, got a notfound error, start clean the container's resources"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.135912093+08:00" level=warning msg="There are old containers, don't to initialize network"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.150457189+08:00" level=info msg="handle event: 84c27e996704fbfb5bc21c23e600d05380447488073e1a1007dbc48cbf4d380b exit"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.177390154+08:00" level=warning msg="Failed to delete host side interface (vethfe38454)'s link" error="no such device"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.180422683+08:00" level=error msg="failed to create endpoint: failed to create endpoint 84c27e99 on network bridge: adding interface vethfe38454 to bridge p0 failed: could not find bridge p0: route ip+net: no such network interface"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.19611468+08:00" level=error msg="failed to handle event: 84c27e996704fbfb5bc21c23e600d05380447488073e1a1007dbc48cbf4d380b exit"
12月 26 17:30:04 host.localdomain pouchd[2505]: time="2018-12-26T17:30:04.265322188+08:00" level=info msg="start to listen to: unix:///var/run/pouchd.sock"
12月 26 17:30:04 host.localdomain systemd[1]: Started pouch.
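
To spot this failure quickly in the daemon log, a small filter can pull the bridge name out of the "could not find bridge" error. This is a minimal sketch: the regular expression is derived only from the error line above, and missing_bridges is a hypothetical helper name.

```python
import re

# Matches the libnetwork-style error seen in the log above and captures
# the bridge name that could not be found.
MISSING_BRIDGE = re.compile(r"could not find bridge (?P<bridge>[^\s:]+)")


def missing_bridges(lines):
    """Return the distinct bridge names reported as missing, in first-seen order."""
    found = []
    for line in lines:
        match = MISSING_BRIDGE.search(line)
        if match and match.group("bridge") not in found:
            found.append(match.group("bridge"))
    return found


if __name__ == "__main__":
    sample = (
        'time="2018-12-26T17:30:04.180422683+08:00" level=error '
        'msg="failed to create endpoint: failed to create endpoint 84c27e99 '
        'on network bridge: adding interface vethfe38454 to bridge p0 failed: '
        'could not find bridge p0: route ip+net: no such network interface"'
    )
    print(missing_bridges([sample]))
```

The same function could be fed the lines of a saved journalctl dump to check whether a given boot hit the problem.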

Ⅵ. Environment:

  • pouch version (use pouch version):
    latest
  • OS (e.g. from /etc/os-release):
    centos7
  • Kernel (e.g. uname -a):
    4.20
  • Install tools:
  • Others:
allencloud added the kind/bug label on Dec 26, 2018
@allencloud
Collaborator

Thanks a lot for your feedback.
Could you attach the error or failure message in the issue description? @elvizlai

@elvizlai
Author

@allencloud I have updated the issue and appended the log.

@elvizlai
Author

journalctl

12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08+08:00" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." module=containerd type=io.containerd.grpc.v1
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08+08:00" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." module=containerd type=io.containerd.grpc.v1
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08+08:00" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." module=containerd type=io.containerd.grpc.v1
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08+08:00" level=info msg=serving... address="/run/containerd/debug.sock" module="containerd/debug"
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08+08:00" level=info msg=serving... address="/var/run/containerd.sock" module="containerd/grpc"
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08+08:00" level=info msg="containerd successfully booted in 0.012541s" module=containerd
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08.763391573+08:00" level=info msg="success to start containerd" containerd-pid=3333 module=ctrd-supervisord
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08.768286594+08:00" level=info msg="success to create 5 containerd clients, connect to: /var/run/containerd.sock"
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08.76905849+08:00" level=info msg="Snapshotter is set to be overlayfs"
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08.769276734+08:00" level=info msg="invoke pre-start hook in plugin"
12月 26 17:34:08 host.localdomain pouchd[3326]: time="2018-12-26T17:34:08.854821156+08:00" level=warning msg="could not create bridge network for id 462d39135a6c114e13119f5874995dfc1e6cd505fd6abaee4e597c510c67fc51 bridge name p
12月 26 17:34:09 host.localdomain pouchd[3326]: time="2018-12-26T17:34:09.144878279+08:00" level=error msg="getEndpointFromStore for eid 1f76dc0ce9f8b2dd2d7be0a102e29d0e332228a409aba0f94bceba8c8efdd8a1 failed while trying to bu
12月 26 17:34:09 host.localdomain pouchd[3326]: time="2018-12-26T17:34:09.144940644+08:00" level=info msg="Removing stale sandbox 8e6085e6c56397fc030250618e0790b149047d61620b177738b9d6a7fbd33eac (84c27e996704fbfb5bc21c23e600d05
12月 26 17:34:09 host.localdomain pouchd[3326]: time="2018-12-26T17:34:09.145171058+08:00" level=warning msg="Failed deleting endpoint 1f76dc0ce9f8b2dd2d7be0a102e29d0e332228a409aba0f94bceba8c8efdd8a1: failed to get endpoint fro
12月 26 17:34:09 host.localdomain pouchd[3326]: "
12月 26 17:34:09 host.localdomain kernel: IPv6: ADDRCONF(NETDEV_UP): p0: link is not ready
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.1835] manager: (p0): new Bridge device (/org/freedesktop/NetworkManager/Devices/5)
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2494] device (p0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2558] ifcfg-rh: add connection in-memory (6e4554af-2497-4a60-b54c-32841523857e,"p0")
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2579] device (p0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2590] device (p0): Activation: starting connection 'p0' (6e4554af-2497-4a60-b54c-32841523857e)
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2613] device (p0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2618] device (p0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2621] device (p0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2652] device (p0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2660] device (p0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2663] device (p0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
12月 26 17:34:09 host.localdomain NetworkManager[2536]: <info>  [1545816849.2732] device (p0): Activation: successful, device activated.
12月 26 17:34:09 host.localdomain dbus[2512]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
12月 26 17:34:09 host.localdomain systemd[1]: Starting Network Manager Script Dispatcher Service...
12月 26 17:34:09 host.localdomain pouchd[3326]: time="2018-12-26T17:34:09.301777137+08:00" level=info msg="start to listen to: unix:///var/run/pouchd.sock"
12月 26 17:34:09 host.localdomain polkitd[2539]: Unregistered Authentication Agent for unix-process:3320:25319 (system bus name :1.21, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8) (disconnect
12月 26 17:34:09 host.localdomain systemd[1]: Started pouch.
12月 26 17:34:09 host.localdomain dbus[2512]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
12月 26 17:34:09 host.localdomain systemd[1]: Started Network Manager Script Dispatcher Service.
12月 26 17:34:09 host.localdomain nm-dispatcher[3433]: req:1 'up' [p0]: new request (3 scripts)
12月 26 17:34:09 host.localdomain nm-dispatcher[3433]: req:1 'up' [p0]: start running ordered scripts...

elvizlai changed the title from "ikev2 network failed when restart" to "network p0 create failed when reboot" on Dec 26, 2018
@rudyfly
Collaborator

rudyfly commented Dec 27, 2018

@elvizlai Can you provide the full network information, for example the output of ifconfig?

fuweid changed the title from "network p0 create failed when reboot" to "[bug report] network p0 create failed when reboot" on Dec 27, 2018
@elvizlai
Author

@rudyfly Here is the ifconfig output after the first initialization (the public inet address is masked with XXX).

After a reboot, p0 and vetha49ec6b (created by pouch run) are gone.

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 67.XXX.XXX.XXX  netmask 255.255.240.0  broadcast 67.230.191.255
        inet6 fe80::a8aa:ff:fe12:9bdc  prefixlen 64  scopeid 0x20<link>
        ether aa:aa:00:12:9b:dc  txqueuelen 1000  (Ethernet)
        RX packets 97729  bytes 101161122 (96.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 110256  bytes 58737804 (56.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 64  bytes 5184 (5.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 64  bytes 5184 (5.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.5.1  netmask 255.255.255.0  broadcast 192.168.5.255
        inet6 fe80::42:c0ff:fea8:501  prefixlen 64  scopeid 0x20<link>
        ether 02:42:c0:a8:05:01  txqueuelen 1000  (Ethernet)
        RX packets 97060  bytes 55013917 (52.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 79219  bytes 55202494 (52.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vetha49ec6b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::f4a2:ccff:fecf:d063  prefixlen 64  scopeid 0x20<link>
        ether f6:a2:cc:cf:d0:63  txqueuelen 0  (Ethernet)
        RX packets 97060  bytes 56372757 (53.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 79240  bytes 55203964 (52.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

@rudyfly
Collaborator

rudyfly commented Jan 15, 2019

When a container is set to restart=always, it is started while the daemon recovers. It is then added to activeSandbox, which prevents the network from being initialized, so bridge p0 is not created. Without bridge p0, the container network cannot be set up, which causes the problem.
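
The sequence described here can be pictured with a toy model. This is not PouchContainer code: simulate_boot, its arguments and the bridge_first switch are invented for illustration, and the alternative order simply follows the reporter's suggestion that p0 be created before the veth interfaces.

```python
def simulate_boot(containers, bridge_first):
    """Toy model of daemon startup; containers is a list of (name, restart_policy)."""
    interfaces = set()
    started, failed = [], []

    if bridge_first:
        # Suggested order: make sure the bridge exists before anything else.
        interfaces.add("p0")

    # Recovery: containers with restart=always become active sandboxes.
    active_sandboxes = [name for name, policy in containers if policy == "always"]

    # Reported behaviour: network initialization is skipped when active
    # sandboxes exist, so the bridge is never created in that case.
    if not active_sandboxes:
        interfaces.add("p0")

    # Each restarted container needs the bridge to attach its veth endpoint.
    for name in active_sandboxes:
        if "p0" in interfaces:
            started.append(name)
        else:
            failed.append(name)

    return {"interfaces": interfaces, "started": started, "failed": failed}


if __name__ == "__main__":
    print(simulate_boot([("ikev2-vpn", "always")], bridge_first=False))
    print(simulate_boot([("ikev2-vpn", "always")], bridge_first=True))
```

In the model, the only difference between the failing and the working boot is whether the bridge is ensured before the restart=always containers are recovered.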

@allencloud
Collaborator

When a container is set to restart=always, it is started while the daemon recovers. It is then added to activeSandbox, which prevents the network from being initialized, so bridge p0 is not created. Without bridge p0, the container network cannot be set up, which causes the problem.

Do we have a solution? @rudyfly
And can we include the fix in the next release of PouchContainer? @fuweid

@fengzixu
Contributor

I faced the same problem

@fengzixu
Contributor

@rudyfly

allencloud added the priority/P1 label and removed the kind/bug label on Mar 14, 2019
@pouchrobot
Collaborator

Thanks for your report, @elvizlai
😱 This is a priority/P1 issue, which is the highest priority.
Seems to be severe enough.
ping @alibaba/pouch , PTAL.

allencloud added the kind/bug label on Mar 14, 2019
@huangjc7

huangjc7 commented Apr 21, 2019

Problem description:

[root@csv-slave13 ~]# pouch run -d -p 8099:80 dockerhub.io/hjc-image-nginx:v1.0
Error: failed to run container f1d418: {"message":"failed to create endpoint f1d41862 on network bridge: adding interface veth99f8b71 to bridge p0 failed: could not find bridge p0: route ip+net: no such network interface"}

Steps taken:

pouch network create -n pouchnet -d bridge --gateway 192.168.1.1 --subnet 192.168.1.0/24
After testing was finished:
pouch network remove pouchnet

After that, running the command fails as shown in the problem description:

[root@csv-slave13 ~]# pouch run -d -p 8099:80 dockerhub.io/hjc-image-nginx:v1.0
Error: failed to run container f1d418: {"message":"failed to create endpoint f1d41862 on network bridge: adding interface veth99f8b71 to bridge p0 failed: could not find bridge p0: route ip+net: no such network interface"}
