container can't connect to outer IPs (routing issue?) #13153
Comments
Can you share the output of
Thanks - I guess there was something wrong with slirp4netns. After starting all containers (without a reboot), I rechecked: tap0 was there, and outer communication worked again. I guess this was some kind of user error - I have begun to use systemd for some containers, so maybe there was some conflict earlier. At that time slirp4netns used nearly 1GB of RAM - I stopped my systemd container, but I guess something was half broken. Closing this, as it is solved and not easily (if at all) reproducible.
Oh, you use systemd user units? If so, I know how to reproduce it; this is a real bug.
I use one systemd user unit - for postgres, for now. I have had the issue again, so I'll need to do some testing.
It happens if you start the container via systemd. Podman will launch the slirp4netns process and systemd will keep that process in the unit cgroup. So when you stop that systemd unit, systemd will kill all processes in that cgroup. This is wrong, since the slirp4netns process should live as long as any container with networks is running. We have to fix this in podman by moving the slirp4netns process into a separate cgroup so that systemd does not kill it.
The systemd user unit is stopped and disabled for extensive testing. After a reboot (and kernel upgrade), the issue is still consistent. One of the containers has port_handler=slirp4netns (an "external" reverse proxy, for real IPs). Maybe the port_handler=slirp4netns is the issue?

$ podman unshare --rootless-netns ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether f6:99:fd:3c:23:e3 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
valid_lft forever preferred_lft forever
inet6 fd00::f499:fdff:fe3c:23e3/64 scope global dynamic mngtmpaddr
valid_lft 86340sec preferred_lft 14340sec
inet6 fe80::f499:fdff:fe3c:23e3/64 scope link
valid_lft forever preferred_lft forever
3: cni-podman2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a6:ee:7a:6b:fa:de brd ff:ff:ff:ff:ff:ff
inet6 fe80::3cb9:41ff:feb5:a7d0/64 scope link
valid_lft forever preferred_lft forever
5: cni-podman10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 76:1b:bb:9b:8b:d5 brd ff:ff:ff:ff:ff:ff
inet 10.89.9.1/24 brd 10.89.9.255 scope global cni-podman10
valid_lft forever preferred_lft forever
inet6 fe80::741b:bbff:fe9b:8bd5/64 scope link
valid_lft forever preferred_lft forever
7: cni-podman9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 9a:ae:53:79:3a:d1 brd ff:ff:ff:ff:ff:ff
inet 10.89.8.1/24 brd 10.89.8.255 scope global cni-podman9
valid_lft forever preferred_lft forever
inet6 fe80::98ae:53ff:fe79:3ad1/64 scope link
valid_lft forever preferred_lft forever
8: veth40541b01@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman9 state UP group default
link/ether b6:d2:e8:26:e4:7c brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::b4d2:e8ff:fe26:e47c/64 scope link
valid_lft forever preferred_lft forever
9: cni-podman1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 1e:e1:5b:49:2d:4b brd ff:ff:ff:ff:ff:ff
inet6 fe80::c468:2cff:fed9:feed/64 scope link
valid_lft forever preferred_lft forever
10: vethdc44e9a4@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman1 state UP group default
link/ether 1e:e1:5b:49:2d:4b brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::1ce1:5bff:fe49:2d4b/64 scope link
valid_lft forever preferred_lft forever
11: veth501ab2bc@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman10 state UP group default
link/ether 0a:bf:0e:ca:de:df brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::8bf:eff:feca:dedf/64 scope link
valid_lft forever preferred_lft forever
12: cni-podman7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 7a:a7:ac:1d:6b:20 brd ff:ff:ff:ff:ff:ff
inet6 fe80::e815:cdff:fe9e:d41d/64 scope link
valid_lft forever preferred_lft forever
13: vethef3df17b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman7 state UP group default
link/ether 7a:a7:ac:1d:6b:20 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::78a7:acff:fe1d:6b20/64 scope link
valid_lft forever preferred_lft forever
14: cni-podman8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether d6:8c:36:7b:48:9a brd ff:ff:ff:ff:ff:ff
inet6 fe80::e4ec:fdff:fe1a:4509/64 scope link
valid_lft forever preferred_lft forever
15: vethdbbc2cc1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman8 state UP group default
link/ether d6:8c:36:7b:48:9a brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::d48c:36ff:fe7b:489a/64 scope link
valid_lft forever preferred_lft forever
16: vethaa740482@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman9 state UP group default
link/ether 56:bf:fd:46:0a:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::54bf:fdff:fe46:aee/64 scope link
valid_lft forever preferred_lft forever
17: cni-podman3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ba:70:17:4c:69:e2 brd ff:ff:ff:ff:ff:ff
inet 10.89.2.1/24 brd 10.89.2.255 scope global cni-podman3
valid_lft forever preferred_lft forever
inet6 fe80::b870:17ff:fe4c:69e2/64 scope link
valid_lft forever preferred_lft forever
18: veth4b62e7cb@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman3 state UP group default
link/ether fe:3c:4a:25:73:35 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::fc3c:4aff:fe25:7335/64 scope link
valid_lft forever preferred_lft forever
19: veth2639f26a@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman2 state UP group default
link/ether a6:ee:7a:6b:fa:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::a4ee:7aff:fe6b:fade/64 scope link
valid_lft forever preferred_lft forever
20: vethfba8341f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni-podman1 state UP group default
link/ether 6a:32:e1:00:d9:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::6832:e1ff:fe00:d9a7/64 scope link
valid_lft forever preferred_lft forever
It looks like you still have the slirp4netns process running here, so you should not have any connection problems.
I've found the problem - and there is a workaround, which I don't completely like. There were indeed two issues here: the one from the beginning - no tap0 device - is fixed. Now I reconstructed my original container: podman create \
--replace \
--name=nextcloud \
\
--net nextcloud-pub:ip=10.89.8.3 \
--add-host=reverse-proxy:10.89.8.2 \
\
--net pg-nextcloud:ip=10.89.0.3 \
--add-host=postgres:10.89.0.2 \
\
docker.io/library/nextcloud:22-apache

(I've left the volumes and timezone out of the parameters.) The issue is that pg-nextcloud is internal, so there is no way to phone out. I would need some gateway parameter (which I'm missing) to fix that.

$ podman inspect pg-nextcloud
[
{
"name": "pg-nextcloud",
"id": "3f9abbd86ad9ad1a7dde214c19946db6861063dfc91754d36349756d2fc42532",
"driver": "bridge",
"network_interface": "cni-podman1",
"created": "2022-02-04T18:39:02.615452359+01:00",
"subnets": [
{
"subnet": "10.89.0.0/24"
}
],
"ipv6_enabled": false,
"internal": true,
"dns_enabled": false,
"ipam_options": {
"driver": "host-local"
}
}
]
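To see which network ended up with the default route, one can look at the routing table in the rootless netns (`podman unshare --rootless-netns ip route`) and pick out the interface of the default entry. A small sketch, with sample output hardcoded for illustration (the route lines are invented, not taken from this system):

```shell
# Sample `ip route` output (illustrative). In practice, pipe in the real
# output instead:
#   podman unshare --rootless-netns ip route | awk '/^default/ {print $5}'
routes='default via 10.0.2.2 dev tap0
10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100
10.89.0.0/24 dev cni-podman1 proto kernel scope link src 10.89.0.1'

# Print the interface (the "dev" argument) of each default route:
printf '%s\n' "$routes" | awk '/^default/ {print $5}'
# -> tap0
```

If an internal network's interface shows up here as a default route, that matches the misrouting described in this issue.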
OK, I see the other issue: we are adding a default route for the internal network. This is obviously wrong.
Since an internal network has no connectivity to the outside, we should not add a default route. Also make sure to not add the default route more than once for ipv4/ipv6. Ref containers/podman#13153 Signed-off-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
When running podman inside systemd user units, it is possible that systemd kills the rootless netns slirp4netns process because it was started in the default unit cgroup. When the unit is stopped all processes in that cgroup are killed. Since the slirp4netns process is run once for all containers it should not be killed. To make sure systemd will not kill the process we move it to the user.slice. Fixes containers#13153 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
I would suggest adding a parameter to select one specific network for the default route (e.g. --gateway=nextcloud or --default-route=nextcloud/IP). Maybe another issue would be better for this? Some of my containers will have multiple networks without the internal flag. I haven't run into issues yet that would require this (migration is ongoing - but halted because of this issue). Another weird thing: my synapse container does not have the routing issue, so I'm a bit confused about how the issue appears and when routes are added to a container.
This is a problem with the cni configs; internal networks should not have the default route set, see containers/common#920. Some containers might not run into this because the order in which the networks are set up is not deterministic. I think the linked PR should fix your issue. You can test this by manually deleting the 0.0.0.0/0 route from the internal cni config files in
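For reference, the route entry in question lives under the ipam section of the network's CNI config. A hypothetical fragment of what such a conflist can look like (subnet and gateway values invented for illustration); for an internal network, the 0.0.0.0/0 entry is the one to delete:

```json
{
  "ipam": {
    "type": "host-local",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ],
    "ranges": [
      [ { "subnet": "10.89.0.0/24", "gateway": "10.89.0.1" } ]
    ]
  }
}
```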
Yes - I have deleted the route (of both pg networks) and reloaded the network for both containers. nextcloud works again - thanks for the quick fix.
Thanks for those fixes! Just two things you might want to consider - I'll only mention them here. To make such issues reproducible, you might want to add some kind of network priority, for the case where one has routing issues or other network issues that are not reproducible in a consistent way.
The reason there is a route for each network is because of how it interacts with podman network connect/disconnect. If you called disconnect on the network holding the default route, it would break network connectivity, since the other network does not have a default route set.
Ah yes, understandable.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I have no idea what I did wrong...
Whatever (outer) IP I try to connect with -> Network is unreachable
Steps to reproduce the issue:
1. podman network create nextcloud-pub
2. podman create --replace --name=nextcloud --net nextcloud-pub docker.io/library/nextcloud:22-apache
3. podman start nextcloud
4. podman exec -t nextcloud curl 8.8.8.8
Describe the results you received:
Describe the results you expected:
IP should be reachable
Additional information you deem important (e.g. issue happens only occasionally):
container network part:
$ podman inspect nextcloud-pub
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
physical