
Rootless: slirp4netns slow initial input to container #4537

Closed
waffshappen opened this issue Nov 19, 2019 · 12 comments · Fixed by #4592
Labels
kind/bug · locked - please file new issue/PR · stale-issue

Comments

@waffshappen

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Running a rootless container and testing network throughput with iperf3 reveals that while the container can send data out perfectly fine, input to the container gets stuck for ~5s before reaching full speed. See the attached images.

Preparation: [screenshot: preparation]

Upload becoming stuck for ~5s, then performing fine: [screenshot: stuck_upload]

Reverse way working as intended: [screenshot: okay_download]

Steps to reproduce the issue:

  1. Run an alpine container with rootless podman run -p, forwarding a port

  2. Run iperf3 -s inside the container, listening on the forwarded port (via slirp4netns)

  3. Run iperf3 -c on the host, targeting the forwarded port (a command sketch follows below)
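
For reference, a minimal command sketch of the steps above (the image tag, port number, host address, and the iperf3 install step are assumptions, not taken from the report):

# terminal 1: start a rootless alpine container with a forwarded port and run the iperf3 server
podman run --rm -it -p 8080:8080 alpine sh -c 'apk add --no-cache iperf3 && iperf3 -s -p 8080'

# terminal 2, on the host: measure throughput towards the container
iperf3 -c 127.0.0.1 -p 8080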

Describe the results you received:
Upload only manages about 5 MB, then gets stuck for 5 seconds, then returns to expected speeds.

Describe the results you expected:
Consistent, instant input speed to the container.

Additional information you deem important (e.g. issue happens only occasionally):
It happens across a range of hardware, and I have only tested it with slirp4netns so far. I'm really not sure where this could come from, whether it only affects iperf, and whether other uses, like grafana/statsd or logstash, would be fine or equally affected. It has been months since I first noticed this and it hasn't changed since, so I'm reporting it to get to the bottom of it.

Output of podman version:

podman version
Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.13.1
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.13.1
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.2-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
  Distribution:
    distribution: fedora
    version: "31"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 3858599936
  MemTotal: 16712970240
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: x230
  kernel: 5.3.8-300.fc31.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.3+dev
      commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
  uptime: 350h 43m 56.67s (Approximately 14.58 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
store:
  ConfigFile: /home/tobias/.config/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7-1.fc31.x86_64
      Version: |-
        fusermount3 version: 3.6.2
        fuse-overlayfs: version 0.7
        FUSE library version 3.6.2
        using FUSE kernel interface version 7.29
  GraphRoot: /home/tobias/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 1
  RunRoot: /run/user/1000
  VolumePath: /home/tobias/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.6.2-2.fc31.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Happens on all kinds of machines. Works flawlessly when spawning containers as root.

openshift-ci-robot added the kind/bug label Nov 19, 2019
@giuseppe
Member

giuseppe commented Nov 20, 2019

I think it depends on the MTU value we set for slirp4netns.

If I try without the --mtu 65520, I get:

Accepted connection from 10.0.2.2, port 35712
[  5] local 10.0.2.100 port 8080 connected to 10.0.2.2 port 35714
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   310 MBytes  2.60 Gbits/sec                  
[  5]   1.00-2.00   sec   321 MBytes  2.70 Gbits/sec                  
[  5]   2.00-3.00   sec   320 MBytes  2.69 Gbits/sec                  
[  5]   3.00-4.00   sec   315 MBytes  2.64 Gbits/sec                  
[  5]   4.00-5.00   sec   320 MBytes  2.69 Gbits/sec                  
[  5]   5.00-6.00   sec   323 MBytes  2.71 Gbits/sec                  
[  5]   6.00-7.00   sec   328 MBytes  2.75 Gbits/sec                  
[  5]   7.00-8.00   sec   320 MBytes  2.69 Gbits/sec                  
[  5]   8.00-9.00   sec   326 MBytes  2.74 Gbits/sec                  
[  5]   9.00-10.00  sec   310 MBytes  2.60 Gbits/sec                  
[  5]  10.00-10.00  sec   118 KBytes  2.25 Gbits/sec        

which doesn't show the initial slowness, but slirp4netns then performs significantly worse overall than with --mtu 65520.

@AkihiroSuda FYI

@giuseppe
Member

if you'd like to reproduce these results, you can build a custom version of Podman where you skip the cmdArgs = append(cmdArgs, "--mtu", "65520") in libpod/networking_linux.go
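
A rough sketch of that experiment, for anyone who wants to try it (only the quoted line and the file path come from the comment above; the repository URL, sed pattern, and build steps are assumptions and may differ between versions):

git clone https://github.com/containers/libpod && cd libpod
# comment out the MTU override before rebuilding
sed -i 's|cmdArgs = append(cmdArgs, "--mtu", "65520")|// &|' libpod/networking_linux.go
make
./bin/podman version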

@waffshappen
Author

if you'd like to reproduce these results, you can build a custom version of Podman where you skip the cmdArgs = append(cmdArgs, "--mtu", "65520") in libpod/networking_linux.go

Would there be a way to balance this for all users instead? Having it basically choke on high-load TCP for all users of podman doesn't sound like a good default, especially for web servers in containers if they ever get hit by more than a few requests per second.

@AkihiroSuda
Collaborator

rmem issue?
https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md#bugs

@rhatdan
Member

rhatdan commented Nov 20, 2019

If that fixes the issue, we should add this to the troubleshooting page for podman.

@waffshappen
Author

rmem issue?
https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md#bugs

This actually improves the situation a lot. However, the weirdest part of all of this: if I connect to the machine externally, everything works as expected; only local connections via 127.0.0.1 are affected. That doesn't help for haproxy > 127.0.0.1:port > container, however.

Also, while that page says I could modify the setting inside the namespace, my alpine example container sees it as a read-only filesystem, even as in-container "root", and refuses to lower it for only the container.

How increasing a cache brings down performance for localhost connections is beyond me.
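
For what it's worth, a hedged way to check what the container actually sees (the container name is a placeholder; the echoed values assume the usual kernel defaults, with the middle field lowered from 131072 to 87380 as the slirp4netns page suggests):

# show the tcp_rmem values visible inside the container
podman exec mycontainer cat /proc/sys/net/ipv4/tcp_rmem

# lowering it from inside the container fails as described above,
# because /proc/sys is mounted read-only in the container
podman exec mycontainer sh -c 'echo "4096 87380 6291456" > /proc/sys/net/ipv4/tcp_rmem'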

@github-actions

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

AkihiroSuda added a commit to AkihiroSuda/libpod that referenced this issue Jan 8, 2020
RootlessKit port forwarder has a lot of advantages over the slirp4netns port forwarder:

* Very high throughput.
  Benchmark result on Travis: socat: 5.2 Gbps, slirp4netns: 8.3 Gbps, RootlessKit: 27.3 Gbps
  (https://travis-ci.org/rootless-containers/rootlesskit/builds/597056377)

* Connections from the host are treated as 127.0.0.1 rather than 10.0.2.2 in the namespace.
  No UDP issue (containers#4586)

* No tcp_rmem issue (containers#4537)

* Probably works with IPv6. Even if not, it is trivial to support IPv6.  (containers#4311)

* Easily extensible for future support of SCTP

* Easily extensible for future support of `lxc-user-nic` SUID network

RootlessKit port forwarder has already been adopted as the default port forwarder by Rootless Docker/Moby,
and no issue has been reported AFAIK.

As the port forwarder is imported as a Go package, no `rootlesskit` binary is required for Podman.

Fix containers#4586
May-fix containers#4559
Fix containers#4537
May-fix containers#4311

See https://github.com/rootless-containers/rootlesskit/blob/v0.7.0/pkg/port/builtin/builtin.go

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@mxhCodes

mxhCodes commented May 3, 2020

I experience the same problem with podman version 1.9.1 using kernel 5.3.0-51-generic (Linux Mint 19.3). The workaround suggested in https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md#bugs does not work, in the sense that root privileges are actually required to change it, also in unshared mode. I thought the RootlessKit port forwarder might be included in the current podman version, but apparently it doesn't solve the problem.

@AkihiroSuda
Collaborator

root privileges are actually required to change it

Not true since kernel 4.15. Note that you need to nsenter the namespaces to change it.

I thought RootlessKit port forwarder might be in in the current podman version but apparently it doesn't solve the problem.

The RootlessKit port forwarder doesn't seem to hit the problem.

@mxhCodes

mxhCodes commented May 4, 2020

Thanks for your quick answer. I made a mistake in my thinking regarding the network namespace: I was targeting the namespace of the slirp4netns process, which I couldn't modify without root privileges (and which didn't make sense anyway). In the bug hint, I associated "inside the namespace" with the namespace of slirp4netns itself, which is wrong.

I got it working by hooking into the container processes themselves. For example, I had a php-fpm container talking to a database over TCP, where with slirp4netns every connection was "idle" for the first 5 seconds. I could change the rmem setting via the main php-fpm process (it might work with any other pool process too, as the namespace should be the same) this way, without root privileges:
$ nsenter -n -U --target=$(ps -C php-fpm -f | grep master | awk '{print $2}') /bin/sh -c 'c=$(cat /proc/sys/net/ipv4/tcp_rmem); echo $c | sed -e s/131072/87380/g > /proc/sys/net/ipv4/tcp_rmem'
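
A hedged variant of the same workaround that targets the container's init process instead of grepping for php-fpm (the container name is a placeholder; the PID is taken from podman inspect's State.Pid field):

$ pid=$(podman inspect --format '{{.State.Pid}}' mycontainer)
# enter the container's user and network namespaces and lower the middle tcp_rmem value from 131072 to 87380, as above
$ nsenter -n -U --target="$pid" /bin/sh -c 'c=$(cat /proc/sys/net/ipv4/tcp_rmem); echo $c | sed -e s/131072/87380/g > /proc/sys/net/ipv4/tcp_rmem'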

@AkihiroSuda
Collaborator

In the bug hint, I associated "inside the namespace" with the namespace of slirp4netns itself, which is wrong.

It isn't wrong, and that is the same as the container's namespace, isn't it? 🤔

@mxhCodes

mxhCodes commented May 4, 2020

Whenever I try nsenter -n with the target PID of any /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 ... process, I get nsenter: reassociate to namespace 'ns/net' failed: Operation not permitted.
It seems I'm not yet on the right track :D but at least I could change it via the container's PID 1 and got rid of the slow initial input.

github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023