Fails to set RLIMIT_NPROC in container inside container #24508

Closed
duck-rh opened this issue Nov 8, 2024 · 8 comments · Fixed by #24547
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@duck-rh

duck-rh commented Nov 8, 2024

Issue Description

Quack,

We use a podman container (registry.gitlab.com/osci/podman-systemd-container) run by GitLab CI to run Ansible molecule tests, which in turn start containers for the tests. Without any change to the role, the test now fails. The last known working version was 5.3.0-dev-2aacd4e21; when the daily test ran the next day with 5.3.0-dev-290d94d3c, it stopped working.

Steps to reproduce the issue

I do not have a simple reproducer, but we run the official podman image (registry.gitlab.com/osci/podman-systemd-container), which in turn starts a container with the OS and version we want to test, often CentOS Stream 9 (quay.io/centos/centos:stream9).

The outer container is run as a non-root user using the following options:
--sysctl=net.ipv4.ip_forward=1
--sysctl=net.ipv4.ping_group_range=0 0
--sysctl=net.ipv6.conf.all.forwarding=1
--device=/dev/fuse
--privileged
--security-opt=label=disable
--security-opt=seccomp=unconfined
--systemd=always
--uts=private
--env=container=podman

Describe the results you received

Error: crun: setrlimit RLIMIT_NPROC: Operation not permitted: OCI permission denied

See first time it stopped working: https://gitlab.com/osci/ansible-role-postgrey/-/jobs/8136363743

Describe the results you expected

The container should start properly.

See last working run: https://gitlab.com/osci/ansible-role-postgrey/-/jobs/8132534437

podman info output

host:
  arch: amd64
  buildahVersion: 1.38.0-dev
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 99.48
    systemPercent: 0.17
    userPercent: 0.36
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: file
  freeLocks: 2048
  hostname: 62db983d0a8e
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.10.12-200.fc40.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 10774917120
  memTotal: 16464338944
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.20241022144641897554.main.55.gd6f7cf5.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.0-dev
    package: netavark-1.12.1-1.20241023095707204691.main.84.g5cb0b89.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.0-dev
  ociRuntime:
    name: crun
    package: crun-1.18-1.20241025160550154285.main.5.g0cf2e6c.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: 466e79628c8c5b68670ac856df5604135d754070
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240906.g6b38f07-1.fc40.x86_64
    version: |
      pasta 0^20240906.g6b38f07-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 8571056128
  swapTotal: 8589930496
  uptime: 524h 48m 21.00s (Approximately 21.83 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.13-1.fc40.x86_64
      Version: |-
        fusermount3 version: 3.16.2
        fuse-overlayfs: version 1.13-dev
        FUSE library version 3.16.2
        using FUSE kernel interface version 7.38
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 107307073536
  graphRootUsed: 5107048448
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.3.0-dev-2f6fca6ed
  Built: 1729814400
  BuiltTime: Fri Oct 25 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.0-dev-2f6fca6ed

Podman in a container

Yes

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

The host runs Fedora 40.

Additional information

In the outer container the nproc limit was low and I tried to set the same as the host limits but to no avail.

@duck-rh duck-rh added the kind/bug Categorizes issue or PR as related to a bug. label Nov 8, 2024
@sbrivio-rh
Collaborator

In the outer container the nproc limit was low and I tried to set the same as the host limits but to no avail.

I guess you tried with ulimit -u... what error did you get in the outer container?

@duck-rh
Author

duck-rh commented Nov 11, 2024

@sbrivio-rh I was already changing /etc/security/limits.conf for another limit in the before_script section, so I did the same for nproc. ulimit -a confirmed that the new values were set. I also added ulimit -u before running molecule, but that changed nothing and produced no error.

@giuseppe
Member

I think this is a side effect of 5ebba75.

You need to make sure the outer container specifies a high enough RLIMIT_NPROC for the inner container to use.
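To see what the outer container actually grants, the soft and hard RLIMIT_NPROC can be printed from inside it. This is a minimal sketch using Python's standard `resource` module; the helper names `fmt` and `describe_nproc` are made up for illustration and are not part of podman.

```python
import resource


def fmt(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_nproc():
    """Return the soft and hard RLIMIT_NPROC of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return f"RLIMIT_NPROC soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    print(describe_nproc())
```

A process without CAP_SYS_RESOURCE cannot raise its hard limit, so whatever the inner container requests has to fit under the hard value reported here.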

@Luap99
Member

Luap99 commented Nov 11, 2024

@giuseppe I think the isRootless condition is wrong. Shouldn't we use isRunningInUserNs? Because in a nested userns we will not be able to bump up the resource limits any further. Basically what we did for the oom score already: f59a5f1
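For readers unfamiliar with the distinction: a process can be root (so not "rootless") and still live in a user namespace, which is the situation in a nested container. One common heuristic is to look at `/proc/self/uid_map`: in the initial user namespace it holds the single full-range mapping `0 0 4294967295`. The sketch below illustrates that heuristic in Python; the function names are hypothetical and this is not podman's actual implementation.

```python
# Mapping seen in the initial user namespace: all IDs mapped to themselves.
FULL_RANGE = (0, 0, 4294967295)


def parse_uid_map(text):
    """Parse uid_map content into (inside_id, outside_id, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def in_user_namespace(uid_map_text):
    """Return True unless the map is exactly the full-range identity mapping."""
    return parse_uid_map(uid_map_text) != [FULL_RANGE]


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        print("in user namespace:", in_user_namespace(f.read()))
```

An empty map (a freshly created namespace with no mapping written yet) is also treated as "inside a user namespace" by this sketch.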

@duck-rh
Author

duck-rh commented Nov 11, 2024

@giuseppe thanks, you were right, I did not bump the limit high enough. Nevertheless that was not sufficient, I had to boost it on the host too.

I just wonder why we need such a sky-high limit. Most people run a single process, and even with crazy uses like ours, with nested containers, systemd, etc., we're still light years from it. It's not a big deal, but maybe nested podman does not need to try to set it again.

@giuseppe
Member

@giuseppe I think the isRootless condition is wrong. Shouldn't we use isRunningInUserNs? Because in a nested userns we will not be able to bump up the resource limits any further. Basically what we did for the oom score already: f59a5f1

yes I think that would be a better condition. Should we also check for CAP_SYS_RESOURCE?

@Luap99
Member

Luap99 commented Nov 12, 2024

That would certainly be the correct thing, we can only bump limits if we have CAP_SYS_RESOURCE in the init userns. But we do not do actual capabilities checks for most other things as root either so I don't think we strictly have to but it may be safer.

@giuseppe
Member

That would certainly be the correct thing, we can only bump limits if we have CAP_SYS_RESOURCE in the init userns. But we do not do actual capabilities checks for most other things as root either so I don't think we strictly have to but it may be safer.

yeah let's go with the simplified version: #24547

Luap99 pushed a commit to Luap99/libpod that referenced this issue Nov 19, 2024
commit 5ebba75 implemented this
behaviour for rootless users, but the same limitation exists for any
user in a user namespace.  Change the check to use the clamp to the
current values anytime podman runs in a user namespace.

Closes: containers#24508

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 0a69aef)
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
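
The clamping described in the commit message can be sketched as follows. This is an illustration in Python, not the Go code from the fix: the function names, the `in_userns` flag, and the example value 1048576 are assumptions made for the example.

```python
import resource


def clamp_limit(requested, current):
    """Clamp one rlimit value, treating RLIM_INFINITY as larger than any number."""
    inf = resource.RLIM_INFINITY
    if current == inf:
        return requested
    if requested == inf or requested > current:
        return current
    return requested


def clamp_rlimit(requested, current, in_userns):
    """Return the (soft, hard) pair to request for the new container.

    Outside a user namespace the requested pair is used unchanged.
    Inside one, neither value is allowed to exceed the current limits.
    """
    if not in_userns:
        return requested
    req_soft, req_hard = requested
    cur_soft, cur_hard = current
    return (clamp_limit(req_soft, cur_soft), clamp_limit(req_hard, cur_hard))


if __name__ == "__main__":
    current = resource.getrlimit(resource.RLIMIT_NPROC)
    # 1048576 is only an example of a high default request.
    print(clamp_rlimit((1048576, 1048576), current, in_userns=True))
```

With this approach a nested podman never asks the runtime for more than it already has, so the setrlimit call in the inner container no longer fails with a permission error.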