Invalid argument error at container startup after reboot #22576

Closed
DavidePrincipi opened this issue May 2, 2024 · 4 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

DavidePrincipi commented May 2, 2024

Issue Description

This issue occurs randomly after a system reboot. One or more services running in rootless or rootful Podman containers fail to start with an "invalid argument" error. As reported in #21274, the error looks different from the ones described in the Troubleshooting Guide.

Steps to reproduce the issue

Not always reproducible; please see the results below.

Describe the results you received

In the system journal we find messages like these:

May 02 09:51:29 R1-pve.rocky9-pve.org podman[6068]: time="2024-05-02T09:51:29+02:00" level=warning msg="Unmounting container \"samba-dc\" while attempting to delete storage: unmounting \"/home/samba1/.local/share/containers/storage/overlay/c07c970255101ffff8fb38162c32beb7ef7884f8d>
May 02 09:51:29 R1-pve.rocky9-pve.org podman[6068]: Error: removing storage for container "samba-dc": unmounting "/home/samba1/.local/share/containers/storage/overlay/c07c970255101ffff8fb38162c32beb7ef7884f8d1a12f4cdd0da18adc1c4873/merged": invalid argument
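
When several services are affected, it may help to extract the container name and overlay path from the journal rather than reading it by eye. The following is a minimal Python sketch, assuming the journal lines follow the format of the "Error: removing storage" sample above. The function name `find_unmount_failures` and the regular expression are illustrative assumptions, not part of Podman.

```python
import re

# Pattern derived from the single sample line quoted in this report.
# It is an assumption that other occurrences use the same wording.
UNMOUNT_ERROR = re.compile(
    r'Error: removing storage for container "(?P<name>[^"]+)": '
    r'unmounting "(?P<path>[^"]+)": invalid argument'
)


def find_unmount_failures(journal_lines):
    """Return (container_name, mount_path) pairs for matching journal lines."""
    failures = []
    for line in journal_lines:
        match = UNMOUNT_ERROR.search(line)
        if match:
            failures.append((match.group("name"), match.group("path")))
    return failures
```

The function accepts any iterable of strings, so it could be fed the saved output of `journalctl -b --no-pager` read line by line.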

Following the instructions in #21274 (comment), we captured this strace output (for a different container):

https://gist.github.com/stephdl/95afc3c028ebdd3d11d3014cd7efea81

Describe the results you expected

The container should start normally.

podman info output

[openldap1@r3-pve state]$ rpm -q podman 
podman-4.6.1-8.el9_3.x86_64

[openldap1@r3-pve state]$ podman version
Client:       Podman Engine
Version:      4.6.1
API Version:  4.6.1
Go Version:   go1.20.12
Built:        Wed Mar  6 11:08:41 2024
OS/Arch:      linux/amd64

[openldap1@r3-pve state]$ podman info
host:
  arch: amd64
  buildahVersion: 1.31.3
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: cebaba63f66de0e92cdc7e2a59f39c9208281158'
  cpuUtilization:
    idlePercent: 98.96
    systemPercent: 0.32
    userPercent: 0.72
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: '"rocky"'
    version: "9.3"
  eventLogger: file
  freeLocks: 2047
  hostname: r3-pve.rocky9-pve3.org
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
  kernel: 5.14.0-362.24.1.el9_3.0.1.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 5071155200
  memTotal: 8057634816
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-2.el9_3.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/user/1003/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/user/1003/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 6874460160
  swapTotal: 6874460160
  uptime: 0h 13m 50.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/openldap1/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/openldap1/.local/share/containers/storage
  graphRootAllocated: 19925041152
  graphRootUsed: 6092189696
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1003/containers
  transientStore: false
  volumePath: /home/openldap1/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1709719721
  BuiltTime: Wed Mar  6 11:08:41 2024
  GitCommit: ""
  GoVersion: go1.20.12
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

No

Additional environment details

Virtual machine under Proxmox.

Additional information

The issue appears at random times. Some systems (perhaps the slower ones) seem to hit it more frequently.

The bug is also reported in NethServer/dev#6916.

After applying the workaround from #19491 (comment), the container starts.

DavidePrincipi added the kind/bug label May 2, 2024
Luap99 (Member) commented May 2, 2024

This works on newer versions.

Luap99 closed this as not planned May 2, 2024
DavidePrincipi (Author) commented

Could you help me to find the commit that fixes the issue? I need to track it both on Rocky Linux 9 and Debian 12.

Luap99 (Member) commented May 2, 2024

You already linked to it: containers/storage#1687

DavidePrincipi (Author) commented

Thank you for confirming it!

IIUC, the fix landed in containers/storage 1.49, which was then included in Podman 4.7:

e092f88
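
To track which hosts still run a release older than the one mentioned above, a version string such as the one printed by `podman version` can be compared against 4.7. This is a rough Python sketch under two assumptions: that 4.7 is indeed the first fixed release (stated above only as "IIUC"), and that the distribution has not backported the fix into an older package, which a plain version comparison cannot detect. The names `parse_version`, `ASSUMED_FIXED_IN`, and `likely_has_fix` are hypothetical.

```python
def parse_version(version):
    """Turn a string such as '4.6.1' into a tuple of ints: (4, 6, 1)."""
    parts = []
    for piece in version.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc1' or '-dev'
            digits += ch
        if not digits:
            break  # no leading number in this component; stop parsing
        parts.append(int(digits))
    return tuple(parts)


# Assumption taken from the comment above, not verified independently.
ASSUMED_FIXED_IN = (4, 7)


def likely_has_fix(podman_version):
    """Heuristic only: True if the version is at least the assumed fixed release."""
    return parse_version(podman_version) >= ASSUMED_FIXED_IN
```

For the host in this report, `likely_has_fix("4.6.1")` evaluates to `False`, which matches the observed behavior.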

stale-locking-app bot added the locked - please file new issue/PR label Aug 1, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 1, 2024
2 participants