Steps to reproduce the issue

1. Configure the NVIDIA Container Toolkit for use with CDI:

```shell
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

2. Test the CDI integration with `podman run`, which works:

```shell
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
```

3. Start a `podman build` with the same device reference, which fails:

```shell
# Get a test Containerfile
curl -O https://raw.githubusercontent.com/kenmoini/smart-drone-patterns/main/apps/darknet/Containerfile.ubnt22

# Build a container with the CDI device reference - this fails
podman build --device nvidia.com/gpu=all --security-opt=label=disable -t darknet -f Containerfile.ubnt22 .
# Output:
# Error: creating build executor: getting info of source device nvidia.com/gpu=all: stat nvidia.com/gpu=all: no such file or directory

# Build a container with the direct device path - this works
podman build --device /dev/nvidia0 --security-opt=label=disable -t darknet -f Containerfile.ubnt22 .
```
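For context on why the two invocations are not equivalent: a CDI device reference resolves to a set of container edits, not a single node. Below is a minimal sketch using a hypothetical, trimmed spec (the device and mount entries are illustrative assumptions; a real `/etc/cdi/nvidia.yaml` produced by `nvidia-ctk cdi generate` is much larger):

```shell
# Hypothetical, trimmed stand-in for a generated CDI spec;
# a real /etc/cdi/nvidia.yaml from `nvidia-ctk cdi generate` is far larger.
cat > /tmp/nvidia-cdi-example.yaml <<'EOF'
cdiVersion: 0.5.0
kind: nvidia.com/gpu
devices:
- name: all
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/nvidiactl
    - path: /dev/nvidia-uvm
    mounts:
    - hostPath: /usr/lib64/libcuda.so.1
      containerPath: /usr/lib64/libcuda.so.1
EOF

# `--device nvidia.com/gpu=all` should apply ALL of these edits;
# `--device /dev/nvidia0` injects only that one node and no mounts.
grep -E '(path|hostPath):' /tmp/nvidia-cdi-example.yaml
```

This is why the direct-device-path workaround builds succeed but can still miss the driver libraries and extra device nodes the spec describes.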
Describe the results you received

The build using the CDI device reference fails:

```shell
podman build --device nvidia.com/gpu=all --security-opt=label=disable -t darknet -f Containerfile.ubnt22 .
```

```
Error: creating build executor: getting info of source device nvidia.com/gpu=all: stat nvidia.com/gpu=all: no such file or directory
```
Describe the results you expected

The container build should start with the CDI device reference. The build only works when passing the direct device path, but that does not appear to inject all of the associated paths defined in the generated CDI specification.
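To make the gap in the direct-device-path workaround concrete, here is a hedged sketch that enumerates every device node from a CDI spec and turns it into `--device` flags; even then, the mounts and hooks in the spec are still not applied. The file contents are a hypothetical, trimmed stand-in for the generated spec:

```shell
# Hedged workaround sketch: enumerate device nodes from a CDI spec and
# emit one --device flag per node. File content is a hypothetical,
# trimmed stand-in for /etc/cdi/nvidia.yaml.
cat > /tmp/cdi-devices-sample.yaml <<'EOF'
devices:
- name: all
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/nvidiactl
    - path: /dev/nvidia-uvm
EOF

# Emit the flags; mounts and hooks from the spec are still missing.
awk '/- path:/ { printf "--device %s ", $NF }' /tmp/cdi-devices-sample.yaml
```

The emitted flags could then be appended to the `podman build` command line as a partial workaround until `podman build` resolves CDI device references itself.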
podman info output
```yaml
host:
  arch: arm64
  buildahVersion: 1.31.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: f0f506932ce1dc9fc7f1adb457a73d0a00207272'
  cpuUtilization:
    idlePercent: 99.98
    systemPercent: 0.01
    userPercent: 0.01
  cpus: 32
  databaseBackend: boltdb
  distribution:
    distribution: '"rhel"'
    version: "9.3"
  eventLogger: journald
  freeLocks: 2048
  hostname: avalon.kemo.labs
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-362.18.1.el9_3.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 121339949056
  memTotal: 133915746304
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-2.el9_3.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /bin/pasta
    package: passt-0^20230818.g0af928e-4.el9.aarch64
    version: |
      pasta 0^20230818.g0af928e-4.el9.aarch64
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.aarch64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 105h 12m 27.00s (Approximately 4.38 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1993421922304
  graphRootUsed: 28735803392
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 4
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1705652546
  BuiltTime: Fri Jan 19 03:22:26 2024
  GitCommit: ""
  GoVersion: go1.20.12
  Os: linux
  OsArch: linux/arm64
  Version: 4.6.1
```
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
No
Additional environment details
Running on RHEL 9.3 on an Ampere Altra system - same error on an X86 system.
Issue Description

When using NVIDIA GPUs with Podman via the Container Device Interface (CDI), `podman build` fails to use CDI device references while `podman run` works as intended. However, when using the direct device path, the `podman build` execution works as expected.
Additional information
Looks like this also affects buildah:
#5432
#5443