Fedora atomic 26 + nvidia-docker2 #648

olivier-dj · 2018-02-28T12:32:04Z

Hello everyone

1. Issue or feature description

I'm using OpenStack Magnum and Kubernetes to run GPU-aware Docker deployments. I adapted a Fedora Atomic image (which provides everything OpenStack Magnum requires) and installed what is needed for GPU use (nvidia-driver, cuda/cudnn, nvidia-docker2). The system upgrades, including nvidia-docker2, were installed via the rpm-ostree package manager; nvidia-driver and cuda were not (though for cuda/cudnn it's just file copying, if I'm right). Fedora Atomic doesn't support dkms or akmod, so I installed nvidia-driver last, from the runfile, and I avoid kernel upgrades, which would break the install. nvidia-smi and the cuda samples work. I assumed that the missing dkms support only matters for kernel updates, but maybe it affects nvidia-docker2 as well?

2. Steps to reproduce the issue

docker run --rm nvidia/cuda nvidia-smi
Output:
container_linux.go:247: starting container process caused "process_linux.go:362: container init caused "rootfs_linux.go:54: mounting \"cgroup\" to rootfs \"/var/lib/docker/overlay2/72f8a07a4857ee246cd4b69b0a1c110253367c1eb72f616c44654335faabffe9/merged\" at \"/sys/fs/cgroup\" caused \"no subsystem for mount\"""
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:362: container init caused "rootfs_linux.go:54: mounting \"cgroup\" to rootfs \"/var/lib/docker/overlay2/72f8a07a4857ee246cd4b69b0a1c110253367c1eb72f616c44654335faabffe9/merged\" at \"/sys/fs/cgroup\" caused \"no subsystem for mount\""".

Or

mkdir /mycontainer
cd /mycontainer
mkdir rootfs
docker export $(docker create busybox) | tar -C rootfs -xvf -
nvidia-container-runtime spec
nvidia-container-runtime run 1

Output:
container_linux.go:247: starting container process caused "process_linux.go:362: container init caused "rootfs_linux.go:54: mounting \"cgroup\" to rootfs \"/mycontainer/rootfs\" at \"/sys/fs/cgroup\" caused \"no subsystem for mount\"""

3. Information

Security
more /sys/fs/cgroup/devices/devices.list
Output:
a *:* rwm

Disabling SELinux doesn't change anything.
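
Since the error above mentions a cgroup mount with "no subsystem", one extra check that might help narrow things down is to compare the controllers the kernel reports as enabled in /proc/cgroups with the controllers that appear in cgroup mount options in /proc/self/mountinfo. The helper below is only a sketch: the function name missing_cgroup_mounts is made up for illustration, and whether a mismatch here is related to the runtime error is an assumption, not something confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-docker or runc): print every
# cgroup v1 controller that is enabled in the kernel but does not show up
# in the options of any mount of type "cgroup".
# $1: /proc/cgroups-style file, $2: mountinfo-style file (both optional).
missing_cgroup_mounts() {
    cgroups_file=${1:-/proc/cgroups}
    mountinfo_file=${2:-/proc/self/mountinfo}

    # /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled.
    # Skip the "#" header line and keep controllers whose "enabled" is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "$cgroups_file" |
    while read -r ctrl; do
        # In mountinfo, the fields after the lone "-" separator are:
        # filesystem type, mount source, super options.
        if ! awk -v c="$ctrl" '
            {
                for (i = 1; i <= NF; i++) if ($i == "-") break
                if ($(i + 1) != "cgroup") next
                n = split($(i + 3), opts, ",")
                for (j = 1; j <= n; j++) if (opts[j] == c) found = 1
            }
            END { exit (found ? 0 : 1) }
        ' "$mountinfo_file"; then
            echo "$ctrl"
        fi
    done
}
```

Calling missing_cgroup_mounts with no arguments reads the live /proc files on the host; an empty result means every enabled controller has a matching cgroup mount.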

Kernel version from uname -a
Linux fedora.novalocal 4.15.4-200.fc26.x86_64 #1 SMP Mon Feb 19 19:43:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Any relevant kernel output lines from dmesg
[ 5875.711206] docker0: port 1(veth598691a) entered blocking state
[ 5875.713356] docker0: port 1(veth598691a) entered disabled state
[ 5875.720088] device veth598691a entered promiscuous mode
[ 5875.727272] IPv6: ADDRCONF(NETDEV_UP): veth598691a: link is not ready
[ 5875.730830] IPv6: ADDRCONF(NETDEV_UP): veth6d3b5d8: link is not ready
[ 5875.734218] IPv6: ADDRCONF(NETDEV_UP): veth6d3b5d8: link is not ready
[ 5875.736882] IPv6: ADDRCONF(NETDEV_CHANGE): veth6d3b5d8: link becomes ready
[ 5875.739504] IPv6: ADDRCONF(NETDEV_CHANGE): veth598691a: link becomes ready
[ 5875.742263] docker0: port 1(veth598691a) entered blocking state
[ 5875.744508] docker0: port 1(veth598691a) entered forwarding state
[ 5876.210364] docker0: port 1(veth598691a) entered disabled state
[ 5876.215538] device veth598691a left promiscuous mode
[ 5876.218455] docker0: port 1(veth598691a) entered disabled state
Driver information from nvidia-smi -a
nvidia-smi.txt
Docker version from docker version
docker-info.txt
NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
libnvidia-container-tools-1.0.0-0.1.alpha.3.x86_64
libnvidia-container1-1.0.0-0.1.alpha.3.x86_64
nvidia-container-runtime-1.1.1-1.docker1.13.1.x86_64
nvidia-docker2-2.0.2-1.docker1.13.1.noarch
NVIDIA container library version from nvidia-container-cli -V
version: 1.0.0
build date: 2018-01-11T00:23+0000
build revision: 4a618459e8ba522d834bb2b4c665847fae8ce0ad
build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-16)
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections


3XX0 · 2018-02-28T19:22:47Z

See #634

3XX0 closed this as completed Mar 6, 2018
