This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Fedora atomic 26 + nvidia-docker2 #648

Closed
olivier-dj opened this issue Feb 28, 2018 · 1 comment

Comments

@olivier-dj

Hello everyone

1. Issue or feature description

I'm working on using OpenStack Magnum and Kubernetes to get GPU-aware Docker deployments. I adapted a Fedora Atomic image (which provides everything OpenStack Magnum requires) and installed the requirements for GPU use (nvidia-driver, CUDA/cuDNN, nvidia-docker2). The system upgrades, including nvidia-docker2, were installed with the rpm-ostree package manager, except for nvidia-driver and CUDA (though for CUDA/cuDNN it's just copying files, if I'm right). Fedora Atomic doesn't support DKMS or akmod, so I ended up installing nvidia-driver with the runfile installer and avoiding kernel upgrades, which would break the install. nvidia-smi and the CUDA samples work. I assumed DKMS support only matters for kernel updates, but could it matter for nvidia-docker2 as well?

2. Steps to reproduce the issue

docker run --rm nvidia/cuda nvidia-smi
Output:
container_linux.go:247: starting container process caused "process_linux.go:362: container init caused "rootfs_linux.go:54: mounting \"cgroup\" to rootfs \"/var/lib/docker/overlay2/72f8a07a4857ee246cd4b69b0a1c110253367c1eb72f616c44654335faabffe9/merged\" at \"/sys/fs/cgroup\" caused \"no subsystem for mount\"""
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:362: container init caused "rootfs_linux.go:54: mounting \"cgroup\" to rootfs \"/var/lib/docker/overlay2/72f8a07a4857ee246cd4b69b0a1c110253367c1eb72f616c44654335faabffe9/merged\" at \"/sys/fs/cgroup\" caused \"no subsystem for mount\""".

Or

mkdir /mycontainer
cd /mycontainer
mkdir rootfs
docker export $(docker create busybox) | tar -C rootfs -xvf -
nvidia-container-runtime spec
nvidia-container-runtime run 1

Output:
container_linux.go:247: starting container process caused "process_linux.go:362: container init caused "rootfs_linux.go:54: mounting \"cgroup\" to rootfs \"/mycontainer/rootfs\" at \"/sys/fs/cgroup\" caused \"no subsystem for mount\"""
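Both failures report that runc found no subsystem for one of the cgroup mounts it sets up under /sys/fs/cgroup. One quick check, assuming the problem is a host cgroup (v1) mount that does not match any hierarchy listed in /proc/self/cgroup (an assumption, not something confirmed in this thread), is to print each cgroup mount point next to the hierarchies it matches. `list_cgroup_mounts` is a hypothetical helper, not part of nvidia-docker or runc; it only reads the two /proc files with POSIX sh and awk.

```sh
# Hypothetical helper: for each cgroup (v1) mount in a mountinfo file, print
# its mount point and which of its super options match a hierarchy listed in
# a /proc/<pid>/cgroup file (controllers or name=...). No match prints NONE.
# Usage: list_cgroup_mounts [cgroup_file] [mountinfo_file]
list_cgroup_mounts() {
    cgroup_file="${1:-/proc/self/cgroup}"
    mountinfo_file="${2:-/proc/self/mountinfo}"
    awk '
        # First file: lines look like "4:cpu,cpuacct:/path". Field 2 holds
        # the hierarchy names (empty for the cgroup2 "0::/" line).
        NR == FNR {
            split($0, parts, ":")
            n = split(parts[2], names, ",")
            for (j = 1; j <= n; j++) known[names[j]] = 1
            next
        }
        # Second file: mountinfo. The number of optional fields varies, so
        # find the "-" separator; fstype, source and super options follow it.
        {
            i = 7
            while (i <= NF && $i != "-") i++
            if ($(i + 1) != "cgroup") next
            n = split($(i + 3), opts, ",")
            found = ""
            for (j = 1; j <= n; j++)
                if (opts[j] in known) found = found (found == "" ? "" : ",") opts[j]
            print $5, (found == "" ? "NONE" : found)
        }
    ' "$cgroup_file" "$mountinfo_file"
}
```

Running `list_cgroup_mounts` with no arguments inspects the current shell's view of the host; a line ending in `NONE` marks a cgroup mount that matches no hierarchy in /proc/self/cgroup and would be the first thing to look at.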

3. Information

  • Security
    more /sys/fs/cgroup/devices/devices.list
    Output:
    a *:* rwm

Disabling SELinux doesn't help either.

  • Kernel version from uname -a
    Linux fedora.novalocal 4.15.4-200.fc26.x86_64 #1 SMP Mon Feb 19 19:43:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
    [ 5875.711206] docker0: port 1(veth598691a) entered blocking state
    [ 5875.713356] docker0: port 1(veth598691a) entered disabled state
    [ 5875.720088] device veth598691a entered promiscuous mode
    [ 5875.727272] IPv6: ADDRCONF(NETDEV_UP): veth598691a: link is not ready
    [ 5875.730830] IPv6: ADDRCONF(NETDEV_UP): veth6d3b5d8: link is not ready
    [ 5875.734218] IPv6: ADDRCONF(NETDEV_UP): veth6d3b5d8: link is not ready
    [ 5875.736882] IPv6: ADDRCONF(NETDEV_CHANGE): veth6d3b5d8: link becomes ready
    [ 5875.739504] IPv6: ADDRCONF(NETDEV_CHANGE): veth598691a: link becomes ready
    [ 5875.742263] docker0: port 1(veth598691a) entered blocking state
    [ 5875.744508] docker0: port 1(veth598691a) entered forwarding state
    [ 5876.210364] docker0: port 1(veth598691a) entered disabled state
    [ 5876.215538] device veth598691a left promiscuous mode
    [ 5876.218455] docker0: port 1(veth598691a) entered disabled state
  • Driver information from nvidia-smi -a
    nvidia-smi.txt
  • Docker version from docker version
    docker-info.txt
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
    libnvidia-container-tools-1.0.0-0.1.alpha.3.x86_64
    libnvidia-container1-1.0.0-0.1.alpha.3.x86_64
    nvidia-container-runtime-1.1.1-1.docker1.13.1.x86_64
    nvidia-docker2-2.0.2-1.docker1.13.1.noarch
  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.0.0
    build date: 2018-01-11T00:23+0000
    build revision: 4a618459e8ba522d834bb2b4c665847fae8ce0ad
    build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-16)
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
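To regenerate the information above in one pass (for example, after reinstalling the driver or the runtime), a small script can save each command's output to its own file. This is a sketch: `collect_info` and `capture` are hypothetical helper names, and the command list simply mirrors the items requested above. Commands missing on the host are noted in their file instead of aborting the run.

```sh
# Hypothetical helper: run one command and save its stdout and stderr to
# <dir>/<name>.txt. A missing command is recorded rather than treated as fatal.
capture() {
    dir="$1"; name="$2"; shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$dir/$name.txt" 2>&1 || :
    else
        echo "$1: command not found" >"$dir/$name.txt"
    fi
}

# Hypothetical helper: gather the diagnostics listed in the issue template.
collect_info() {
    outdir="${1:-nvidia-docker-info}"
    mkdir -p "$outdir"
    capture "$outdir" uname uname -a
    capture "$outdir" nvidia-smi nvidia-smi -a
    capture "$outdir" docker-version docker version
    capture "$outdir" rpm-nvidia rpm -qa '*nvidia*'
    capture "$outdir" nvidia-container-cli nvidia-container-cli -V
    capture "$outdir" devices-list cat /sys/fs/cgroup/devices/devices.list
    capture "$outdir" dmesg dmesg
}
```

For example, `collect_info /tmp/diag` writes `uname.txt`, `nvidia-smi.txt` and the other files into /tmp/diag, ready to attach to an issue.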
@3XX0
Member

3XX0 commented Feb 28, 2018

See #634

3XX0 closed this as completed Mar 6, 2018