This repository has been archived by the owner on Jan 22, 2024. It is now read-only.
Hi everyone, the issue I am facing is similar to the ones raised in #1133 and #628. I have persistence mode enabled, and I do not think this is a driver issue, because I am occasionally (though very rarely) able to launch the container.
1. Issue or feature description
On running the command:
nvidia-docker run -it nvcr.io/nvidia/pytorch:20.03-py3
3. Information to attach (optional if deemed irrelevant)
Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
I0408 11:01:43.762921 7345 nvc.c:281] initializing library context (version=1.0.1, build=038fb92d00c94f97d61492d4ed1f82e981129b74)
I0408 11:01:43.763137 7345 nvc.c:255] using root /
I0408 11:01:43.763179 7345 nvc.c:256] using ldcache /etc/ld.so.cache
I0408 11:01:43.763218 7345 nvc.c:257] using unprivileged user 1011:1011
W0408 11:01:43.791108 7346 nvc.c:186] failed to set inheritable capabilities
W0408 11:01:43.791465 7346 nvc.c:187] skipping kernel modules load due to failure
I0408 11:01:43.793110 7347 driver.c:133] starting driver service
W0408 11:02:08.828662 7345 driver.c:220] terminating driver service (forced)
I0408 11:02:23.293544 7345 driver.c:233] driver service terminated with signal 15
nvidia-container-cli: initialization error: driver error: timed out
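The timestamps in the log above show how long the driver service ran before it was forcibly terminated. A minimal Python sketch for extracting those intervals, assuming the glog-style line format seen in this output and that all lines fall on the same day. `parse_clock` and `driver_service_intervals` are hypothetical helper names, not part of any NVIDIA tooling:

```python
import re
from datetime import datetime

# Matches lines such as:
#   I0408 11:01:43.793110 7347 driver.c:133] starting driver service
# Group 1 is the wall-clock time, group 2 is the message text.
LOG_LINE = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) \d+ \S+\] (.*)$")

# The three driver.c lines copied from the output in this issue.
SAMPLE_LOG = """\
I0408 11:01:43.793110 7347 driver.c:133] starting driver service
W0408 11:02:08.828662 7345 driver.c:220] terminating driver service (forced)
I0408 11:02:23.293544 7345 driver.c:233] driver service terminated with signal 15
"""


def parse_clock(line):
    """Return (time, message) for a log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # Assumption: only the time of day is compared, so the date is ignored.
    return datetime.strptime(match.group(1), "%H:%M:%S.%f"), match.group(2)


def driver_service_intervals(log_text):
    """Return (seconds until forced termination, seconds until exit), or None."""
    marks = {}
    for line in log_text.splitlines():
        parsed = parse_clock(line)
        if parsed is None:
            continue
        timestamp, message = parsed
        if message.startswith("starting driver service"):
            marks["start"] = timestamp
        elif message.startswith("terminating driver service"):
            marks["terminating"] = timestamp
        elif message.startswith("driver service terminated"):
            marks["terminated"] = timestamp
    if len(marks) < 3:
        return None
    waited = (marks["terminating"] - marks["start"]).total_seconds()
    shutdown = (marks["terminated"] - marks["terminating"]).total_seconds()
    return waited, shutdown


if __name__ == "__main__":
    print(driver_service_intervals(SAMPLE_LOG))
```

For the three driver.c lines quoted above, this gives roughly 25 s between the service start and the forced termination, and roughly 14.5 s more until the service actually exits.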
Kernel version from uname -a
Linux 4.15.0-45-generic 48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Driver information from nvidia-smi
Thu Apr  8 16:38:13 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   30C    P0    43W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Docker version from docker version
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:47 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
NVIDIA packages version from dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=====================-===============-===============-================================================
un libgldispatch0-nvidia (no description available)
ii libnvidia-cfg1-410:am 410.104-0ubuntu amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any (no description available)
un libnvidia-common (no description available)
ii libnvidia-common-410 410.104-0ubuntu all Shared files used by the NVIDIA libraries
ii libnvidia-compute-410 410.104-0ubuntu amd64 NVIDIA libcompute package
ii libnvidia-container-t 1.0.1-1 amd64 NVIDIA container runtime library (command-line t
ii libnvidia-container1: 1.0.1-1 amd64 NVIDIA container runtime library
un libnvidia-decode (no description available)
ii libnvidia-decode-410: 410.104-0ubuntu amd64 NVIDIA Video Decoding runtime libraries
un libnvidia-diagnostic (no description available)
ii libnvidia-diagnostic- 410.104-0ubuntu amd64 NVIDIA driver diagnostics utilities
un libnvidia-encode (no description available)
ii libnvidia-encode-410: 410.104-0ubuntu amd64 NVENC Video Encoding runtime library
un libnvidia-fbc1 (no description available)
ii libnvidia-fbc1-410:am 410.104-0ubuntu amd64 NVIDIA OpenGL-based Framebuffer Capture runtime
un libnvidia-gl (no description available)
ii libnvidia-gl-410:amd6 410.104-0ubuntu amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and V
un libnvidia-ifr1 (no description available)
ii libnvidia-ifr1-410:am 410.104-0ubuntu amd64 NVIDIA OpenGL-based Inband Frame Readback runtim
un nvhealth-module-nvidi (no description available)
un nvidia-304 (no description available)
un nvidia-340 (no description available)
un nvidia-384 (no description available)
un nvidia-390 (no description available)
ii nvidia-compute-utils- 410.104-0ubuntu amd64 NVIDIA compute utilities
ii nvidia-container-runt 2.0.0+docker18. amd64 NVIDIA container runtime
ii nvidia-container-runt 1.4.0-1 amd64 NVIDIA container runtime hook
un nvidia-current-diagno (no description available)
ii nvidia-dkms-410 410.104-0ubuntu amd64 NVIDIA DKMS package
un nvidia-dkms-kernel (no description available)
un nvidia-docker (no description available)
ii nvidia-docker2 2.0.3+docker18. all nvidia-docker CLI wrapper
ii nvidia-driver-410 410.104-0ubuntu amd64 NVIDIA driver metapackage
un nvidia-driver-binary (no description available)
ii nvidia-headless-410 410.104-0ubuntu amd64 NVIDIA headless metapackage
ii nvidia-headless-no-dk 410.104-0ubuntu amd64 NVIDIA headless metapackage - no DKMS
un nvidia-kernel-common (no description available)
ii nvidia-kernel-common- 410.104-0ubuntu amd64 Shared files used with the kernel module
un nvidia-kernel-source (no description available)
ii nvidia-kernel-source- 410.104-0ubuntu amd64 NVIDIA kernel source package
ii nvidia-modprobe 410.104-0ubuntu amd64 Load the NVIDIA kernel driver and create device
un nvidia-opencl-icd (no description available)
ii nvidia-peer-memory 1.0-7 all nvidia peer memory kernel module.
ii nvidia-peer-memory-dk 1.0-7 all DKMS support for nvidia-peer-memory kernel modul
un nvidia-persistenced (no description available)
un nvidia-prime (no description available)
ii nvidia-settings 410.104-0ubuntu amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binar (no description available)
un nvidia-smi (no description available)
un nvidia-thea (no description available)
un nvidia-utils (no description available)
ii nvidia-utils-410 410.104-0ubuntu amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nv 410.104-0ubuntu amd64 NVIDIA binary Xorg driver
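Only the rows marked `ii` in the listing above are actually installed, so it can help to filter them out when comparing container-stack versions. A minimal Python sketch, assuming the whitespace-separated layout shown here (status, name, version, architecture, description); `installed_packages` is a hypothetical helper name, and the package names remain truncated exactly as dpkg printed them:

```python
# A few rows copied from the dpkg listing in this issue.
SAMPLE_DPKG = """\
ii libnvidia-container-t 1.0.1-1 amd64 NVIDIA container runtime library (command-line t
ii libnvidia-container1: 1.0.1-1 amd64 NVIDIA container runtime library
un nvidia-docker (no description available)
ii nvidia-docker2 2.0.3+docker18. all nvidia-docker CLI wrapper
"""


def installed_packages(dpkg_output):
    """Return (name, version, architecture, description) for every 'ii' row."""
    packages = []
    for line in dpkg_output.splitlines():
        # Split into at most five fields so the description keeps its spaces.
        fields = line.split(None, 4)
        if len(fields) < 4 or fields[0] != "ii":
            continue  # header rows and packages that are not installed
        description = fields[4] if len(fields) == 5 else ""
        packages.append((fields[1], fields[2], fields[3], description))
    return packages


if __name__ == "__main__":
    for name, version, arch, _ in installed_packages(SAMPLE_DPKG):
        print(f"{name:<24} {version:<18} {arch}")
```

The sketch deliberately skips `un` rows, since they carry no version in this listing.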
NVIDIA container library version from nvidia-container-cli -V
Kernel messages from dmesg
[1149264.422516] docker0: port 1(veth1497707) entered disabled state
[1149264.442891] device veth1497707 left promiscuous mode
[1149264.442923] docker0: port 1(veth1497707) entered disabled state
Docker command, image and tag used
Command: nvidia-docker run -it nvcr.io/nvidia/pytorch:20.03-py3
Image and tag: nvcr.io/nvidia/pytorch:20.03-py3
2. Steps to reproduce the issue
docker pull nvcr.io/nvidia/pytorch:21.03-py3
nvidia-docker run -it nvcr.io/nvidia/pytorch:20.03-py3
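Because the launch succeeds only rarely, it may be useful to repeat the command several times and count the outcomes. A minimal Python sketch, assuming the command is passed as an argument list; `count_launch_outcomes`, `attempts`, and `timeout_s` are hypothetical names chosen for illustration:

```python
import subprocess


def count_launch_outcomes(command, attempts=5, timeout_s=120):
    """Run `command` repeatedly and tally successes, failures, and timeouts."""
    outcomes = {"ok": 0, "failed": 0, "timed_out": 0}
    for _ in range(attempts):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # the child is killed if it runs longer
            )
        except subprocess.TimeoutExpired:
            outcomes["timed_out"] += 1
            continue
        if result.returncode == 0:
            outcomes["ok"] += 1
        else:
            outcomes["failed"] += 1
    return outcomes
```

A call might look like `count_launch_outcomes(["nvidia-docker", "run", "--rm", "nvcr.io/nvidia/pytorch:20.03-py3", "nvidia-smi"])`. Dropping `-it` and running `nvidia-smi` inside the container are assumptions made so the command can run without an attached terminal and exit on its own.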