
VolumeDriver.Create: internal error #188

Closed
HoloSound opened this issue Sep 2, 2016 · 11 comments

@HoloSound

HoloSound commented Sep 2, 2016

Regarding #34:
I found some commands that do not work in my configuration.

First some system information:

# uname -a
Linux studio16 4.2.0-42-lowlatency #49-Ubuntu SMP PREEMPT Tue Jun 28 23:12:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
# nvidia-smi
Fri Sep  2 15:37:59 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   33C    P8    N/A /  N/A |     86MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

Additional information:

I had a working setup with 352.93, but the package update to 352.99 caused
consistency troubles.

/var/log/kern.log

Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: API mismatch: the client has the version 352.99, but
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: this kernel module has the version 352.93.  Please
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: make sure that this kernel module and all NVIDIA driver
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: components have the same version.
Aug 23 19:48:44 studio16 kernel: [2149830.211535] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22

So I uninstalled everything and reinstalled 367.44.

... and I pulled the nvidia-docker code and compiled it with make

(additionally make deb + dpkg -i ....deb)

with:

# nvidia-docker --version
Docker version 1.12.1, build 23cf638

Is there a way to get the nvidia-docker version information (not the Docker version)?

But:

# nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_367.44: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

From issue #34 I thought I should use this command:

nvidia-docker volume setup

but it does not work:

# nvidia-docker volume setup

Usage:  docker volume COMMAND

Manage Docker volumes

Options:
      --help   Print usage

Commands:
  create      Create a volume
  inspect     Display detailed information on one or more volumes
  ls          List volumes
  rm          Remove one or more volumes

Run 'docker volume COMMAND --help' for more information on a command.
root@studio16:/var/log/upstart# type nvidia-docker
nvidia-docker is hashed (/usr/local/bin/nvidia-docker)
root@studio16:/var/log/upstart# ls -l /usr/local/bin/nvidia-docker
-rwxr-xr-x 1 root root 6784176 Sep  2 17:11 /usr/local/bin/nvidia-docker
root@studio16:/var/log/upstart# 

Did the flags change in one of the recent versions of Docker?

root@studio16:~# nvidia-docker volume ls
DRIVER              VOLUME NAME
local               nvidia_driver_352.93
root@studio16:~# 

... this volume belongs to the previously running version!

Additionally, I expected to find some information in

/var/log/upstart/nvidia-docker.log

  • but the file does not exist!

# ls -l /var/log/upstart
total 0
#

How can I generate a volume matching my current version, 367.44?
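
(For reference: the volume is normally created on demand by nvidia-docker run. A minimal sketch of triggering the creation by hand, assuming the plugin registers its volume driver under the name nvidia-docker; this will fail with the same internal error until the plugin itself is healthy:)

$ docker volume create --name=nvidia_driver_367.44 -d nvidia-docker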

@atrophiedbrain

I am having a very similar problem on Ubuntu 16.04. I have tried different versions of the NVIDIA driver, each on a fresh install of Ubuntu. The same error message appears every time:

docker: Error response from daemon: create nvidia_driver_367.44: VolumeDriver.Create: internal error, check logs for details. See 'docker run --help'.

I am currently getting:

docker: Error response from daemon: create nvidia_driver_364.19: VolumeDriver.Create: internal error, check logs for details. See 'docker run --help'.

I do not have a /var/log/upstart/nvidia-docker.log file either.

However, I see the following in /var/log/kern.log:

kernel: [ 1698.375241] aufs au_opts_verify:1597:docker[8045]: dirperm1 breaks the protection by the permission bits on the lower branch

Do you see this error in your /var/log/kern.log?

I wonder if we are having the same problem as the poster BLACKY_001 here: https://devtalk.nvidia.com/default/topic/960139/toubles-at-ubuntu-update/?offset=6

In addition to trying different versions of the nvidia driver, I've also tried Docker 1.9, 1.11, and 1.12. I have the same problem on each version.

I have made sure to only have a single nvidia driver installed at a time.

@HoloSound
Author

@atrophiedbrain

No, I do not get the message "breaks the protection by the permission bits on the lower branch"
in kern.log.

No need to wonder: that poster is also me, since I'm searching in all possible directions.

  1. NVIDIA driver - due to the version update with the new Ubuntu package
  2. Ubuntu driver rollout - maybe an unstable package rollout (shutting down X during the driver update?)
  3. nvidia-docker - because of the volume error message
    Maybe I can get a clear picture from all these puzzle pieces.

Regarding 16.04: I also considered updating to this OS version,

  • but when you download NVIDIA drivers you can only select 14 and 15!?
    I think you have to be an "NVIDIA developer" to get the 16.04 preview?

What do you get with:

docker volume ls
(do you have a volume whose name matches the driver version you have installed?)

and does

nvidia-docker volume setup

work?

@3XX0
Member

3XX0 commented Sep 4, 2016

nvidia-docker volume setup has been removed and should not be used.

Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: API mismatch: the client has the version 352.99, but
Aug 23 19:48:44 studio16 kernel: [2149830.211531] NVRM: this kernel module has the version 352.93. Please

You installed a new driver but didn't reboot or reload the module.
Make sure your driver works properly on the host (e.g. nvidia-smi).
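
A minimal sketch of reloading the module after a driver update, assuming nothing else is using the GPU (module names vary by driver version; skip any that are not loaded, or simply reboot):

$ # unload in dependency order, then reload
$ sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
$ sudo modprobe nvidia
$ nvidia-smi   # client and kernel module versions should now match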

docker: Error response from daemon: create nvidia_driver_364.19: VolumeDriver.Create: internal error, check logs for details. See 'docker run --help'.

Check the output of the logs:

On Ubuntu 14.04 (upstart):

$ cat /var/log/upstart/nvidia-docker.log

If you have systemd (CentOS 7, Ubuntu 16.04):

$ systemctl status nvidia-docker
$ journalctl -n -u nvidia-docker

@atrophiedbrain

Output of my systemctl status nvidia-docker command:

nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor pr
   Active: active (running) since Sat 2016-09-03 21:00:38 EDT; 50s ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 3021 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docke
  Process: 2996 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (
 Main PID: 2995 (nvidia-docker-p)
    Tasks: 7
   Memory: 22.0M
      CPU: 505ms
   CGroup: /system.slice/nvidia-docker.service
           └─2995 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Sep 03 21:00:38 jm-lab systemd[1]: Starting NVIDIA Docker plugin...
Sep 03 21:00:38 jm-lab systemd[1]: Started NVIDIA Docker plugin.
Sep 03 21:00:38 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:38 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin
Sep 03 21:00:39 jm-lab nvidia-docker-plugin[2995]: /usr/bin/nvidia-docker-plugin

Output of my nvidia-smi command:

Sat Sep  3 21:04:03 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 364.19     Driver Version: 364.19         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
| 14%   53C    P0    56W / 195W |    188MiB /  4094MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3082    G   /usr/lib/xorg/Xorg                             132MiB |
|    0      4010    G   compiz                                          36MiB |
+-----------------------------------------------------------------------------+

@3XX0
Member

3XX0 commented Sep 4, 2016

What about journalctl -n -u nvidia-docker?

@HoloSound
Author

Output of nvidia-smi command:

# nvidia-smi
Sun Sep  4 09:08:35 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   33C    P8    N/A /  N/A |     87MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
#

Output of journalctl -n -u nvidia-docker:

# journalctl -n -u nvidia-docker
-- Logs begin at Fri 2016-09-02 22:20:36 CEST, end at Sun 2016-09-04 09:06:08 CEST. --
Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA unified memory
Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA management library
Sep 02 22:21:03 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:03 Discovering GPU devices
Sep 02 22:21:05 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:05 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving plugin API at /var/lib/nvidia-docker
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving remote API at localhost:3476
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received activate request
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Plugins activated [VolumeDriver]
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received create request for volume 'nvidia_driver_367.44'
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/367.4
# 

The binary itself exists:

# file  nvidia-cuda-mps-control 
nvidia-cuda-mps-control: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.4.0, stripped
# 

but nothing was created in the volume directory:

# ls -la /var/lib/nvidia-docker/volumes/nvidia_driver/
total 8
drwxr-xr-x 2 nvidia-docker nvidia-docker 4096 Sep  2 22:22 .
drwxr-xr-x 3 nvidia-docker nvidia-docker 4096 Sep  2 17:23 ..
# 

Output of systemctl status nvidia-docker:

# systemctl status nvidia-docker -l
● nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-09-02 22:21:01 CEST; 1 day 10h ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 1367 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
  Process: 1353 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
 Main PID: 1352 (nvidia-docker-p)
   Memory: 21.2M
      CPU: 577ms
   CGroup: /system.slice/nvidia-docker.service
           └─1352 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA unified memory
Sep 02 22:21:02 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:02 Loading NVIDIA management library
Sep 02 22:21:03 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:03 Discovering GPU devices
Sep 02 22:21:05 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:05 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving plugin API at /var/lib/nvidia-docker
Sep 02 22:21:06 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:21:06 Serving remote API at localhost:3476
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received activate request
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Plugins activated [VolumeDriver]
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Received create request for volume 'nvidia_driver_367.44'
Sep 02 22:22:43 studio16 nvidia-docker-plugin[1352]: /usr/bin/nvidia-docker-plugin | 2016/09/02 22:22:43 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/367.44/bin/nvidia-cuda-mps-control: invalid cross-device link
#
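
(For reference: "invalid cross-device link" means the plugin hard-links the driver files into its volume directory, and hard links cannot span filesystems. A quick sketch of checking this; the two paths must show the same Filesystem in the first column:)

$ df /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes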

@HoloSound
Author

@atrophiedbrain

Your output of systemctl status nvidia-docker is cut off after ~80 columns; the necessary information continues to the right!

@3XX0
Member

3XX0 commented Sep 4, 2016

See #133 and this comment in particular

@atrophiedbrain

Output of nvidia-docker volume ls:

DRIVER VOLUME NAME

Output of nvidia-docker run --rm nvidia/cuda nvidia-smi:

docker: Error response from daemon: create nvidia_driver_364.19: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

I have no log file /var/log/upstart/nvidia-docker.log.

Output of my systemctl status nvidia-docker:

nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2016-09-04 19:42:06 EDT; 4h 8min ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 3070 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
  Process: 3046 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
 Main PID: 3045 (nvidia-docker-p)
    Tasks: 7
   Memory: 21.7M
      CPU: 503ms
   CGroup: /system.slice/nvidia-docker.service
           └─3045 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA unified memory
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA management library
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Discovering GPU devices
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving plugin API at /var/lib/nvidia-docker
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving remote API at localhost:3476
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Received activate request
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Plugins activated [VolumeDriver]
Sep 04 23:46:58 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:58 Received create request for volume 'nvidia_driver_364.19'

Output of my journalctl -n -u nvidia-docker:

Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA unified memory
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Loading NVIDIA management library
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Discovering GPU devices
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Provisioning volumes at /var/lib/nvidia-docker/volumes
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving plugin API at /var/lib/nvidia-docker
Sep 04 19:42:07 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 19:42:07 Serving remote API at localhost:3476
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Received activate request
Sep 04 23:46:12 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:12 Plugins activated [VolumeDriver]
Sep 04 23:46:58 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:58 Received create request for volume 'nvidia_driver_364.19'
Sep 04 23:46:58 jm-lab nvidia-docker-plugin[3045]: /usr/bin/nvidia-docker-plugin | 2016/09/04 23:46:58 Error: link /usr/lib/nvidia-364/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/364.19/bin/nvidia-cuda-mps-control: invalid cross-device link

Found in /var/log/kern.log:

Sep  4 23:46:58 jm-lab kernel: [  294.327466] aufs au_opts_verify:1597:dockerd[3133]: dirperm1 breaks the protection by the permission bits on the lower branch
Sep  4 23:46:58 jm-lab kernel: [  294.355806] aufs au_opts_verify:1597:dockerd[3133]: dirperm1 breaks the protection by the permission bits on the lower branch

@atrophiedbrain

atrophiedbrain commented Sep 5, 2016

I see from https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin#known-limitations that I need to use nvidia-docker-plugin -d to change the volume directory for nvidia-docker-plugin, as it has to be on the same mount as the NVIDIA driver.

I tried nvidia-docker-plugin -d "/usr/nvidia-docker/volumes" but ran into the following error:

nvidia-docker-plugin | 2016/09/05 00:07:18 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/09/05 00:07:18 Loading NVIDIA management library
nvidia-docker-plugin | 2016/09/05 00:07:18 Discovering GPU devices
nvidia-docker-plugin | 2016/09/05 00:07:18 Provisioning volumes at /usr/nvidia-docker/volumes
nvidia-docker-plugin | 2016/09/05 00:07:18 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/09/05 00:07:18 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/09/05 00:07:18 Error: listen tcp 127.0.0.1:3476: bind: address already in use
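
(The bind error is presumably because the systemd-managed plugin instance is still running and holding port 3476. A sketch of stopping it before a manual run; running as the nvidia-docker user is an assumption based on the volume ownership shown earlier:)

$ sudo systemctl stop nvidia-docker
$ sudo -u nvidia-docker /usr/bin/nvidia-docker-plugin -d /usr/nvidia-docker/volumes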

I followed the instructions at #133 (comment) but now received the following message:

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: oci runtime error: exec: "nvidia-smi": executable file not found in $PATH.

Following post #184, I knew I needed to change the permissions so that the nvidia-docker user can write inside /usr/local/nvidia-driver, but I was not sure which commands to run.

I tried chown nvidia-docker /usr/local/nvidia-driver but saw the following when I ran sudo nvidia-docker run --rm nvidia/cuda nvidia-smi:
docker: Error response from daemon: no such volume: nvidia_driver_364.19.

While journalctl -n -u nvidia-docker showed:

Sep 05 00:37:46 jm-lab nvidia-docker-plugin[3037]: /usr/bin/nvidia-docker-plugin | 2016/09/05 00:37:46 Received mount request for volume 'nvidia_driver_364.19'
Sep 05 00:37:47 jm-lab nvidia-docker-plugin[3037]: /usr/bin/nvidia-docker-plugin | 2016/09/05 00:37:47 Received unmount request for volume 'nvidia_driver_364.19'

However, after restarting it worked!
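
(Putting it together, a sketch of the sequence that appears to have resolved it; the recursive flag and the group name are assumptions:)

$ sudo chown -R nvidia-docker:nvidia-docker /usr/local/nvidia-driver
$ sudo systemctl restart nvidia-docker
$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi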

Output of sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

Mon Sep  5 04:41:46 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 364.19     Driver Version: 364.19         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
|  0%   43C    P0    54W / 195W |    171MiB /  4094MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Thank you all for your help!

tl;dr: I was having the same problem as described in #133 (comment) and #184,
and in the known limitations at https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin#known-limitations

@HoloSound
Author

HoloSound commented Sep 5, 2016

What I did:

  1. apt-get purge nvidia-docker
  2. mkdir /usr/local/nvidia-driver
    2.5) chown nvidia-docker:nvidia-docker /usr/local/nvidia-driver <---- !!!!
  3. cd compile/nvidia-docker
  4. git pull
  5. make deb
  6. dpkg -i ./tools/dist/nvidia-docker_1.0.0~rc.3-1_amd64.deb
  7. systemctl edit nvidia-docker as in #133 (Link Issue on volume creation); see the drop-in sketch after this list
  8. reboot
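
For reference, a sketch of the drop-in created in step 7, following #133 (the -d path must live on the same filesystem as the driver files):

$ sudo systemctl edit nvidia-docker
# override contents:
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker -d /usr/local/nvidia-driver

$ sudo systemctl restart nvidia-docker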

Result:

$ nvidia-smi
Mon Sep  5 16:43:38 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   34C    P8    N/A /  N/A |     72MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
$
$ dpkg -l nvidia-docker
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                       Version                    Architecture               Description
+++-==========================================-==========================-==========================-=========================================================================================
ii  nvidia-docker                              1.0.0~rc.3-1               amd64                      NVIDIA Docker container tools
$

$ journalctl -n -u nvidia-docker -l
-- Logs begin at Mon 2016-09-05 16:35:52 CEST, end at Mon 2016-09-05 16:47:19 CEST. --
Sep 05 16:47:19 studio16 nvidia-docker-plugin[1350]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Successfully terminated
Sep 05 16:47:19 studio16 systemd[1]: Stopped NVIDIA Docker plugin.
Sep 05 16:47:19 studio16 systemd[1]: Starting NVIDIA Docker plugin...
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Loading NVIDIA unified memory
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Loading NVIDIA management library
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Discovering GPU devices
Sep 05 16:47:19 studio16 systemd[1]: Started NVIDIA Docker plugin.
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Provisioning volumes at /usr/local/nvidia-driver
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Serving plugin API at /var/lib/nvidia-docker
Sep 05 16:47:19 studio16 nvidia-docker-plugin[3460]: /usr/bin/nvidia-docker-plugin | 2016/09/05 16:47:19 Serving remote API at localhost:3476
$

nvidia-docker now uses /usr/local/nvidia-driver (on the same physical partition, as in #133).

AFTERWARDS:

$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
Mon Sep  5 15:13:40 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 670     Off  | 0000:01:00.0     N/A |                  N/A |
| 28%   35C    P8    N/A /  N/A |     72MiB /  1991MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
holosound@studio16:~$

.... IT WORKED!

Many thanks!

@3XX0 3XX0 closed this as completed Sep 7, 2016
@NVIDIA NVIDIA locked and limited conversation to collaborators Mar 7, 2017