This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Upgraded NVIDIA drivers from 361.42 to 375.26 to run CUDA 8 Docker images; nvidia-docker fails with "docker: Error response from daemon: create nvidia_driver_375.26: VolumeDriver.Create: internal error" #290

Closed
jsmith50500 opened this issue Jan 13, 2017 · 3 comments

Comments

@jsmith50500

Hi there,
I wonder if somebody can help me.
I've been running nvidia-docker with CUDA 7.5 based images for a while with no issues at all.
I wanted to run a CUDA 8.0 image, then realized after looking at the compatibility matrix that I would need to upgrade my NVIDIA drivers. I went ahead and upgraded the drivers on my CentOS 7 host to the latest, 375.26.
I ran nvidia-smi and everything seems OK.

[image: nvid]

As a further test, I installed the latest CUDA 8 on the CentOS 7 host machine and ran deviceQuery from the samples directory.
Again, everything seems OK.

[image: nvid2]

So I was ready to try nvidia-docker again:
nvidia-docker run --rm nvidia/cuda nvidia-smi

but received an error saying:
docker: Error response from daemon: create nvidia_driver_375.26: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

The log has the following info
nvidia-docker-plugin | 2017/01/13 20:46:58 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/01/13 20:46:58 Loading NVIDIA management library
nvidia-docker-plugin | 2017/01/13 20:46:59 Discovering GPU devices
nvidia-docker-plugin | 2017/01/13 20:46:59 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2017/01/13 20:46:59 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2017/01/13 20:46:59 Serving remote API at localhost:3476
nvidia-docker-plugin | 2017/01/13 21:51:09 Received activate request
nvidia-docker-plugin | 2017/01/13 21:51:09 Plugins activated [VolumeDriver]
nvidia-docker-plugin | 2017/01/13 21:51:10 Received create request for volume 'nvidia_driver_375.26'
nvidia-docker-plugin | 2017/01/13 21:51:10 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/375.26/bin/nvidia$

So the link from /usr/bin/nvidia-cuda-mps-control to /var/lib/nvidia-docker/volumes/nvidia_driver/375.26/bin/nvidia (cut off in the log) seems to be the issue. I'm not sure what this means.
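
For anyone reading along: the "link" in that log line is a hard-link call, and hard links only work when source and destination are on the same filesystem. The log above is truncated, so this is not confirmed as the cause here, but it is easy to rule out. A minimal sketch, assuming GNU stat as shipped with CentOS 7; same_filesystem is a made-up helper name, not part of nvidia-docker:

```shell
#!/bin/sh
# Sketch: report whether two existing paths live on the same filesystem.
# Hard links cannot cross filesystem boundaries, so a mismatch here would
# explain a failing "link" call. Assumes GNU coreutils stat (-c '%d').
# same_filesystem is a hypothetical helper, not an nvidia-docker command.
same_filesystem() {
    dev_a=$(stat -c '%d' "$1") || return 2   # device number of first path
    dev_b=$(stat -c '%d' "$2") || return 2   # device number of second path
    [ "$dev_a" = "$dev_b" ]
}

# Example usage with the directories from the log:
# if same_filesystem /usr/bin /var/lib/nvidia-docker; then
#     echo "same filesystem"
# else
#     echo "different filesystems (or a path is missing)"
# fi
```

The function returns 0 when both paths share a device, 1 when they differ, and 2 when either path cannot be read.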

I noticed that when I ran
docker volume inspect nvidia_driver_361.42
there was still a reference to the older NVIDIA drivers:
[
    {
        "Name": "nvidia_driver_361.42",
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/nvidia_driver_361.42/_data",
        "Labels": {},
        "Scope": "local"
    }
]

But if I run
docker volume inspect nvidia_driver_375.26
there is no such volume.
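
A quick way to spot leftover volumes like the 361.42 one is to compare the volume names against the installed driver version. This is only a sketch: stale_driver_volumes is a hypothetical helper name, and it assumes the volumes follow the nvidia_driver_<version> naming seen above.

```shell
#!/bin/sh
# Sketch: print nvidia_driver_* volume names that do not match the current
# driver version. Volume names are read from stdin, one per line.
# stale_driver_volumes is a hypothetical helper, not an nvidia-docker command.
stale_driver_volumes() {
    current="nvidia_driver_$1"
    while IFS= read -r name; do
        case "$name" in
            "$current") ;;                             # matches the installed driver
            nvidia_driver_*) printf '%s\n' "$name" ;;  # left over from another version
        esac
    done
}

# Example usage on the host:
# version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# docker volume ls -q | stale_driver_volumes "$version"
```

Anything it prints can then be looked at with docker volume inspect before deciding whether to remove it.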

I tried to reinstall nvidia-docker 1.00rc3 but couldn't uninstall it in order to reinstall it.
I'm on docker 1.12r6 on CentOS 7.
Any ideas you have would be of great help.
If one upgrades the NVIDIA drivers, is there a special procedure to upgrade nvidia-docker so that it knows how to find the new drivers?

Thank you for your help

@flx42
Member

flx42 commented Jan 14, 2017

nvidia-docker-plugin | 2017/01/13 21:51:10 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/375.26/bin/nvidia$

This looks truncated, did you paste the full log? Or maybe the plugin crashed at this point, that would be bad :(

@jsmith50500
Author

To get past most of the issues, I looked at issue 133. Because the NVIDIA driver upgrade left me with a number of duplicate driver files, I also looked at issue 229.

The steps:
1. yum remove nvidia-docker
2. rm -rf /var/lib/nvidia-docker
3. rm -rf /usr/local/nvidia-driver
4. mkdir /usr/local/nvidia-driver
5. systemctl start docker
6. sudo rpm -i /tmp/nvidia-docker*.rpm (after downloading the latest nvidia-docker files)
7. chown nvidia-docker:nvidia-docker /usr/local/nvidia-driver

8. systemctl edit nvidia-docker
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s $SOCK_DIR -d /usr/local/nvidia-driver

9. reboot
10. systemctl start docker
11. sudo nvidia-docker-plugin -d /usr/local/nvidia-driver
12. sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
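
If you would rather script step 8 than use the interactive editor, something like the following should produce the same drop-in file. It is a sketch, not a tested procedure: write_plugin_override is a hypothetical helper name, the drop-in path in the usage comment is the location systemctl edit normally writes to, and it assumes the packaged unit defines SOCK_DIR.

```shell
#!/bin/sh
# Sketch: write the override from step 8 without the interactive editor.
# write_plugin_override is a hypothetical helper name.
#   $1 = systemd drop-in directory, $2 = directory for the driver volumes
write_plugin_override() {
    dropin_dir=$1
    driver_dir=$2
    mkdir -p "$dropin_dir" || return 1
    # \$SOCK_DIR is escaped so it lands in the file literally; systemd expands
    # it later (assumption: the packaged unit sets SOCK_DIR in its environment).
    cat > "$dropin_dir/override.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s \$SOCK_DIR -d $driver_dir
EOF
}

# Example usage (as root), then reload so systemd picks up the change:
# write_plugin_override /etc/systemd/system/nvidia-docker.service.d /usr/local/nvidia-driver
# systemctl daemon-reload
# systemctl restart nvidia-docker
```

The empty ExecStart= line is what clears the packaged command before the replacement is set, so it needs to stay in the file.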

Results

sudo nvidia-docker-plugin -d /usr/local/nvidia-driver
nvidia-docker-plugin | 2017/01/16 18:36:59 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/01/16 18:36:59 Loading NVIDIA management library
nvidia-docker-plugin | 2017/01/16 18:37:00 Discovering GPU devices
nvidia-docker-plugin | 2017/01/16 18:37:00 Provisioning volumes at /usr/local/nvidia-driver
nvidia-docker-plugin | 2017/01/16 18:37:01 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2017/01/16 18:37:01 Serving remote API at localhost:3476
nvidia-docker-plugin | 2017/01/16 18:37:38 Received activate request
nvidia-docker-plugin | 2017/01/16 18:37:38 Plugins activated [VolumeDriver]
nvidia-docker-plugin | 2017/01/16 18:37:40 Received mount request for volume 'nvidia_driver_375.26'
nvidia-docker-plugin | 2017/01/16 18:37:41 Received unmount request for volume 'nvidia_driver_375.26'
nvidia-docker-plugin | 2017/01/16 18:37:41 Received mount request for volume 'nvidia_driver_375.26'

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

[image: server3]

@qiaohaijun

systemctl edit nvidia-docker
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s $SOCK_DIR -d /usr/local/nvidia-driver

and

mkdir /usr/local/nvidia-driver
chown -hR nvidia-docker /usr/local/nvidia-driver
chgrp nvidia-docker /usr/local/nvidia-driver

by https://github.com/Mr-Grieves
