Issue on volume creation #133

Closed
guilhermehartmann opened this issue Jul 8, 2016 · 15 comments

@guilhermehartmann

While creating the volume I get this error; it seems to be caused by a hard link crossing two different partitions. Is this a known issue?

sudo nvidia-docker volume setup
nvidia-docker-plugin | 2016/07/08 02:12:39 Received remove request for volume 'nvidia_driver_367.27'

nvidia-docker run --rm nvidia/cuda nvidia-smi
nvidia-docker-plugin | 2016/07/08 02:12:52 Received create request for volume 'nvidia_driver_367.27'
nvidia-docker-plugin | 2016/07/08 02:12:52 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/367.27/bin/nvidia-cuda-mps-control: invalid cross-device link

@flx42
Member

flx42 commented Jul 8, 2016

First of all, you shouldn't use volume setup; we removed this command from our latest version. You should use nvidia-docker-plugin (started automatically if you install nvidia-docker using the deb or the rpm).

And yes, it's a known limitation:
https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin#known-limitations
You can use the -d option of nvidia-docker-plugin to change the path for the volume.
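
For example, if you were starting the plugin by hand (the target path here is only an illustration; pick a directory on the same partition as the driver files, e.g. under /usr):

sudo nvidia-docker-plugin -d /usr/local/nvidia-docker

If the plugin was installed from the deb or the rpm, it runs as a systemd service, so the flag has to go into the service configuration instead.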

@guilhermehartmann
Author

Thanks, I missed the limitations bit. I opted to use /usr/local/nvidia-docker as the default volume location.

@dpatschke

I am experiencing this problem as well because I have '/var' on a separate partition from '/usr', where the NVIDIA drivers are located. I would like to switch the default volume location to a folder under '/usr', as the workaround suggests. However, I cannot, for the life of me, figure out how to accomplish this using nvidia-docker-plugin -d.

I am running:
sudo nvidia-docker-plugin -d /usr/local/nvidia-driver

and the change appears to be taking effect:

nvidia-docker-plugin | 2016/07/20 20:00:49 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/07/20 20:00:49 Loading NVIDIA management library
nvidia-docker-plugin | 2016/07/20 20:00:49 Discovering GPU devices
nvidia-docker-plugin | 2016/07/20 20:00:50 Provisioning volumes at /usr/local/nvidia-driver
nvidia-docker-plugin | 2016/07/20 20:00:50 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/07/20 20:00:50 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/07/20 20:00:50 Error: listen tcp 127.0.0.1:3476: bind: address already in use

but then I run this:

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

and I still get this error

docker: Error response from daemon: create nvidia_driver_367.35: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

@flx42, would you be able to point me in the right direction? Or, @guilhermehartmann, how were you able to use /usr/local/nvidia-driver as your default volume?

Apologies as I am new to Docker and it seems I have jumped into the deep end of the pool :-).

Thanks!

@flx42
Member

flx42 commented Jul 21, 2016

@dpatschke Look at your log after running nvidia-docker-plugin -d [...]; it failed:

nvidia-docker-plugin | 2016/07/20 20:00:50 Error: listen tcp 127.0.0.1:3476: bind: address already in use

This is because the nvidia-docker service is still running, so you're still using the other instance of the plugin, the one started without -d. You should try to modify your service configuration file directly. Which OS are you on?

@dpatschke

Thank you for your response, @flx42.

I am running Ubuntu 16.04. I would love to be able to modify some configuration file and restart docker or the nvidia-docker-plugin or whatever, but I have been scouring the web and message boards for hours and can't seem to find what I am looking for.

Would you be able to point me to the correct config file to modify? Also, I have no idea how nvidia-docker-plugin is running in the first place. Is the plugin launched when the docker service is started? How do I stop the current plugin and 'restart' one with the 'd' option?

Thank you very much for your help!!

David

@flx42
Member

flx42 commented Jul 21, 2016

Something like this:

# systemctl edit nvidia-docker

[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s $SOCK_DIR -d /usr/local/nvidia-driver
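
If the override doesn't seem to take effect, reload systemd and restart the service (a sketch; the unit name assumes the packaged nvidia-docker service):

sudo systemctl daemon-reload
sudo systemctl restart nvidia-docker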

@dpatschke

dpatschke commented Jul 21, 2016

Thank you @flx42 ... unfortunately, I could not get the problem resolved.

I executed the 'edit' command as you suggested and created the file with what you had listed. The 'nano' editor wanted to save it as 'override.conf' with a bunch of additional characters at the end.

I ended up saving the file as /etc/systemd/system/nvidia-docker.service.d/override.conf.

I then restarted the systemd service:
sudo systemctl restart nvidia-docker

I am still getting the old folder when I issue the command:
sudo nvidia-docker-plugin

Here is the output:

nvidia-docker-plugin | 2016/07/20 23:02:53 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/07/20 23:02:53 Loading NVIDIA management library
nvidia-docker-plugin | 2016/07/20 23:02:53 Discovering GPU devices
nvidia-docker-plugin | 2016/07/20 23:02:53 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2016/07/20 23:02:53 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/07/20 23:02:53 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/07/20 23:02:53 Error: listen tcp 127.0.0.1:3476: bind: address already in use

When I issue the command:
sudo systemctl edit nvidia-docker
I am seeing the new file I created.

Now, when I issue the following command, though:
sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

I get the following error:
docker: Error response from daemon: create nvidia_driver_367.35: create nvidia_driver_367.35: Error looking up volume plugin nvidia-docker: plugin not found.

@flx42
Member

flx42 commented Jul 21, 2016

Don't try to start nvidia-docker-plugin manually; it's handled by systemd.
Try restarting the docker service too.
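
Something like this (assuming the systemd units installed by the packages):

sudo systemctl restart nvidia-docker
sudo systemctl restart docker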

@dpatschke

dpatschke commented Jul 21, 2016

Yeah ... did a sudo service docker restart and I'm still getting the same result - 'plugin not found'. Restarted the entire system ... same result.

When I do a sudo nvidia-docker volume ls it is completely empty. I seem to remember reading somewhere that there should be something present.

I am also still getting the 'address already in use' error.

I don't know where things went wrong but any other suggestions/recommendations would be greatly appreciated.

David

@flx42
Member

flx42 commented Jul 21, 2016

@dpatschke: give me the output of

journalctl -n -u nvidia-docker

@dpatschke

Looks like I didn't have the nvidia-docker service started last time. Started it up again, but it was still erroring out.

Here is the output from your recommended command:

 Jul 21 00:06:04 Precision-Tower-7910 systemd[1]: Starting NVIDIA Docker plugin...
Jul 21 00:06:04 Precision-Tower-7910 nvidia-docker-plugin[4360]: /usr/bin/nvidia-docker-plugin | 2016/07/21 00:06:04 Loading NVIDIA unified memory
Jul 21 00:06:04 Precision-Tower-7910 nvidia-docker-plugin[4360]: /usr/bin/nvidia-docker-plugin | 2016/07/21 00:06:04 Loading NVIDIA management library
Jul 21 00:06:04 Precision-Tower-7910 nvidia-docker-plugin[4360]: /usr/bin/nvidia-docker-plugin | 2016/07/21 00:06:04 Discovering GPU devices
Jul 21 00:06:04 Precision-Tower-7910 systemd[1]: Started NVIDIA Docker plugin.
Jul 21 00:06:05 Precision-Tower-7910 nvidia-docker-plugin[4360]: /usr/bin/nvidia-docker-plugin | 2016/07/21 00:06:05 Provisioning volumes at /usr/local/nvidia-driver
Jul 21 00:06:05 Precision-Tower-7910 nvidia-docker-plugin[4360]: /usr/bin/nvidia-docker-plugin | 2016/07/21 00:06:05 Serving plugin API at /var/lib/nvidia-docker
Jul 21 00:06:05 Precision-Tower-7910 nvidia-docker-plugin[4360]: /usr/bin/nvidia-docker-plugin | 2016/07/21 00:06:05 Serving remote API at localhost:3476

This looks good, doesn't it? Still getting this error, though, when actually trying to launch nvidia-docker:
docker: Error response from daemon: create nvidia_driver_367.35: VolumeDriver.Create: internal error, check logs for details.

@flx42
Member

flx42 commented Jul 21, 2016

@dpatschke yes it looks good.
At this point, I would advise you to simply purge nvidia-docker the hard way:

apt-get purge nvidia-docker
rm -rf /var/lib/nvidia-docker

Then restart docker, reinstall nvidia-docker from the deb, edit the systemd configuration file again, reboot.

If you still have the problem after that, please file a new bug with the new output of journalctl -n -u nvidia-docker.
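
A rough sketch of the whole sequence (the .deb filename is hypothetical; use whichever package you downloaded):

sudo apt-get purge nvidia-docker
sudo rm -rf /var/lib/nvidia-docker
sudo systemctl restart docker
sudo dpkg -i nvidia-docker_<version>_amd64.deb   # hypothetical filename
sudo systemctl edit nvidia-docker                # re-add the ExecStart override with -d
sudo reboot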

@dpatschke

OK, will do it again ... thank you so much for your help and guidance!

@Mr-Grieves

Not sure if anyone will find this useful, but there was one last step I had to do to get this working:

Ensure that the directory specified by the -d option in the systemd config file exists and is owned by nvidia-docker:

mkdir /usr/local/nvidia-driver
chown -hR nvidia-docker /usr/local/nvidia-driver
chgrp nvidia-docker /usr/local/nvidia-driver
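
To double-check, the ownership can be verified before restarting the service (a quick sanity check, not an official step):

ls -ld /usr/local/nvidia-driver
# owner and group should now both show nvidia-docker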

@qiaohaijun

This solution helped me a lot.

I am using CentOS 7.2 with 4 x K40c GPUs.
