This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Link Issue on volume creation on Fedora 25 #300

Closed
pejvan opened this issue Jan 29, 2017 · 8 comments

Comments

@pejvan

pejvan commented Jan 29, 2017

I am unable to run the example given on the front page:

$ nvidia-docker run --rm nvidia/cuda nvidia-smi
/usr/bin/docker-current: Error response from daemon: create nvidia_driver_375.26: VolumeDriver.Create: internal error

Looking into the logs, I see:

$ journalctl -n -u nvidia-docker
[...]
Jan 29 17:02:15 localhost.localdomain nvidia-docker-plugin[1821]: /usr/bin/nvidia-docker-plugin | 2017/01/29 17:02:15 Error: link /usr/lib64/libnvidia-tls.so.375.26 /var/lib/nvidia-docker/volumes/nvidia_driver/375.26/lib64/libnvidia-tls.so.375.26: file exists

However:

$ nvidia-docker volume ls
DRIVER              VOLUME NAME

I've tried to clean up whatever could cause the problem:

$ ls -l /var/lib/nvidia-docker/volumes/nvidia_driver/375.26/lib64/libnvidia-tls.so.375.26
ls: cannot access '/var/lib/nvidia-docker/volumes/nvidia_driver/375.26/lib64/libnvidia-tls.so.375.26': No such file or directory
$ ls -l /var/lib/nvidia-docker/volumes/nvidia_driver/375.26/lib64/
ls: cannot access '/var/lib/nvidia-docker/volumes/nvidia_driver/375.26/lib64/': No such file or directory
$ ls -l /var/lib/nvidia-docker/volumes/nvidia_driver/375.26
ls: cannot access '/var/lib/nvidia-docker/volumes/nvidia_driver/375.26': No such file or directory
$ ls -l /var/lib/nvidia-docker/volumes/nvidia_driver
total 0
$ sudo rmdir  /var/lib/nvidia-docker/volumes/nvidia_driver
[sudo] password for pejvan:
$ ls -l /var/lib/nvidia-docker/volumes/
total 0
$ ls -l /var/lib/nvidia-docker/
total 4
srwxr-xr-x. 1 nvidia-docker nvidia-docker    0 Jan 29 17:23 nvidia-docker.sock
drwxr-xr-x. 2 nvidia-docker nvidia-docker 4096 Jan 29 17:25 volumes
$ nvidia-smi
Sun Jan 29 17:22:24 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 0000:04:00.0      On |                  N/A |
|  0%   34C    P8     7W / 120W |    287MiB /  6070MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2404    G   /usr/libexec/Xorg                              178MiB |
|    0      2433    G   /usr/bin/gnome-shell                           106MiB |
+-----------------------------------------------------------------------------+

While the error sounds like #133, I don't have several partitions:

$ lsblk
NAME                                                    MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
loop1                                                     7:1    0     2G  0 loop
└─docker-253:1-2632699-pool                             253:5    0   100G  0 dm
sdb                                                       8:16   0 256.2G  0 disk
├─sdb2                                                    8:18   0     1G  0 part   /boot
├─sdb3                                                    8:19   0   255G  0 part
│ └─luks-f2cf06bb-80bf-4a6f-bee4-6e8773ac87d8           253:0    0   255G  0 crypt
│   ├─fedora-root                                       253:1    0    50G  0 lvm    /
│   ├─fedora-home                                       253:4    0 193.2G  0 lvm    /home
│   └─fedora-swap                                       253:2    0  11.8G  0 lvm    [SWAP]
└─sdb1                                                    8:17   0   200M  0 part   /boot/efi
loop0                                                     7:0    0   100G  0 loop
└─docker-253:1-2632699-pool                             253:5    0   100G  0 dm
sda                                                       8:0    0 931.5G  0 disk
└─ddf1_44656c6c202020201000006010281f0c3e2eb6117b1d3342 253:3    0   931G  0 dmraid
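
Hard links cannot span filesystems, so one rough way to double-check the partition question is to compare the device IDs of the source and destination directories. This is a minimal sketch, not part of nvidia-docker: the helper name `same_filesystem` is hypothetical, the two paths are copied from the log above, and a matching `st_dev` is only a necessary condition (separate mount points of the same filesystem can still refuse a hard link).

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Paths taken from the journal output above; adjust for your system.
    src = "/usr/lib64"
    dst = "/var/lib/nvidia-docker/volumes"
    try:
        print(f"{src} and {dst} on same device: {same_filesystem(src, dst)}")
    except OSError as exc:
        # Either directory may be missing or unreadable on this machine.
        print(f"cannot check: {exc}")
```

If this prints `True`, a cross-device link is unlikely to be the cause and the `file exists` error needs another explanation.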

In #188 some users managed to purge and reinstall and got it to work, but I am not on Ubuntu, and trying the same doesn't seem to solve the problem.

I have reached the point where I have tried countless variations of uninstall/reboot/reinstall, manually fixing permissions, and removing files, but it still doesn't work...

Not sure what to do from here, and would appreciate any help.

@flx42
Member

flx42 commented Jan 30, 2017

Hello,

It's possible that you have leftover files from a previous driver installation, and this could mean you have duplicate libraries. What's the output of the following command?

$ ldconfig -p | grep nvidia

@pejvan
Author

pejvan commented Feb 1, 2017

Hello @flx42 -- sorry I don't have access to this computer, I'll get back to you by Friday.

@pejvan
Author

pejvan commented Feb 3, 2017

Hi @flx42

Here's the output of the command:

$ ldconfig -p | grep nvidia
	libvdpau_nvidia.so (libc6,x86-64, OS ABI: Linux 2.3.99) => /lib64/libvdpau_nvidia.so
	libvdpau_nvidia.so (libc6, OS ABI: Linux 2.3.99) => /lib/libvdpau_nvidia.so
	libnvidia-tls.so.375.26 (libc6,x86-64, hwcap: 0x8000000000000000, OS ABI: Linux 2.3.99) => /lib64/tls/libnvidia-tls.so.375.26
	libnvidia-tls.so.375.26 (libc6,x86-64, OS ABI: Linux 2.3.99) => /lib64/libnvidia-tls.so.375.26
	libnvidia-tls.so.375.26 (libc6, OS ABI: Linux 2.2.5) => /lib/libnvidia-tls.so.375.26
	libnvidia-ptxjitcompiler.so.375.26 (libc6,x86-64) => /lib64/libnvidia-ptxjitcompiler.so.375.26
	libnvidia-ptxjitcompiler.so.375.26 (libc6) => /lib/libnvidia-ptxjitcompiler.so.375.26
	libnvidia-opencl.so.1 (libc6,x86-64) => /lib64/libnvidia-opencl.so.1
	libnvidia-opencl.so.1 (libc6) => /lib/libnvidia-opencl.so.1
	libnvidia-ml.so.1 (libc6,x86-64) => /lib64/libnvidia-ml.so.1
	libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
	libnvidia-ml.so (libc6,x86-64) => /lib64/libnvidia-ml.so
	libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
	libnvidia-ifr.so.1 (libc6,x86-64) => /lib64/libnvidia-ifr.so.1
	libnvidia-ifr.so.1 (libc6) => /lib/libnvidia-ifr.so.1
	libnvidia-ifr.so (libc6,x86-64) => /lib64/libnvidia-ifr.so
	libnvidia-ifr.so (libc6) => /lib/libnvidia-ifr.so
	libnvidia-gtk3.so.375.26 (libc6,x86-64) => /lib64/libnvidia-gtk3.so.375.26
	libnvidia-gtk2.so.375.26 (libc6,x86-64) => /lib64/libnvidia-gtk2.so.375.26
	libnvidia-glsi.so.375.26 (libc6,x86-64) => /lib64/libnvidia-glsi.so.375.26
	libnvidia-glsi.so.375.26 (libc6) => /lib/libnvidia-glsi.so.375.26
	libnvidia-glcore.so.375.26 (libc6,x86-64) => /lib64/libnvidia-glcore.so.375.26
	libnvidia-glcore.so.375.26 (libc6) => /lib/libnvidia-glcore.so.375.26
	libnvidia-fbc.so.1 (libc6,x86-64) => /lib64/libnvidia-fbc.so.1
	libnvidia-fbc.so.1 (libc6) => /lib/libnvidia-fbc.so.1
	libnvidia-fbc.so (libc6,x86-64) => /lib64/libnvidia-fbc.so
	libnvidia-fbc.so (libc6) => /lib/libnvidia-fbc.so
	libnvidia-fatbinaryloader.so.375.26 (libc6,x86-64) => /lib64/libnvidia-fatbinaryloader.so.375.26
	libnvidia-fatbinaryloader.so.375.26 (libc6) => /lib/libnvidia-fatbinaryloader.so.375.26
	libnvidia-encode.so.1 (libc6,x86-64) => /lib64/libnvidia-encode.so.1
	libnvidia-encode.so.1 (libc6) => /lib/libnvidia-encode.so.1
	libnvidia-encode.so (libc6,x86-64) => /lib64/libnvidia-encode.so
	libnvidia-encode.so (libc6) => /lib/libnvidia-encode.so
	libnvidia-eglcore.so.375.26 (libc6,x86-64) => /lib64/libnvidia-eglcore.so.375.26
	libnvidia-eglcore.so.375.26 (libc6) => /lib/libnvidia-eglcore.so.375.26
	libnvidia-egl-wayland.so.375.26 (libc6,x86-64) => /lib64/libnvidia-egl-wayland.so.375.26
	libnvidia-compiler.so.375.26 (libc6,x86-64) => /lib64/libnvidia-compiler.so.375.26
	libnvidia-compiler.so.375.26 (libc6) => /lib/libnvidia-compiler.so.375.26
	libnvidia-cfg.so.1 (libc6,x86-64) => /lib64/libnvidia-cfg.so.1
	libnvidia-cfg.so (libc6,x86-64) => /lib64/libnvidia-cfg.so
	libOpenCL.so.1 (libc6,x86-64) => /usr/lib64/nvidia/libOpenCL.so.1
	libGLX_nvidia.so.0 (libc6,x86-64) => /lib64/libGLX_nvidia.so.0
	libGLX_nvidia.so.0 (libc6) => /lib/libGLX_nvidia.so.0
	libGLESv2_nvidia.so.2 (libc6,x86-64) => /lib64/libGLESv2_nvidia.so.2
	libGLESv2_nvidia.so.2 (libc6) => /lib/libGLESv2_nvidia.so.2
	libGLESv1_CM_nvidia.so.1 (libc6,x86-64) => /lib64/libGLESv1_CM_nvidia.so.1
	libGLESv1_CM_nvidia.so.1 (libc6) => /lib/libGLESv1_CM_nvidia.so.1
	libEGL_nvidia.so.0 (libc6,x86-64) => /lib64/libEGL_nvidia.so.0
	libEGL_nvidia.so.0 (libc6) => /lib/libEGL_nvidia.so.0

Can you see anything wrong?

@3XX0
Member

3XX0 commented Feb 3, 2017

I don't know how you installed the drivers but this is not supposed to happen:

libnvidia-tls.so.375.26 (libc6,x86-64, hwcap: 0x8000000000000000, OS ABI: Linux 2.3.99) => /lib64/tls/libnvidia-tls.so.375.26
libnvidia-tls.so.375.26 (libc6,x86-64, OS ABI: Linux 2.3.99) => /lib64/libnvidia-tls.so.375.26

The one under /lib64 should have the old ABI (i.e. 2.2.5)
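
To spot this kind of clash without reading the whole listing, the `ldconfig -p` output can be grouped by library name, architecture, and OS ABI. The following is a minimal sketch under stated assumptions: the function name `find_duplicates`, the regular expression, and the choice to ignore the `hwcap` field when grouping are all illustrative, and the sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Matches one `ldconfig -p` entry, e.g.
#   libfoo.so.1 (libc6,x86-64, OS ABI: Linux 2.3.99) => /lib64/libfoo.so.1
ENTRY = re.compile(r"^\s*(?P<name>\S+) \((?P<flags>[^)]*)\) => (?P<path>\S+)\s*$")

# Three entries copied from the listing in this thread.
SAMPLE = """\
\tlibnvidia-tls.so.375.26 (libc6,x86-64, hwcap: 0x8000000000000000, OS ABI: Linux 2.3.99) => /lib64/tls/libnvidia-tls.so.375.26
\tlibnvidia-tls.so.375.26 (libc6,x86-64, OS ABI: Linux 2.3.99) => /lib64/libnvidia-tls.so.375.26
\tlibnvidia-tls.so.375.26 (libc6, OS ABI: Linux 2.2.5) => /lib/libnvidia-tls.so.375.26
"""


def find_duplicates(ldconfig_output: str) -> dict:
    """Group entries by (name, arch, OS ABI); return groups with more than one path."""
    groups = defaultdict(list)
    for line in ldconfig_output.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # header line or unrelated output
        flags = [f.strip() for f in m["flags"].split(",")]
        arch = "x86-64" if "x86-64" in flags else "other"
        # hwcap is deliberately left out of the key (an assumption of this sketch).
        abi = next((f for f in flags if f.startswith("OS ABI:")), None)
        groups[(m["name"], arch, abi)].append(m["path"])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    for (name, arch, abi), paths in find_duplicates(SAMPLE).items():
        print(name, arch, abi, paths)
```

On the sample, only the two x86-64 `libnvidia-tls.so.375.26` entries with `OS ABI: Linux 2.3.99` are reported; the 32-bit entry with ABI 2.2.5 is not. To check a live system, pass the text of `ldconfig -p` to `find_duplicates` instead of `SAMPLE`.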

@pejvan
Author

pejvan commented Feb 12, 2017

Initially, I installed the tar.gz on my Fedora 25. Since it wasn't working, I tried to uninstall it and reinstall from the Red Hat package, but that didn't work either...

I just successfully got it working on CentOS 7.3

So it seems it would be easier to just wipe everything and start fresh. I'll close this ticket, and apologies for wasting your time with my messed-up system.

@pejvan pejvan closed this as completed Feb 12, 2017
@kwizart

kwizart commented Jul 10, 2017

@3XX0
The old TLS ABI isn't used in any maintained distro anymore, so as repackagers of the NVIDIA driver we have dropped support for it and only provide the new-ABI TLS library.

@3XX0
Member

3XX0 commented Jul 10, 2017

@kwizart Sure but it doesn't change the fact that your driver isn't installed properly. You have two tls libraries with the same ABI and one of them is missing the tls hwcap. The second one shouldn't be there, so you probably have a packaging issue on your platform.
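
The `link ... file exists` message in the original log is the error a hard-link call returns when the destination name is already taken. The sketch below reproduces that failure in a temporary directory. It assumes, without confirmation from this thread, that both copies of the library end up targeting the same destination file name; `link_all` and `demo` are hypothetical helpers and do not reflect the plugin's actual code.

```python
import os
import tempfile


def link_all(sources, dest_dir):
    """Hard-link each source file into dest_dir under its base name."""
    for src in sources:
        os.link(src, os.path.join(dest_dir, os.path.basename(src)))


def demo() -> str:
    with tempfile.TemporaryDirectory() as root:
        sources = []
        # Two stand-in directories that each hold a file with the same name.
        for sub in ("lib64", os.path.join("lib64", "tls")):
            os.makedirs(os.path.join(root, sub), exist_ok=True)
            path = os.path.join(root, sub, "libnvidia-tls.so.375.26")
            with open(path, "wb"):
                pass  # an empty file is enough for the demonstration
            sources.append(path)
        volume = os.path.join(root, "volume")
        os.mkdir(volume)
        try:
            link_all(sources, volume)
        except FileExistsError as exc:
            return f"second link failed: {exc.strerror}"
        return "no collision"


if __name__ == "__main__":
    print(demo())
```

The first link succeeds and the second raises `FileExistsError`, which would be consistent with the destination path looking empty afterwards if the failed volume is cleaned up.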

@kwizart

kwizart commented Jul 10, 2017

Thanks for the confirmation. I can't reproduce this with the driver repackaged from rpmfusion.org.
