NVIDIA ICD is being skipped. #32

Open
stolk opened this issue Sep 13, 2023 · 1 comment

Comments

@stolk

stolk commented Sep 13, 2023

I have a machine with both an NVIDIA dGPU and an AMD iGPU, as can be seen here:

$ inxi -G
Graphics:
  Device-1: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nvidia
    v: 535.54.03
  Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
    driver: amdgpu v: kernel
...

And here:

$ drm_info | grep Driver:
├───Driver: amdgpu (AMD GPU) version 3.52.0 (20150101)
├───Driver: nvidia-drm (NVIDIA DRM driver) version 0.0.0 (20160202)

(vulkaninfo --summary also lists both.)

But when libOpenCL.so queries the platforms, it appears to try all 3 ICDs (strace shows it opening libnvidia-opencl.so), yet it does not report the NVIDIA platform:

$ clinfo -l
Platform #0: Clover
 `-- Device #0: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.0-7-generic)
Platform #1: rusticl

Yet the ICD file is there, and it points to a valid .so file:

$ ls -al /etc/OpenCL/vendors/
total 20
drwxr-xr-x 2 root root 4096 Sep 13 11:15 .
drwxr-xr-x 3 root root 4096 Sep  8 10:46 ..
-rw-r--r-- 1 root root   19 Jun  9 02:53 mesa.icd
-rw-r--r-- 1 root root   22 Jul 14 13:18 nvidia.icd
-rw-r--r-- 1 root root   22 Jun  9 02:53 rusticl.icd
$ cat /etc/OpenCL/vendors/nvidia.icd 
libnvidia-opencl.so.1
$ ldd /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 
	linux-vdso.so.1 (0x00007ffefe0f5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff434fc5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff433400000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff434fc0000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff434fbb000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff434fb6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff4350c4000)
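For anyone reproducing this, here is a minimal sketch of the same check as the ls/cat/ldd steps above. It prints each .icd file in a vendors directory, the library name it contains, and where the dynamic linker cache (ldconfig -p) resolves a bare soname such as libnvidia-opencl.so.1. The function name list_icds is hypothetical, and the one-library-name-per-file format is assumed from the files shown here.

```shell
#!/bin/sh
# Hypothetical helper: list ICD registration files and resolve the library each names.
# Assumes one library name (bare soname or absolute path) on the first line of each
# .icd file, and that `ldconfig -p` is available to query the linker cache.
list_icds() {
  dir="${1:-/etc/OpenCL/vendors}"
  for f in "$dir"/*.icd; do
    [ -e "$f" ] || continue              # glob matched nothing: no .icd files
    lib=
    read -r lib < "$f"                   # keeps the name even without a final newline
    case "$lib" in
      /*) path="$lib" ;;                 # absolute path: used as-is
      *)  # bare soname: take the first matching entry in the linker cache
          path=$(ldconfig -p 2>/dev/null | awk -v n="$lib" '$1 == n { print $NF; exit }') ;;
    esac
    printf '%s: %s -> %s\n' "$(basename "$f")" "$lib" "${path:-NOT FOUND}"
  done
}
```

Run it as list_icds to scan /etc/OpenCL/vendors, or pass another directory. In this report the debug log shows _open_driver returning 1, 2 and 3, so all three libraries appear to load and name resolution does not seem to be the problem here.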

Why is the NVIDIA OpenCL driver disregarded?

Here is the output from clinfo with OCL_ICD_DEBUG set to 4:

$ clinfo -l
ocl-icd(../ocl_icd_loader.c:201): _find_num_icds: return: 3/0x3
ocl-icd(../ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(../ocl_icd_loader.c:274): _open_driver: return: 2/0x2
ocl-icd(../ocl_icd_loader.c:274): _open_driver: return: 3/0x3
ocl-icd(../ocl_icd_loader.c:287): _open_drivers: return: 3/0x3
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042331709632/0x7f5e256f60c0
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042331637104/0x7f5e256e4570
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042331637392/0x7f5e256e4690
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042299277520/0x7f5e238080d0
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042299286992/0x7f5e2380a5d0
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042299277616/0x7f5e23808130
ocl-icd(../ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_khr_il_program
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: MESA
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: OpenCL 3.0 
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: rusticl
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Mesa/X.org
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042064915728/0x7f5e15886d10
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042064912448/0x7f5e15886040
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042064915712/0x7f5e15886d00
ocl-icd(../ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_khr_icd
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: MESA
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: OpenCL 1.1 Mesa 23.1.7-1ubuntu1
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Clover
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Mesa
ocl-icd(../ocl_icd_loader.c:1134): clGetPlatformIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: -1/0xffffffffffffffff
Platform #0: Clover
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
 `-- Device #0: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.0-7-generic)
Platform #1: rusticl

OS: Ubuntu 23.10

GPUs: NVIDIA + Radeon

@stolk
Author

stolk commented Sep 13, 2023

Running with OCL_ICD_DEBUG=7 gives me more info:

Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped

I need to investigate why this symbol is missing. I believe this works under Ubuntu 23.04 but somehow not under Ubuntu 23.10.

UPDATE: From what I can tell so far, this is a bug in NVIDIA's 535.54.03 driver, which I think may have been fixed in the 535.86.05 driver. Somehow, though, Ubuntu 23.10 lags behind Ubuntu 23.04 here.
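The debug message above says ocl-icd skips an ICD that lacks the global symbol clIcdGetPlatformIDsKHR. A rough offline version of that check is sketched below, assuming GNU binutils nm is installed. The helper name exports_symbol is hypothetical, and listing defined dynamic symbols with nm only approximates the loader's own dlsym lookup.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the shared library at $1 defines the
# dynamic symbol $2 (default: clIcdGetPlatformIDsKHR), fail with 1 if it does not,
# and return 2 if the file cannot be read. Assumes GNU binutils `nm`.
exports_symbol() {
  lib="$1"
  sym="${2:-clIcdGetPlatformIDsKHR}"
  [ -r "$lib" ] || return 2
  # --defined-only drops undefined (imported) symbols; newer nm prints versioned
  # names such as printf@@GLIBC_2.2.5, so strip everything from the first '@'.
  nm -D --defined-only "$lib" 2>/dev/null |
    awk -v s="$sym" '{ n = $NF; sub(/@.*/, "", n) } n == s { found = 1 } END { exit !found }'
}
```

For this report, running exports_symbol /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 && echo exported || echo missing against each installed driver build would show directly whether 535.54.03 and 535.86.05 differ in exporting the entry point the loader needs.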

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant