[CI][DOCKER] Fix cuda11 nvidia-docker support for non-Tesla gpus #8163

tqchen · 2021-05-29T19:41:25Z

Starting with CUDA 11, libcuda can end up linked against a copy of libcuda in
/usr/local/cuda/compat. That particular library does not work with
non-Tesla GPUs, which causes "no CUDA capable devices found" even though
nvidia-smi shows the available GPUs.
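To see why the compat copy can win, here is a minimal sketch of a "first match wins" library lookup. It is a simplified model, assuming the compat directory appears earlier in the search order than the system driver directory; it only walks a list of directories the way the LD_LIBRARY_PATH stage of the Linux dynamic loader does, and ignores RPATH/RUNPATH, ld.so.cache and the default system paths. The function name `resolve_library` and its `exists` parameter are hypothetical, introduced only for illustration.

```python
import os


def resolve_library(name, search_dirs, exists=os.path.exists):
    """Return the first path in `search_dirs` containing `name`, or None.

    Hypothetical helper: a simplified model of the LD_LIBRARY_PATH stage of
    the dynamic loader. `exists` is injectable so the lookup can be simulated
    without touching the real filesystem.
    """
    for d in search_dirs:
        if not d:
            # Simplification: skip empty entries (glibc would treat them as
            # the current working directory).
            continue
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate  # the first match wins; later dirs are never consulted
    return None


if __name__ == "__main__":
    # Hypothetical container layout where both directories ship libcuda.so.1.
    present = {
        "/usr/local/cuda/compat/libcuda.so.1",
        "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
    }
    dirs = ["/usr/local/cuda/compat", "/usr/lib/x86_64-linux-gnu"]
    print(resolve_library("libcuda.so.1", dirs, exists=present.__contains__))
    # prints /usr/local/cuda/compat/libcuda.so.1
```

Because the lookup stops at the first hit, the host driver's libcuda.so.1 is never reached while the compat directory comes first in the order.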

This PR makes sure we always prioritize linking
/usr/lib/x86_64-linux-gnu/libcuda.so.1, so the nvidia-docker cuda11
images work in non-Tesla GPU environments.
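One way to express "prioritize /usr/lib/x86_64-linux-gnu" is to move that directory to the front of the library search path. The sketch below is a hypothetical helper (`prioritize_dir` is not part of the PR) that computes such an LD_LIBRARY_PATH value; it illustrates the idea under that assumption and is not a description of the exact change that was merged. In a Dockerfile, the same ordering could be obtained with an `ENV LD_LIBRARY_PATH=...` line that lists this directory first.

```python
import os


def prioritize_dir(ld_library_path, preferred="/usr/lib/x86_64-linux-gnu"):
    """Return a new LD_LIBRARY_PATH value with `preferred` moved to the front.

    Hypothetical helper: drops empty entries and any existing copy of
    `preferred` (compared after path normalization), keeping the relative
    order of all other entries.
    """
    target = os.path.normpath(preferred)
    rest = [
        e for e in ld_library_path.split(":")
        if e and os.path.normpath(e) != target
    ]
    return ":".join([preferred] + rest)


if __name__ == "__main__":
    old = "/usr/local/cuda/compat:/usr/local/nvidia/lib64"
    print(prioritize_dir(old))
    # prints /usr/lib/x86_64-linux-gnu:/usr/local/cuda/compat:/usr/local/nvidia/lib64
```

Prepending rather than deleting the compat entry keeps it available as a fallback while making the host driver's libcuda.so.1 the first candidate.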


tqchen · 2021-05-29T19:42:28Z

cc @areusch @tkonolige @junrushao1994

This is likely the root cause of the "no CUDA capable devices found" problem you hit previously when updating the CUDA image.


tqchen force-pushed the ci branch 2 times, most recently from d8336ec to 9912bc4, May 30, 2021 13:23

tqchen mentioned this pull request May 30, 2021

[CI] Hotfix the CI after image update #8164

Merged

junrushao approved these changes May 30, 2021


masahi merged commit 713de0c into apache:main May 31, 2021
