
GPU not detected inside container (NVML "Driver/library version mismatch" error) #929

Open
jaimeehh opened this issue Feb 16, 2025 · 0 comments

Hello,

I am facing an issue where my host machine detects the GPU correctly, but inside the Docker container, I get the following error when running nvidia-smi:

Failed to initialize NVML: Driver/library version mismatch

I have tried multiple configurations and different CUDA base images, but I can't resolve this issue. I believe the problem is related to library version conflicts between the host and the container.
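My working assumption (which I have not confirmed) is that this NVML error is raised when the userspace `libnvidia-ml.so` that `nvidia-smi` loads does not match the loaded kernel module, e.g. after a driver update without a reboot. A minimal sketch of the check I am doing, with hypothetical version strings standing in for the real outputs of `cat /proc/driver/nvidia/version` and `ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*`:

```shell
#!/bin/sh
# Sketch: extract and compare the kernel-module driver version with the
# version embedded in the userspace library filename. The two strings
# below are hypothetical stand-ins for the real outputs of:
#   cat /proc/driver/nvidia/version
#   ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*
kmod_line="NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.256.02"
lib_file="libnvidia-ml.so.470.239.06"

# Pull out the dotted version numbers.
kmod_ver=$(echo "$kmod_line" | grep -o '[0-9]\+\.[0-9.]\+')
lib_ver=$(echo "$lib_file" | sed 's/libnvidia-ml\.so\.//')

if [ "$kmod_ver" = "$lib_ver" ]; then
  echo "versions match"
else
  echo "driver/library version mismatch: $kmod_ver vs $lib_ver"
fi
```

If the two versions disagree on the host itself, the container error would just be the host problem showing through, and a reboot (or reloading the nvidia kernel modules) would be the first thing to try.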

Host System (Outside Docker)

  • OS: Ubuntu 22
  • GPU: NVIDIA GeForce GTX TITAN Black (GK110B, Compute Capability 3.5)
  • Driver: 470.256.02
  • CUDA Version (from nvidia-smi): 11.4 (the 470.x driver branch supports up to CUDA 11.4)
  • Docker Version: 27.3.1
  • NVIDIA Container Toolkit Installed: Yes, version 1.17.4-1

Container Configuration

  • Base Image Used: (I have tried multiple)
    • nvidia/cuda:10.2-runtime-ubuntu18.04
    • nvidia/cuda:10.2-base-ubuntu18.04
    • nvidia/cuda:10.2-runtime
  • Container OS: Ubuntu 18.04
  • CUDA Version inside container: 10.2
  • NVIDIA Container Toolkit Installed: Yes
  • Run command:
    sudo docker run --gpus all -it --name my_container -v /home/user/my_project:/workspace my_niftypet_runtime
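As a sanity check I am also planning to try a base image whose CUDA version matches the host driver (11.4 for the 470.x branch), to isolate whether the problem is the toolkit plumbing or the 10.2 images specifically. The exact tag below is an assumption on my part; any 11.4.x base tag should serve:

```shell
# Sanity check (sketch): run plain nvidia-smi from a CUDA base image
# matching the host driver's CUDA version. The tag is an assumption.
check_cmd='sudo docker run --rm --gpus all \
  nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi'
echo "$check_cmd"
```

If `nvidia-smi` works there but not in the 10.2 images, that would point at the images rather than the host setup.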

Debugging Attempts

  1. Verified NVIDIA Container Toolkit is installed on the host

    dpkg -l | grep nvidia-container

    Output:

    ii  libnvidia-container-tools                  1.17.4-1
    ii  libnvidia-container1:amd64                 1.17.4-1
    ii  nvidia-container-toolkit                   1.17.4-1
    ii  nvidia-container-toolkit-base              1.17.4-1
    
  2. Tried forcing the container to use host libraries:

    • Running the container with:
      sudo docker run --gpus all -it --env LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu my_niftypet_runtime
    • Still getting the NVML error.
  3. Tried other images:

    • I attempted using nvidia/cuda:10.2.89-base-ubuntu18.04, but it seems unavailable on Docker Hub.
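One more check I intend to run, sketched below with a hypothetical `ldconfig` output line (the real command would be `sudo docker run --rm --gpus all my_niftypet_runtime sh -c 'ldconfig -p | grep libnvidia-ml'`): verify which `libnvidia-ml.so` the dynamic linker resolves inside the container. My understanding is it should be the host driver's library injected by the NVIDIA Container Toolkit, not one baked into the image:

```shell
#!/bin/sh
# Sketch: parse the path that the dynamic linker resolves for
# libnvidia-ml inside the container. The line below is a hypothetical
# stand-in for one line of `ldconfig -p | grep libnvidia-ml` output.
ldconfig_line="libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1"

# Strip everything up to and including "=> " to get the resolved path.
resolved=${ldconfig_line##*=> }
echo "$resolved"
```

If the resolved library turns out to be one shipped inside the image rather than the injected host library, that would explain the version mismatch.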

Questions & Help Needed

  • How can I ensure that the container correctly uses the host’s NVIDIA libraries to avoid the Driver/library version mismatch error?
  • Is there any specific Docker image or configuration recommended for older GPUs like the GTX TITAN Black that require CUDA 10.2?
  • Could my Docker version (27.3.1) or NVIDIA Container Toolkit version (1.17.4-1) be incompatible with my setup?

This issue is blocking my work! Any help would be greatly appreciated!

Thank you in advance! 😊
