cmake fails unable to find cuda library while building an image #1033
Comments
To use the nvidia runtime with docker build you need to make it the default runtime; just add the daemon.json snippet shown later in this thread. |
Yes, the library won't be present at build time unless you mount it inside the container. You can either do a docker run --gpus, do the rest of the build inside the container, and then do a docker commit. Or use a -v option to manually mount it. Hope this helps. Closing now. |
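For reference, a minimal sketch of that workaround. The image tag fetalrecon-partial and the container name build-env are placeholders, not names from this thread:
# Start a GPU-enabled container from the partially built image and finish the build inside it
docker run --gpus all --name build-env -it fetalrecon-partial /bin/bash
# ...inside the container: run the remaining cmake/make steps, then exit...
# Snapshot the result as a new image from the host
docker commit build-env fetalrecon:built
# Alternatively, bind-mount the host driver library into the container manually
# (the exact host path to libcuda.so may differ on your system)
docker run -v /usr/lib/x86_64-linux-gnu/libcuda.so:/usr/lib/x86_64-linux-gnu/libcuda.so -it fetalrecon-partial /bin/bash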
Ok, this makes sense. I tried adding the line you suggest to daemon.json, but Docker will not start with the modified config file. With the latest nvidia-docker working alongside Docker 19.03.1, the nvidia runtime doesn't appear to be registered (i.e. dockerd --default-runtime=nvidia returns specified default runtime 'nvidia' does not exist). I am cautious about relying on the documentation in the wiki given that it now spans three nvidia-docker versions. Is it necessary, and are there updated instructions, for registering the nvidia runtime with the latest nvidia-docker? I suppose editing daemon.json as described may no longer be the accepted method for configuring the default runtime during docker build.
Can you give any more details about where to find the appropriate library to mount and compile against? Since the beauty of nvidia-docker is that it is host-driver agnostic to some extent, it seems to me that the CUDA libraries I mount should correspond to the CUDA version of the specific nvidia-docker image I have selected. Perhaps I am wrong.
This seems to deviate wildly from Docker best practices. I know it will work, but I would love to get docker build and a Dockerfile working properly for my use case. That the CUDA libraries are not mounted during build seems like a problem to me. |
You have to install the nvidia-container-runtime (the binary referenced in the config below) and add this to /etc/docker/daemon.json:
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
} |
That's excellent, it worked nicely. Thank you very much for your help. |
For the life of me, I cannot get this approach to work. Is there something that overrides the default runtime? Is there a way to debug which runtime is getting used? |
After modifying daemon.json, I didn't know I should restart the daemon for the changes to take effect. |
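For anyone hitting the same two questions, a minimal sketch assuming a systemd-managed Docker install (the grep is just an illustration):
# Restart the Docker daemon so the default-runtime change in daemon.json is picked up
sudo systemctl restart docker
# Show the registered runtimes and the current default
docker info | grep -i runtime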
1. Issue or feature description
I have created a Dockerfile to containerize some medical image processing code. With NVIDIA-docker2 I was able to use the file to generate an image without issue. When I attempt to build that image on a different machine with the latest Docker (19.03) and the latest NVIDIA-docker, it fails on step 8/8 when cmake cannot find the CUDA_CUDA_LIBRARY. When I run the step 7/8 image in the bash shell I can copy and paste the cmake command (line 86 of the Dockerfile) that failed during build, and it configures and then compiles fine in the image. My conception of Docker containers and images is being strained by this issue. I don't understand why a RUN command could fail while the same command run in the image would work.
Debugging a bit in the image I start with "docker run --gpus all -it /bin/bash", cmake finds CUDA_CUDA_LIBRARY in the image at /usr/lib/x86_64-linux-gnu/libcuda.so. But when I hard-code that location in the Dockerfile cmake command (i.e. I use the commented line 87 in the Dockerfile I link above), the build fails with "No rule to make target '/usr/lib/x86_64-linux-gnu/libcuda.so', needed by '../bin/SVRreconstructionGPU'.", which makes me believe that library actually doesn't exist in the "build image".
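One way to confirm that the driver library is not visible during docker build is to drop a throwaway RUN line into the Dockerfile before the cmake step; this is a hypothetical debugging addition, not part of the repository's Dockerfile:
# Hypothetical debugging step: list the driver library during `docker build`.
# With the default runc runtime this usually falls through to the echo, because libcuda.so
# is injected by the nvidia runtime at `docker run` time (or when nvidia is the default runtime).
RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so* || echo "libcuda.so not visible at build time"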
2. Steps to reproduce the issue
git clone git@github.com:dittothat/dockerfetalrecon.git
cd dockerfetalrecon
docker build -t fetalrecon .
This will fail when cmake cannot find the cuda libraries needed to compile.
Comment out line 86 and uncomment line 87 in the Dockerfile
docker build -t fetalrecon .
This will fail because the hard-coded library really isn't present in the build container.
Now start a container from the image created at step 7/8 of the build:
docker run --gpus all -it fetalrecon /bin/bash
Then in the container:
cd /usr/src/fetalReconstruction/source/build
cmake -DCUDA_SDK_ROOT_DIR:PATH=/usr/local/cuda-9.1/samples ..
make
Everything compiles just fine (though sometimes I must run make again to fix a linking error with niftiio toward the end; I am still trying to figure out what is going on there).
3. Information to attach (optional if deemed irrelevant)
Some nvidia-container information:
nvidia-container-cli -k -d /dev/tty info
container_information.log
Kernel version from
uname -a
Linux titan 5.0.0-21-generic #22+system76-Ubuntu SMP Tue Jul 16 19:57:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Any relevant kernel output lines from
dmesg
Driver information from
nvidia-smi -a
driver_information.log
Docker version from
docker version
docker_version.log
NVIDIA packages version from
dpkg -l '*nvidia*'
or rpm -qa '*nvidia*'
NVIDIA_pacakges_ver.log
NVIDIA container library version from
nvidia-container-cli -V
NVIDIA_container_lib_ver.log
NVIDIA container library logs (see troubleshooting)
Docker command, image and tag used