mmcv-full not compiled when building inside docker #1154
Comments
Hi @lingcong-k, could you try to print
Hi, thanks for the reply. On my own machine, which is used to build the Docker image, echo $CUDA_HOME returns empty,
but nvidia-smi reports CUDA version 11.2.
I have multiple CUDA versions installed on the PC where I build the Docker image. Do you mean that I need to set my CUDA home to 11.1 before I build the image, and then it will be all right? @zhouzaida I assume it is the base Docker image "nvcr.io/nvidia/pytorch:20.11-py3" that defines the CUDA version inside the container, though.
Launch the image by
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .
pytest tests/test_ops/test_nms.py
You could try the command
@zhouzaida Actually, you provided a really good debugging approach for checking nms. However, I noticed something really weird. If I do
and launch the Docker image and do
it fails with: RuntimeError: nms is not compiled with GPU support. But if I then go inside the Docker image and manually do:
it uninstalls the version that was installed when building the image and installs it again, and then there is no error and the nms test passes. So there seems to be some bug related to installing mmcv-full inside Docker (although no error was thrown when building it). Installing it manually always works, whether inside Docker or on my local machine. But in my case I need the build to succeed inside Docker, because my training pipeline launches and autoscales my training automatically. The log shows that building in Docker failed, but doing it manually inside the Docker image worked.
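One plausible reading of the symptom above, offered as a sketch rather than a confirmed diagnosis: during `docker build` no GPU is visible unless the NVIDIA runtime is active, so a build script that asks PyTorch whether CUDA is available gets "no" and quietly builds CPU-only ops. Inside a running GPU container the same check succeeds, which would explain why the manual reinstall works. The function below is a simplified model of that decision. The name `picks_cuda_extension` is hypothetical, and the `FORCE_CUDA` override is an assumption about how the build script behaves, not a copy of mmcv's code.

```python
import os


def picks_cuda_extension(cuda_available, env=None):
    """Simplified model of a build script choosing between CUDA and CPU ops.

    cuda_available stands in for torch.cuda.is_available() at build time.
    FORCE_CUDA=1 is assumed to override the GPU check (assumption, see lead-in).
    """
    env = os.environ if env is None else env
    return bool(cuda_available) or env.get("FORCE_CUDA", "0") == "1"


# `docker build` without a visible GPU: CPU-only ops, no error raised.
print(picks_cuda_extension(False, {}))
# Same build with the override set: CUDA ops are compiled anyway.
print(picks_cuda_extension(False, {"FORCE_CUDA": "1"}))
# Manual install inside a running GPU container: CUDA ops are compiled.
print(picks_cuda_extension(True, {}))
```

If this model matches your setup, either making the GPU visible at build time or setting the override before the install step should lead to the same result as the manual reinstall.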
Please provide the command you use to build the image. The command should be
I build with "DOCKER_BUILDKIT=1 docker build **********". So is it a must to have --runtime-nvidia? I tried to add this flag, but it says unknown flag --runtime.
Yes, maybe you could give it a try. I think it will work.
@zhouzaida My default runtime setting in the Docker config is already nvidia.
Thanks.
Maybe this is helpful: https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime
@zhouzaida Thanks, I found the issue. If anybody else is facing the same issue, check two things:
In Docker, these commands worked for me:
WORKDIR mmcv
RUN MMCV_WITH_OPS=1 pip install -e .
FWIW, I was able to resolve this (while still using BuildKit) by adding the following to my Dockerfile (before installing mmcv).
You can specify whatever compute capabilities you want, based on the hardware you are going to be running on:
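As a rough aid for choosing those compute capabilities, here is a minimal sketch that checks whether an architecture list covers a given GPU. It assumes the list uses the `TORCH_CUDA_ARCH_LIST` style of numeric entries such as `7.0` or `7.0+PTX`, separated by spaces or semicolons; named architectures are not handled. The helper names `parse_arch_list` and `covers` are hypothetical. The rule applied is the usual CUDA one: a binary built for X.y runs on devices X.z with z >= y, and PTX built for X.y can be JIT-compiled on any device with capability at least X.y.

```python
def parse_arch_list(value):
    """Parse entries like '7.0' or '7.0+PTX' into ((major, minor), has_ptx)."""
    entries = []
    for token in value.replace(";", " ").split():
        has_ptx = token.upper().endswith("+PTX")
        number = token[:-4] if has_ptx else token
        major, minor = number.split(".")
        entries.append(((int(major), int(minor)), has_ptx))
    return entries


def covers(arch_list, device_capability):
    """Return True if code built for arch_list can run on the given device."""
    device = tuple(device_capability)
    for (major, minor), has_ptx in parse_arch_list(arch_list):
        # Native binary: same major version, minor not newer than the device.
        if major == device[0] and minor <= device[1]:
            return True
        # PTX: forward compatible with any device of equal or higher capability.
        if has_ptx and (major, minor) <= device:
            return True
    return False


print(covers("6.0 6.1 7.0+PTX", (8, 6)))  # covered through the PTX entry
print(covers("6.0 6.1 7.0", (8, 6)))      # not covered: no PTX, no 8.x entry
print(covers("7.5", (7, 5)))              # covered natively
```

On a machine with a GPU, the device capability could be read with `torch.cuda.get_device_capability()` and passed in as the second argument.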
I encountered the same problem. After some searching and trying, I finally solved this issue by referring to the official Dockerfile: https://github.com/open-mmlab/mmcv/blob/main/docker/dev/Dockerfile You can add these lines to your Dockerfile:
Hey, just a quick update: if you want to compile for the latest architectures using docker build, add this to your Dockerfile
See all the latest arch here.
Checklist
I know this error has been brought up several times:
open-mmlab/mmdetection#2686
open-mmlab/mmdetection#4075
But I have checked all the solutions, and none of them worked for me.
I am building mmcv in Docker.
I am using this PyTorch image: FROM nvcr.io/nvidia/pytorch:20.11-py3 (which has PyTorch 1.8.0, CUDA 11.1.0)
I tried this
and many more versions; none of them worked.
According to the mmcv installation guide, mmcv-full 1.3.8 should be compiled for PyTorch 1.8.0 and CUDA 11.1.0,
isn't it?
I have run out of ideas and have been stuck here for a few days. Can someone please help me out? Thanks.