mmcv-full not compiled when building inside docker #1154

Closed
lingcong-k opened this issue Jun 28, 2021 · 17 comments

@lingcong-k

lingcong-k commented Jun 28, 2021

Checklist

I know this error has been brought up several times

open-mmlab/mmdetection#2686
open-mmlab/mmdetection#4075

But I've checked all the proposed solutions, and none of them worked for me.

I am building mmcv in docker
I am using this pytorch image: FROM nvcr.io/nvidia/pytorch:20.11-py3 (which has pytorch 1.8.0, cuda 11.1.0)

I tried this

FROM nvcr.io/nvidia/pytorch:20.11-py3
# ... (other, irrelevant commands omitted) ...

RUN git clone https://github.com/open-mmlab/mmcv.git && \
    cd mmcv && \
    MMCV_WITH_OPS=1 pip install -e .

and this:

FROM nvcr.io/nvidia/pytorch:20.11-py3
RUN pip install mmcv-full==1.3.8 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html

and many more versions; none of them worked.
According to the mmcv installation guide, mmcv-full 1.3.8 should be compatible with PyTorch 1.8.0 and CUDA 11.1.0, shouldn't it?

I have run out of ideas and have been stuck on this for a few days. Can someone please help me out? Thanks.
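
The -f index URL in the pip command above encodes the CUDA and PyTorch versions. As a rough sketch (the function name `mmcv_wheel_index` is made up for illustration, and the URL pattern is assumed from the single URL quoted in this post), the URL can be derived from the two version strings so that a mismatch is easier to spot:

```python
import re

# Pattern assumed from the URL used in the pip command above.
INDEX_TEMPLATE = "https://download.openmmlab.com/mmcv/dist/{cu}/torch{torch}/index.html"


def mmcv_wheel_index(cuda_version, torch_version):
    """Build the -f index URL for prebuilt mmcv-full wheels (hypothetical helper)."""
    # "11.1.0" -> "cu111": keep only major and minor, without the dot.
    cuda_parts = cuda_version.split(".")
    if len(cuda_parts) < 2:
        raise ValueError(f"expected a CUDA version like '11.1', got {cuda_version!r}")
    cu_tag = "cu" + cuda_parts[0] + cuda_parts[1]

    # Keep only the leading "X.Y.Z" of the PyTorch version string.
    match = re.match(r"\d+\.\d+\.\d+", torch_version)
    if match is None:
        raise ValueError(f"expected a PyTorch version like '1.8.0', got {torch_version!r}")

    return INDEX_TEMPLATE.format(cu=cu_tag, torch=match.group(0))


print(mmcv_wheel_index("11.1.0", "1.8.0"))
```

This only helps pick the index page; whether a wheel exists there for a given mmcv-full version still has to be checked on the page itself.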

@zhouzaida
Collaborator

Hi @lingcong-k, could you run echo $CUDA_HOME and post the output?
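
For anyone who wants to capture this during the image build rather than by hand, here is a small sketch that reports the same information from Python using only the standard library. The function name `describe_cuda_toolchain` is hypothetical, and checking for nvcc on PATH is an extra step assumed to be useful, not something requested above.

```python
import os
import shutil


def describe_cuda_toolchain(env=None):
    """Summarise the CUDA-related settings visible in an environment mapping."""
    env = os.environ if env is None else env
    return {
        # An unset or empty CUDA_HOME is reported as None.
        "CUDA_HOME": env.get("CUDA_HOME") or None,
        # nvcc is the CUDA compiler; None means it was not found on PATH.
        "nvcc": shutil.which("nvcc", path=env.get("PATH")),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_toolchain().items():
        print(f"{key}: {value if value else '<not set / not found>'}")
```

Running it once in a RUN step and once in an interactive container makes it easy to compare what the build sees with what the running container sees.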

@lingcong-k
Author

echo $CUDA_HOME

Hi, thanks for the reply.
Do you mean inside the Docker image when I run it? I am running it in the cloud.

On my own machine, which I use to build the Dockerfile, echo $CUDA_HOME returns nothing.

@lingcong-k
Author

But nvidia-smi reports CUDA version 11.2.

@lingcong-k
Author

lingcong-k commented Jun 29, 2021

I have multiple CUDA versions installed on the PC where I build the Docker image. Do you mean that I need to point CUDA_HOME at 11.1 before I build the image, and then it will be all right? @zhouzaida

I assume it's the base Docker image "nvcr.io/nvidia/pytorch:20.11-py3" that defines the CUDA version inside the container, though.

@zhouzaida
Collaborator

zhouzaida commented Jul 10, 2021

Launch the image with docker run -it --runtime=nvidia nvcr.io/nvidia/pytorch:20.11-py3 and run the following commands:

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .
pytest tests/test_ops/test_nms.py

@zhouzaida
Collaborator

You could try the command docker run -it --runtime=nvidia nvcr.io/nvidia/pytorch:20.11-py3 to launch your image.

@lingcong-k
Author

lingcong-k commented Jul 12, 2021

you could try the command docker run -it --runtime=nvidia nvcr.io/nvidia/pytorch:20.11-py3 to launch your image

@zhouzaida Actually, you provided a really good debugging approach: checking nms.

However, I noticed something really weird.

So if I do

FROM nvcr.io/nvidia/pytorch:21.02-py3

RUN apt-get update
RUN apt install -y libgl1-mesa-glx   # fixes the OpenCV import error when running test_nms.py

RUN git clone https://github.com/open-mmlab/mmcv.git && \
    cd mmcv && \
    MMCV_WITH_OPS=1 pip install -e .

and then launch the Docker image and run

pytest tests/test_ops/test_nms.py

it fails with: RuntimeError: nms is not compiled with GPU support

BUT if I then go inside the running container and manually do:

cd mmcv
MMCV_WITH_OPS=1 pip install -e .

it uninstalls the copy that was installed during the Docker build and installs it again. After that there is no error and the nms test passes.

So there seems to be a bug, or something extra that is needed, when installing mmcv-full during a Docker build (no error is thrown during the build, though). Doing it manually always works, whether inside the container or on my local machine.

But in my case I need it to build successfully inside Docker, because my training pipeline launches and autoscales my training automatically.
What do you think? :) Thanks in advance.

The log below shows that the install done during the Docker build failed the test, while the manual install inside the running container worked:

root@bc530dbd64e2:/workspace# cd mmcv
root@bc530dbd64e2:/workspace/mmcv# pytest tests/test_ops/test_nms.py
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.8.5, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /workspace/mmcv
plugins: cov-2.11.1, pythonpath-0.7.3, hypothesis-4.50.8
collected 4 items                                                                                                                                                                  

tests/test_ops/test_nms.py F...                                                                                                                                              [100%]

===================================================================================== FAILURES =====================================================================================
____________________________________________________________________________ Testnms.test_nms_allclose _____________________________________________________________________________

self = <test_nms.Testnms object at 0x7ff0ae72dd00>

    def test_nms_allclose(self):
        if not torch.cuda.is_available():
            return
        from mmcv.ops import nms
        np_boxes = np.array([[6.0, 3.0, 8.0, 7.0], [3.0, 6.0, 9.0, 11.0],
                             [3.0, 7.0, 10.0, 12.0], [1.0, 4.0, 13.0, 7.0]],
                            dtype=np.float32)
        np_scores = np.array([0.6, 0.9, 0.7, 0.2], dtype=np.float32)
        np_inds = np.array([1, 0, 3])
        np_dets = np.array([[3.0, 6.0, 9.0, 11.0, 0.9],
                            [6.0, 3.0, 8.0, 7.0, 0.6],
                            [1.0, 4.0, 13.0, 7.0, 0.2]])
        boxes = torch.from_numpy(np_boxes)
        scores = torch.from_numpy(np_scores)
        dets, inds = nms(boxes, scores, iou_threshold=0.3, offset=0)
        assert np.allclose(dets, np_dets)  # test cpu
        assert np.allclose(inds, np_inds)  # test cpu
>       dets, inds = nms(
            boxes.cuda(), scores.cuda(), iou_threshold=0.3, offset=0)

tests/test_ops/test_nms.py:25: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
mmcv/utils/misc.py:330: in new_func
    output = old_func(*args, **kwargs)
mmcv/ops/nms.py:171: in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ctx = <torch.autograd.function.NMSopBackward object at 0x7ff08d4e1d60>
bboxes = tensor([[ 6.,  3.,  8.,  7.],
        [ 3.,  6.,  9., 11.],
        [ 3.,  7., 10., 12.],
        [ 1.,  4., 13.,  7.]], device='cuda:0')
scores = tensor([0.6000, 0.9000, 0.7000, 0.2000], device='cuda:0'), iou_threshold = 0.3, offset = 0, score_threshold = 0, max_num = -1

    @staticmethod
    def forward(ctx, bboxes, scores, iou_threshold, offset, score_threshold,
                max_num):
        is_filtering_by_score = score_threshold > 0
        if is_filtering_by_score:
            valid_mask = scores > score_threshold
            bboxes, scores = bboxes[valid_mask], scores[valid_mask]
            valid_inds = torch.nonzero(
                valid_mask, as_tuple=False).squeeze(dim=1)
    
>       inds = ext_module.nms(
            bboxes, scores, iou_threshold=float(iou_threshold), offset=offset)
E       RuntimeError: nms is not compiled with GPU support

mmcv/ops/nms.py:26: RuntimeError
================================================================================= warnings summary =================================================================================
tests/test_ops/test_nms.py::Testnms::test_nms_allclose
  /opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/test_ops/test_nms.py::Testnms::test_nms_allclose
  /workspace/mmcv/mmcv/ops/fused_bias_leakyrelu.py:191: DeprecationWarning: invalid escape sequence \s
    """Fused bias leaky ReLU.

tests/test_ops/test_nms.py::Testnms::test_nms_allclose
  /workspace/mmcv/mmcv/ops/fused_bias_leakyrelu.py:226: DeprecationWarning: invalid escape sequence \s
    """Fused bias leaky ReLU function.

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================================= short test summary info ==============================================================================
FAILED tests/test_ops/test_nms.py::Testnms::test_nms_allclose - RuntimeError: nms is not compiled with GPU support
===================================================================== 1 failed, 3 passed, 3 warnings in 2.95s ======================================================================
root@bc530dbd64e2:/workspace/mmcv# MMCV_WITH_OPS=1 pip install -e .
Obtaining file:///workspace/mmcv
Requirement already satisfied: addict in /opt/conda/lib/python3.8/site-packages (from mmcv-full==1.3.9) (2.4.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from mmcv-full==1.3.9) (1.19.2)
Requirement already satisfied: Pillow in /opt/conda/lib/python3.8/site-packages (from mmcv-full==1.3.9) (8.3.1)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.8/site-packages (from mmcv-full==1.3.9) (5.4.1)
Requirement already satisfied: yapf in /opt/conda/lib/python3.8/site-packages (from mmcv-full==1.3.9) (0.31.0)
Installing collected packages: mmcv-full
  Attempting uninstall: mmcv-full
    Found existing installation: mmcv-full 1.3.9
    Uninstalling mmcv-full-1.3.9:
      Successfully uninstalled mmcv-full-1.3.9
  Running setup.py develop for mmcv-full
Successfully installed mmcv-full
root@bc530dbd64e2:/workspace/mmcv# pytest tests/test_ops/test_nms.py
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.8.5, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /workspace/mmcv
plugins: cov-2.11.1, pythonpath-0.7.3, hypothesis-4.50.8
collected 4 items                                                                                                                                                                  

tests/test_ops/test_nms.py ....                                                                                                                                              [100%]

================================================================================= warnings summary =================================================================================
tests/test_ops/test_nms.py::Testnms::test_nms_allclose
  /opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/warnings.html
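
Since the build itself throws no error, a check like the one in the log above could be scripted and run right after the install step. The sketch below is only an illustration: it assumes that `mmcv.ops` exposes `get_compiling_cuda_version()` (as in the mmcv 1.x releases discussed here) and that a CPU-only build reports the string "not available". Both assumptions should be verified against the installed version. The helper names `built_with_cuda` and `check_mmcv_ops` are hypothetical.

```python
def built_with_cuda(compiling_cuda_version):
    """True if the reported compile-time CUDA version looks like a real version."""
    # Assumption: a build without CUDA reports the string "not available".
    return compiling_cuda_version.strip().lower() != "not available"


def check_mmcv_ops():
    """Return (ok, message) describing whether mmcv's ops were built for GPU."""
    try:
        # Assumed helper from mmcv 1.x; the import fails if mmcv or its
        # compiled ops are not installed.
        from mmcv.ops import get_compiling_cuda_version
    except ImportError as exc:
        return False, f"mmcv ops could not be imported: {exc}"

    version = get_compiling_cuda_version()
    if built_with_cuda(version):
        return True, f"mmcv ops compiled with CUDA {version}"
    return False, "mmcv ops were compiled without CUDA support"


ok, message = check_mmcv_ops()
print(message)
```

Because this reads compile-time information rather than running a kernel, it should not need a GPU to be visible during the build. To make the build fail early, the script can exit with a non-zero status whenever `ok` is False.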

@zhouzaida
Collaborator

zhouzaida commented Jul 13, 2021

Please provide the command you use to build the image.

The command should be docker build --runtime=nvidia

@lingcong-k
Author

lingcong-k commented Jul 13, 2021

--runtime=nvidia

@zhouzaida I build with "DOCKER_BUILDKIT=1 docker build **********".

So is --runtime=nvidia a must?

I tried adding this flag, but it says: unknown flag --runtime

@zhouzaida
Collaborator

Yes, maybe you could give it a try. I think it will work.

@lingcong-k
Author

lingcong-k commented Jul 13, 2021

@zhouzaida
How could you run docker build --runtime=nvidia, though?
I can only use the runtime flag with docker run, not with docker build.
docker build --runtime=nvidia throws an unknown flag --runtime error.

My default runtime in the Docker config is already nvidia:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

thanks

@zhouzaida
Collaborator

@lingcong-k
Author

@zhouzaida Thanks, I found the issue.

If anybody else is facing the same issue, check two things:

  1. Is the default runtime set to nvidia (in /etc/docker/daemon.json)?
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
  2. Are you building the image with DOCKER_BUILDKIT?
    BuildKit has an issue that prevents access to the nvidia runtime during the build:
    No nvidia GPU access during build moby/buildkit#1800
    So don't use it.
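
The two checks above can be sketched as a small pre-flight script to run on the build host before docker build. The function names are hypothetical, and the BuildKit check only looks at the DOCKER_BUILDKIT environment variable; newer Docker releases may enable BuildKit by default, which this sketch does not detect.

```python
import json
import os


def default_runtime(daemon_json_text):
    """Return the "default-runtime" value from daemon.json text, or None."""
    config = json.loads(daemon_json_text)
    if not isinstance(config, dict):
        return None
    return config.get("default-runtime")


def buildkit_enabled(env):
    """True if DOCKER_BUILDKIT is set to "1" in the given environment mapping."""
    return env.get("DOCKER_BUILDKIT") == "1"


def preflight(daemon_json_path="/etc/docker/daemon.json", env=None):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []

    try:
        with open(daemon_json_path) as handle:
            runtime = default_runtime(handle.read())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config is treated as "not set".
        runtime = None
    if runtime != "nvidia":
        problems.append(f"default-runtime is {runtime!r}, expected 'nvidia'")

    if buildkit_enabled(env):
        problems.append("DOCKER_BUILDKIT=1 is set; see moby/buildkit#1800")

    return problems


for problem in preflight():
    print("WARNING:", problem)
```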

lingcong-k changed the title from "RuntimeError: nms is not compiled with GPU support.. plz help" to "mmcv-full not compiled when building inside docker" on Jul 15, 2021
@ganjbakhshali

ganjbakhshali commented Oct 30, 2021

In Docker, these commands worked for me:

RUN git clone https://github.com/open-mmlab/mmcv.git

WORKDIR mmcv

RUN MMCV_WITH_OPS=1 pip install -e .

@BrianPugh

BrianPugh commented Nov 10, 2021

FWIW, I was able to resolve this (while still using BuildKit) by adding the following to my Dockerfile (before installing mmcv):

ARG TORCH_CUDA_ARCH_LIST="7.5;6.1"
ENV FORCE_CUDA="1"

You can specify whatever compute capabilities you want, based on the hardware you are going to run on:
https://developer.nvidia.com/cuda-gpus
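
To avoid looking the values up by hand, the list can be generated on the target machine. This is a minimal sketch: `format_arch_list` and `detect_capabilities` are hypothetical helper names, and the detection part only works where PyTorch can actually see a GPU, so it is meant to run on the deployment hardware, not inside the image build.

```python
def format_arch_list(capabilities, add_ptx=False):
    """Turn (major, minor) pairs into a TORCH_CUDA_ARCH_LIST value."""
    # Deduplicate and sort so the result is stable across runs.
    unique = sorted(set(capabilities))
    entries = [f"{major}.{minor}" for major, minor in unique]
    if add_ptx and entries:
        # Mark the highest listed architecture with the "+PTX" suffix.
        entries[-1] += "+PTX"
    return ";".join(entries)


def detect_capabilities():
    """Return the compute capability of every visible GPU, or [] if none."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())]


print(format_arch_list([(7, 5), (6, 1)]))  # prints 6.1;7.5
print(format_arch_list(detect_capabilities()) or "<no GPU visible>")
```

The resulting string can then be pasted into the ARG TORCH_CUDA_ARCH_LIST line shown above.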

@linzy5

linzy5 commented Jun 3, 2024

I encountered the same problem. After some searching and experimenting, I finally solved it by referring to the official Dockerfile: https://github.com/open-mmlab/mmcv/blob/main/docker/dev/Dockerfile

You can add these lines to your Dockerfile:

ENV TORCH_CUDA_ARCH_LIST=7.5+PTX
ENV FORCE_CUDA="1"
RUN cd /home/docker/ && \
    wget https://github.com/open-mmlab/mmcv/archive/refs/tags/v1.7.2.tar.gz && \
    tar -xzf v1.7.2.tar.gz && \
    rm -rf v1.7.2.tar.gz && \
    cd mmcv-1.7.2 && \
    MMCV_WITH_OPS=1 pip install --no-cache-dir -e .[all] -v

@jeanchristopheruel

Hey, just a quick update: if you want to compile for the latest architectures using docker build, add this to your Dockerfile:

ARG TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6+PTX"
ENV FORCE_CUDA="1"

See all the latest arch here.
