Model takes very long to load and tutorial script fails #7

DragonDuck · 2019-10-11T08:04:17Z

When I try to run the code from the Detectron2 Tutorial Colab, the model takes an extremely long time to load and then crashes with a CUDA Error / Segmentation fault.

To Reproduce

what changes you made / what code you wrote: None
what command you run: Taken from Detectron2 Tutorial Colab

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2

im = cv2.imread("./input.jpg")

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
predictor = DefaultPredictor(cfg)
outputs = predictor(im)

what you observed (full logs are preferred):

The step DefaultPredictor(cfg) takes an extremely long time (>15 minutes). Specifically, the call to build_model(cfg) inside DefaultPredictor.__init__() is what takes this long to complete.
Minimal usage of the GPU and maximal usage of the CPU (top lists the python process as taking up 100% of CPU power and approx. 3.5% of memory while the GPU takes only approximately 500MB out of available 16GB). It appears that PyTorch is attempting to execute everything on the CPU.
Once the model is finally built, attempting to run prediction results in an error:

WARNING [10/11 07:45:38 d2.config.compat]: Config 'configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    outputs = predictor(im)
  File "/home/jan/miniconda3/envs/detectron/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/jan/detectron2/detectron2/engine/defaults.py", line 171, in __call__
    height, width = original_image.shape[:2]
AttributeError: 'NoneType' object has no attribute 'shape'

However, I've checked all CUDA versions and everything points to CUDA 10.1, so I don't think this is a version mismatch:

$ conda list cuda
# packages in environment at /home/jan/miniconda3/envs/detectron:
#
# Name                    Version                   Build  Channel
cudatoolkit               10.1.168                      0

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

$ echo $LD_LIBRARY_PATH 
:/usr/local/cuda-10.1.back/lib64

$ nvidia-smi 
Fri Oct 11 07:45:41 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P0    41W / 250W |    269MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   51C    P0    41W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     16135      C   python                                       259MiB |
+-----------------------------------------------------------------------------+

All required versions as per the install page are fulfilled:

$ python --version
Python 3.6.9 :: Anaconda, Inc.

$ conda list
# packages in environment at /home/jan/miniconda3/envs/detectron:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.8.1                    pypi_0    pypi
backcall                  0.1.0                    py36_0  
blas                      1.0                         mkl  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2019.8.28                     0  
cairo                     1.14.12              h8948797_3  
certifi                   2019.9.11                py36_0  
cffi                      1.12.3           py36h2e261b9_0  
cloudpickle               1.2.2                    pypi_0    pypi
cudatoolkit               10.1.168                      0  
cycler                    0.10.0                   pypi_0    pypi
cython                    0.29.13                  pypi_0    pypi
decorator                 4.4.0                    py36_1  
detectron2                0.1                       dev_0    <develop>
ffmpeg                    4.0                  hcdf2ecd_0  
fontconfig                2.13.0               h9420a91_0  
freeglut                  3.0.0                hf484d3e_5  
freetype                  2.9.1                h8a8886c_1  
fvcore                    0.1                      pypi_0    pypi
glib                      2.56.2               hd408876_0  
graphite2                 1.3.13               h23475e2_0  
grpcio                    1.24.1                   pypi_0    pypi
harfbuzz                  1.8.8                hffaf4a1_0  
hdf5                      1.10.2               hba1933b_1  
icu                       58.2                 h9c2bf20_1  
intel-openmp              2019.4                      243  
ipython                   7.8.0            py36h39e3cac_0  
ipython_genutils          0.2.0                    py36_0  
jasper                    2.0.14               h07fcdf6_1  
jedi                      0.15.1                   py36_0  
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.1.0                    pypi_0    pypi
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libglu                    9.0.0                hf484d3e_1  
libopencv                 3.4.2                hb342d67_1  
libopus                   1.3                  h7b6447c_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.0.10               h2733197_2  
libuuid                   1.0.3                h1bed415_2  
libvpx                    1.7.0                h439df22_0  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.9                hea5a465_1  
markdown                  3.1.1                    pypi_0    pypi
matplotlib                3.1.1                    pypi_0    pypi
mkl                       2019.4                      243  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.14           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.1                  he6710b0_1  
ninja                     1.9.0            py36hfd86e86_0  
numpy                     1.17.2           py36haad9e8e_0  
numpy-base                1.17.2           py36hde5b4d6_0  
olefile                   0.46                     py36_0  
opencv                    3.4.2            py36h6fd60c2_1  
openssl                   1.1.1d               h7b6447c_2  
parso                     0.5.1                      py_0  
pcre                      8.43                 he6710b0_0  
pexpect                   4.7.0                    py36_0  
pickleshare               0.7.5                    py36_0  
pillow                    6.2.0            py36h34e0f95_0  
pip                       19.2.3                   py36_0  
pixman                    0.38.0               h7b6447c_0  
portalocker               1.5.1                    pypi_0    pypi
prompt_toolkit            2.0.10                     py_0  
protobuf                  3.10.0                   pypi_0    pypi
ptyprocess                0.6.0                    py36_0  
py-opencv                 3.4.2            py36hb342d67_1  
pycocotools               2.0                      pypi_0    pypi
pycparser                 2.19                     py36_0  
pygments                  2.4.2                      py_0  
pyparsing                 2.4.2                    pypi_0    pypi
python                    3.6.9                h265db76_0  
python-dateutil           2.8.0                    pypi_0    pypi
pytorch                   1.3.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
pyyaml                    5.1.2                    pypi_0    pypi
readline                  7.0                  h7b6447c_5  
setuptools                41.4.0                   py36_0  
shapely                   1.6.4.post2              pypi_0    pypi
six                       1.12.0                   py36_0  
sqlite                    3.30.0               h7b6447c_0  
tensorboard               2.0.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.8                hbc83047_0  
torchvision               0.4.1                py36_cu101    pytorch
tqdm                      4.36.1                   pypi_0    pypi
traitlets                 4.3.3                    py36_0  
wcwidth                   0.1.7                    py36_0  
werkzeug                  0.16.0                   pypi_0    pypi
wheel                     0.33.6                   py36_0  
xz                        5.2.4                h14c3975_4  
yacs                      0.1.6                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609

Expected behavior

PyTorch appears to run primarily on the CPU instead of the GPU. I expect it to run primarily on the GPU. As I've made no edits to the code, I also expect it to run error-free.

Environment

$ python -m detectron2.utils.collect_env
---------------------  -------------------------------------------------------------------
Python                 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Detectron2 Compiler    GCC 5.4
DETECTRON2_ENV_MODULE  <not set>
PyTorch                1.3.0
PyTorch Debug Build    False
CUDA available         True
GPU 0,1                Tesla P100-PCIE-16GB
Pillow                 6.2.0
cv2                    3.4.2
---------------------  -------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

ppwwyyxx · 2019-10-11T08:13:47Z

I think you can easily verify that it does not take more than 1 minute on Colab, so this should be an environment issue.

What looks problematic in your environment is that your version of PyTorch is not built for the compute capability of the P100 (6.0): the "NVCC architecture flags" in your report list only sm_35, sm_50 and compute_50. In that case, when some ops are run for the first time, the NVIDIA driver will spend some time compiling them. Running it a second time on the same machine should be faster.

Running the code you showed also needs to download a 178MB model the first time. This is fast on Colab, but may be slow on your machine.

AttributeError: 'NoneType' object has no attribute 'shape'

This is saying that OpenCV could not read the input image you gave: cv2.imread returns None instead of raising an error when it cannot read a file. You may need to check your input path or the OpenCV installation.
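
The failed read can be caught before the image ever reaches the predictor. The following is a minimal sketch, assuming OpenCV is installed; `load_image` is a hypothetical helper name used for illustration and is not part of Detectron2 or OpenCV.

```python
import os


def load_image(path):
    """Read an image with OpenCV, raising instead of returning None.

    Hypothetical helper for illustration; not part of Detectron2 or OpenCV.
    """
    # cv2.imread does not raise on a missing or unreadable file; it returns
    # None, which only fails later at `original_image.shape`.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such image file: {path!r}")

    import cv2  # imported lazily so the path check works without OpenCV

    image = cv2.imread(path)
    if image is None:
        raise ValueError(f"OpenCV could not decode {path!r}")
    return image
```

In the script from the report, `im = load_image("./input.jpg")` would replace the bare `cv2.imread` call, so a wrong path fails immediately with a clear message instead of after the model has been built.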

Zehaos · 2019-10-11T17:16:40Z

The version of cudatoolkit is 10.1.168, while PyTorch 1.3 is built with CUDA 10.1.243. Maybe this is where the problem lies.

ppwwyyxx · 2019-10-12T08:30:23Z

Similar reports have been seen in #27 and pytorch/pytorch#537. You need to find a version of PyTorch whose "NVCC architecture flags" include the compute capability of your GPU.
Closing as the issue is about pytorch installation.
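
The comparison described in this comment can be approximated by parsing the "NVCC architecture flags" line that collect_env prints and checking it against the GPU's compute capability. The following is a rough sketch rather than anything shipped with Detectron2 or PyTorch: `parse_arch_flags` and `classify_support` are hypothetical helper names, and the two compatibility rules in the comments are assumptions based on NVIDIA's general binary and PTX compatibility model.

```python
import re

# Flags copied from the "PyTorch built with" section of the report above.
REPORTED_FLAGS = (
    "-gencode;arch=compute_35,code=sm_35;"
    "-gencode;arch=compute_50,code=sm_50;"
    "-gencode;arch=compute_50,code=compute_50"
)


def parse_arch_flags(flags):
    """Return (native, ptx): sets of integer targets such as 35 or 50."""
    native, ptx = set(), set()
    for kind, number in re.findall(r"code=(sm|compute)_(\d+)", flags):
        (native if kind == "sm" else ptx).add(int(number))
    return native, ptx


def classify_support(capability, flags):
    """Classify how a build covers a GPU: 'native', 'jit', or 'unsupported'."""
    major, minor = capability
    native, ptx = parse_arch_flags(flags)
    # Assumption: sm_XY binaries run on the same major version, minor >= Y.
    if any(n // 10 == major and n % 10 <= minor for n in native):
        return "native"
    # Assumption: compute_XY PTX can be JIT-compiled for any capability >= X.Y.
    if any(p <= major * 10 + minor for p in ptx):
        return "jit"
    return "unsupported"


# (6, 0) is the compute capability of a Tesla P100. On a live system,
# torch.cuda.get_device_capability(0) returns this tuple.
print(classify_support((6, 0), REPORTED_FLAGS))  # prints "jit"
```

A result of "jit" is consistent with the slow first run reported here: the build contains no binary for the P100, so the driver has to compile kernels from PTX the first time they are used.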

ppwwyyxx · 2019-10-12T21:45:22Z

This has been fixed in PyTorch, according to #27.

DragonDuck · 2019-10-14T02:30:57Z

Thank you very much for the help! I can confirm that the Conda version of PyTorch from last week wasn't properly compiled to support my GPU. This has been fixed and the newest PyTorch version downloaded via Conda works error-free.

ppwwyyxx added the installation / environment label Oct 11, 2019

ppwwyyxx closed this as completed Oct 12, 2019

ppwwyyxx mentioned this issue Oct 12, 2019

Spend too much time on running detectron2.engine.DefaultPredictor(cfg) #36

Closed

XuanyuanDi mentioned this issue Nov 7, 2019

RuntimeError: Not compiled with GPU support (ROIAlign_forward at /home/hd/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign.h:73) #267

Closed

azuryl mentioned this issue Jan 14, 2020

RuntimeError: CUDA error: no kernel image is available for execution on the device #693

Closed

servercalap mentioned this issue Feb 17, 2020

custom dataset and custom train_net.py runtime error #893

Closed

ShawnNew pushed a commit to ShawnNew/detectron2 that referenced this issue Jul 1, 2020

Update urls (facebookresearch#7)

12dfd59

abramjos mentioned this issue Jul 7, 2020

Caffe2 to c++ speed and gpu problems #1729

Closed

Julymycin mentioned this issue Aug 12, 2020

When use a exported mask rcnn caffe2 model to infer an image, get error [enforce fail at batch_permutation_op.cu:66] X.dim32(0) > 0. 0 vs 0 #1895

Closed

github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2021
