Why does it take so long to start? #27

Closed
pengzhiliang opened this issue Oct 12, 2019 · 9 comments

Comments

@pengzhiliang

❓ Questions and Help

Hello~
When I start training RetinaNet with the default settings, the preparation phase is very slow!
The console output is as follows:

[10/12 14:51:43 detectron2]: Full config saved to output/detectron2/DEBUG/config.yaml
[10/12 14:51:43 d2.utils.env]: Using a generated random seed 43796016
[10/12 15:03:06 d2.engine.defaults]: Model:

From 14:51:43 to 15:03:06, training does not start.
Could you tell me why it takes so long?
Thank you very much!

@ppwwyyxx
Contributor

Please include details following the issue template

@pengzhiliang
Author

@ppwwyyxx OK.

I did not modify the config file, and just ran the following command:

DIR=output/detectron2/coco/Retinanet
CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net.py --num-gpus 4 --dist-url auto \
                            --config-file configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml \
                            SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 \
                            MODEL.WEIGHTS models/R-50.pkl \
                            OUTPUT_DIR $DIR

Then, I didn't get any error, but it took a very long time to start.
The main output in the PyCharm console is as follows:

[10/12 14:51:43 detectron2]: Full config saved to output/detectron2/DEBUG/config.yaml
[10/12 14:51:43 d2.utils.env]: Using a generated random seed 43796016
[10/12 15:03:06 d2.engine.defaults]: Model:
RetinaNet(
  (backbone): FPN(
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    ........

Strangely, from 14:51:43 to 15:03:06, training did not start.

And my environment info:

---------------------  --------------------------------------------------
Python                 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Detectron2 Compiler    GCC 5.4
DETECTRON2_ENV_MODULE  <not set>
PyTorch                1.3.0
PyTorch Debug Build    False
CUDA available         True
GPU 0,1,2,3            GeForce RTX 2080 Ti
Pillow                 6.2.0
cv2                    4.1.1
---------------------  --------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

In summary, no error occurred, but the preparation phase took a very long time!

Thank you!

@ppwwyyxx
Contributor

Your version of PyTorch is not built with pre-compiled code for your GPU architecture, so the CUDA kernels have to be JIT-compiled at startup. In that case everything will run very slowly the first time.

To resolve this you need to find a different build of PyTorch or build it yourself.
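
A quick way to check for this mismatch (a minimal sketch, not from this thread; it only relies on the standard torch.cuda.get_device_capability and torch.__config__.show calls):

# Compare the GPU's compute capability with the architectures PyTorch was built for.
# An RTX 2080 Ti reports (7, 5), while the NVCC flags in the environment dump above
# only include sm_35/sm_50, so the kernels must be JIT-compiled on first use.
python -c "import torch; print(torch.cuda.get_device_capability(0)); print(torch.__config__.show())"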

@pengzhiliang
Author

OK, Thanks a lot!

@ppwwyyxx
Contributor

ppwwyyxx commented Oct 12, 2019

@soumith we've seen two reports about this issue. It seems like the PyTorch 1.3 + CUDA 10.1 package on PyPI is built with GPU code for architectures up to 7.5, while the package on conda only has GPU code up to 5.0.

To users: using pip install rather than conda install should help.
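
For reference, a rough sketch of that workaround (an assumption based on this comment, not an official instruction; the PyPI wheels of the time shipped kernels up to sm_75):

# Remove the conda-installed package, then install the PyPI wheel instead
conda remove pytorch
pip install torch torchvision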

@chenjoya

Sorry, I've hit the same problem here; it takes a very long time to start ... (PyTorch 1.3 + CUDA 10.1)

@soumith
Member

soumith commented Oct 12, 2019

looking at this issue with hi-pri and tracking it in pytorch/pytorch#27807

@soumith
Member

soumith commented Oct 12, 2019

This issue is now fixed with newly updated binaries.
Uninstalling and reinstalling PyTorch from Anaconda will fix it.
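
A sketch of that reinstall (the channel and cudatoolkit pin are assumptions based on the PyTorch 1.3 install instructions of that time, not taken from this thread):

# Reinstall the rebuilt conda binaries from the pytorch channel
conda uninstall pytorch torchvision
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch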

@chenjoya

Thank you!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2021
ppwwyyxx added a commit that referenced this issue Jan 2, 2022
Summary:
Resolves #27. Work in progress.
Pull Request resolved: fairinternal/detectron2#51

Reviewed By: rbgirshick

Differential Revision: D13544596

Pulled By: ppwwyyxx

fbshipit-source-id: 0d7a8fa2ecadb47d88502714a191642ba6e17531