Why does it take so long to start? #27
Please include details following the issue template.
@ppwwyyxx OK. I did not modify the config file and just ran the following command:

DIR=output/detectron2/coco/Retinanet
CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net.py --num-gpus 4 --dist-url auto \
--config-file configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml \
SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 \
MODEL.WEIGHTS models/R-50.pkl \
OUTPUT_DIR $DIR

I didn't get an error, but it took a very long time to start:

[10/12 14:51:43 detectron2]: Full config saved to output/detectron2/DEBUG/config.yaml
[10/12 14:51:43 d2.utils.env]: Using a generated random seed 43796016
[10/12 15:03:06 d2.engine.defaults]: Model:
RetinaNet(
(backbone): FPN(
(fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
...

Strangely, from 14:51:43 to 15:03:06 (more than 11 minutes), training did not start. My environment info:
--------------------- --------------------------------------------------
Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available True
GPU 0,1,2,3 GeForce RTX 2080 Ti
Pillow 6.2.0
cv2 4.1.1
--------------------- --------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_50,code=compute_50
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

In summary, no error occurred, but the preparation phase took a very long time. Thank you!
Your version of PyTorch is not built with precompiled GPU code for your GPU architecture, so everything runs very slowly at first. To resolve this, you need to find a different build of PyTorch or build it yourself.
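The environment info above lists NVCC flags that ship machine code for sm_35 and sm_50 plus PTX for compute_50, while a GeForce RTX 2080 Ti has compute capability 7.5. With no matching machine code, the CUDA driver has to JIT-compile the compute_50 PTX for the GPU at startup, which would explain the slow start. The sketch below is a hypothetical helper (`parse_gencode_flags` and `classify` are illustrative names, not part of PyTorch or detectron2) that parses that flags string and reports whether a given compute capability gets native code, JIT-compiled PTX, or no usable code at all. It assumes the simple `arch=compute_XY,code=...` form shown in the environment info.

```python
# Hypothetical helper, not part of PyTorch or detectron2: classify how a GPU
# would be served by a build, given its "NVCC architecture flags" string.

def parse_gencode_flags(flags):
    """Return (sass, ptx) as sets of (major, minor) tuples.

    sass: architectures with precompiled machine code (code=sm_XY).
    ptx:  architectures shipped as PTX (code=compute_XY), which the driver
          must JIT-compile for GPUs without matching machine code.
    """
    sass, ptx = set(), set()
    for item in flags.split(";"):
        item = item.strip()
        if not item.startswith("arch="):
            continue  # skip the "-gencode" separators
        # e.g. "arch=compute_50,code=sm_50" -> code == "sm_50"
        _, code = item.split(",code=")
        kind, num = code.split("_")
        arch = (int(num[:-1]), int(num[-1]))
        (sass if kind == "sm" else ptx).add(arch)
    return sass, ptx


def classify(capability, sass, ptx):
    major, minor = capability
    # Precompiled cubins run on GPUs with the same major version and an
    # equal or higher minor version.
    if any(m == major and n <= minor for m, n in sass):
        return "native"
    # PTX can be JIT-compiled for any GPU whose compute capability is at
    # least the PTX's virtual architecture (slow on first use).
    if any((m, n) <= capability for m, n in ptx):
        return "jit"
    return "unsupported"


# Flags copied from the environment info in this thread.
flags = ("-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,"
         "code=sm_50;-gencode;arch=compute_50,code=compute_50")
sass, ptx = parse_gencode_flags(flags)
print(classify((7, 5), sass, ptx))  # prints "jit" for an RTX 2080 Ti
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` tuple to pass as `capability`.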
OK, thanks a lot!
@soumith We've seen two reports about this issue. It seems the PyTorch 1.3 + CUDA 10.1 package on PyPI is built with GPU code for architectures up to 7.5, while the package on conda only has GPU code up to 5.0. To users: use
Sorry, I ran into the same problem here: it takes a very long time to start (PyTorch 1.3 + CUDA 10.1).
Looking at this issue with high priority and tracking it in pytorch/pytorch#27807.
This issue is now fixed with newly updated binaries.
Thank you!
Summary: Resolves #27. Work in progress.
Pull Request resolved: fairinternal/detectron2#51
Reviewed By: rbgirshick
Differential Revision: D13544596
Pulled By: ppwwyyxx
fbshipit-source-id: 0d7a8fa2ecadb47d88502714a191642ba6e17531
❓ Questions and Help
Hello~
When I start to train RetinaNet with the default settings, the preparation phase is very slow!
The console output shows that from 14:51:43 to 15:03:06, training does not start. Could you tell me why it takes so long?
Thank you very much!