
Exception: process 2 terminated with signal SIGFPE #3

Closed
lzrobots opened this issue Oct 10, 2019 · 5 comments

Comments

@lzrobots

lzrobots commented Oct 10, 2019

  • what changes you made / what code you wrote: No

  • what command you run: python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml

  • what you observed (full logs are preferred)

(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml

Command Line Args: Namespace(config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml', dist_url='tcp://127.0.0.1:54401', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
Process group URL: tcp://127.0.0.1:54401
Traceback (most recent call last):
  File "tools/train_net.py", line 154, in <module>
    args=(args,),
  File "/data/engs-tvg-lz/engs1870/projects/Det/detectron2/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
    (error_index, name)
Exception: process 2 terminated with signal SIGFPE
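For anyone decoding the last line of that log: when a child process is killed by a signal, Python's multiprocessing reports its exit code as -N, where N is the signal number, and the launcher turns that number into a name such as SIGFPE (floating-point exception). The sketch below mirrors the message format seen in the traceback; describe_exit is a hypothetical helper, not torch's actual source.

```python
import signal


def describe_exit(index, exitcode):
    """Format a worker's exit status like the message in the traceback.

    Hypothetical helper for illustration; it mirrors the log's wording only.
    """
    if exitcode is None:
        return f"process {index} is still running"
    if exitcode < 0:
        # multiprocessing encodes "killed by signal N" as exitcode == -N
        name = signal.Signals(-exitcode).name
        return f"process {index} terminated with signal {name}"
    return f"process {index} terminated with exit code {exitcode}"


print(describe_exit(2, -int(signal.SIGFPE)))
```

The signal name alone does not identify the root cause; in this thread it traced back to PyTorch not detecting CUDA.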

## Environment

(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python -m detectron2.utils.collect_env


Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available False
Pillow 6.2.0
cv2 4.1.1


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

Hi, any thoughts on the above error? Thanks.

@ppwwyyxx
Contributor

Could you post the requested details in the issue template? Thanks!

@lzrobots
Author

updated

@ppwwyyxx
Contributor

CUDA available False

Your PyTorch cannot detect CUDA. If you expect to use GPUs, you need to install PyTorch/CUDA correctly first.

Most models in detectron2 do not support CPU training.
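A minimal pre-flight check along these lines could surface the problem before eight worker processes are spawned. It relies on torch.cuda.is_available(), torch.cuda.device_count() and torch.version.cuda, which are real PyTorch APIs; the helper names preflight_error and check_cuda are hypothetical and not part of detectron2.

```python
def preflight_error(cuda_available, device_count, num_gpus):
    """Return why `num_gpus` GPUs cannot be used, or None if they can.

    Hypothetical helper for illustration; not part of detectron2.
    """
    if num_gpus <= 0:
        return None  # CPU-only run requested, nothing to check
    if not cuda_available:
        return "PyTorch cannot detect CUDA; install a PyTorch build that matches your CUDA setup"
    if device_count < num_gpus:
        return f"requested {num_gpus} GPUs but only {device_count} are visible"
    return None


def check_cuda(num_gpus):
    """Raise RuntimeError if the installed PyTorch cannot use `num_gpus` GPUs (hypothetical)."""
    import torch  # imported lazily so preflight_error stays usable without torch

    error = preflight_error(torch.cuda.is_available(), torch.cuda.device_count(), num_gpus)
    if error:
        # torch.version.cuda is None for CPU-only builds
        raise RuntimeError(f"{error} (torch {torch.__version__}, CUDA {torch.version.cuda})")
```

Calling check_cuda(8) before launching training should turn the opaque SIGFPE into a readable error message.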

@lzrobots
Author

Right, fixed now.

I was following the Colab notebook, which installs PyTorch with pip install -U torch torchvision, so I got the default CPU version.

You can then replace it with a CUDA build:
pip install torch==1.3.0+cu92 torchvision==0.4.1+cu92 -f https://download.pytorch.org/whl/torch_stable.html
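To tell which variant ended up installed, the local version label on torch.__version__ is a useful hint: wheels from the download.pytorch.org index carry labels such as +cu92 (as in the command above) or +cpu. The sketch below splits that label and maps it to a CUDA version. The helper names are hypothetical, it assumes the last digit of a cuXXX label is the minor version, and a version without a label says nothing either way, so torch.version.cuda remains the authoritative check.

```python
import re


def split_local_label(version):
    """Split '1.3.0+cu92' into ('1.3.0', 'cu92'); the label is None when absent."""
    base, _, label = version.partition("+")
    return base, (label or None)


def cuda_from_label(label):
    """Map a wheel label such as 'cu92' or 'cu101' to '9.2' or '10.1'.

    Returns None for 'cpu', a missing label, or anything unrecognised.
    """
    if not label:
        return None
    match = re.fullmatch(r"cu(\d+)(\d)", label)  # assumes the last digit is the minor version
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


print(cuda_from_label(split_local_label("1.3.0+cu92")[1]))
```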

pip install opencv-python tensorboard is also required, but it is missing from the step-by-step install instructions.
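Before running the training script, a short check like this one can confirm those extra packages are importable. It uses importlib.util.find_spec from the standard library, which returns None for a missing top-level module without importing it. Note that the pip package opencv-python is imported as cv2; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for an absent top-level module (dotted names are not handled here)
    return [name for name in names if importlib.util.find_spec(name) is None]


# opencv-python provides the `cv2` module; tensorboard provides `tensorboard`
print(missing_modules(["cv2", "tensorboard"]))
```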

@ppwwyyxx
Contributor

pip install -U torch torchvision does install a CUDA build, but not one for CUDA 9.2, which is probably why it fails.

We'll add the missing dependencies (probably not opencv-python, since it is an unofficial build of OpenCV). Thanks for catching this!
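On the CUDA 9.2 point, one way to spot this kind of mismatch is to compare the CUDA version a PyTorch build targets (torch.version.cuda) with the toolkit installed locally, as reported by nvcc --version. A minimal sketch follows. The helper names are hypothetical, it assumes nvcc's usual "release X.Y" wording, and a matching toolkit is only part of a working setup, since the NVIDIA driver must also support that CUDA version.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 9.2, V9.2.148'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cuda_version(version):
    """Turn a string like torch.version.cuda ('10.1') into (10, 1); None stays None."""
    if version is None:
        return None  # CPU-only PyTorch build
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_cuda(torch_cuda, nvcc_output):
    """True when PyTorch's CUDA version matches the local toolkit's major.minor."""
    built = parse_cuda_version(torch_cuda)
    local = parse_nvcc_release(nvcc_output)
    return built is not None and built == local
```

In practice the inputs would come from torch.version.cuda and subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout.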

ghost mentioned this issue Jan 22, 2020
ShawnNew pushed a commit to ShawnNew/detectron2 that referenced this issue Jul 1, 2020
github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2021