Exception: process 2 terminated with signal SIGFPE #3

lzrobots · 2019-10-10T20:39:47Z

what changes you made / what code you wrote: No
what command you run: python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
what you observed (full logs are preferred)

(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml

Command Line Args: Namespace(config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml', dist_url='tcp://127.0.0.1:54401', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
Process group URL: tcp://127.0.0.1:54401
Traceback (most recent call last):
  File "tools/train_net.py", line 154, in <module>
    args=(args,),
  File "/data/engs-tvg-lz/engs1870/projects/Det/detectron2/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
    (error_index, name)
Exception: process 2 terminated with signal SIGFPE

## Environment

(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python -m detectron2.utils.collect_env

Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available False
Pillow 6.2.0
cv2 4.1.1

PyTorch built with:

GCC 7.3

Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications

Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)

OpenMP 201511 (a.k.a. OpenMP 4.5)

NNPACK is enabled

Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

Hi, any thoughts on the above error? Thanks.


ppwwyyxx · 2019-10-10T20:45:15Z

Could you post the requested details in the issue template? Thanks!

lzrobots · 2019-10-10T21:02:58Z

updated

ppwwyyxx · 2019-10-10T21:37:28Z

CUDA available False

Your pytorch cannot detect cuda. If you expect to use GPUs, you need to correctly install pytorch/cuda first.

Most models in detectron2 do not support CPU training.
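For anyone hitting the same error, a quick way to spot this is to look for the `CUDA available` row in the `collect_env` output. The sketch below is only an illustration: `cuda_available_from_report` is a hypothetical helper, not part of detectron2, and it assumes the report uses the row format shown in this issue.

```python
import re
from typing import Optional


def cuda_available_from_report(report: str) -> Optional[bool]:
    """Return the value of the 'CUDA available' row, or None if the row is absent.

    Hypothetical helper; assumes rows look like 'CUDA available    False'.
    """
    for line in report.splitlines():
        match = re.match(r"\s*CUDA available\s+(True|False)\b", line)
        if match:
            return match.group(1) == "True"
    return None


# Excerpt of the report posted in this issue.
sample = """\
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available False
"""
print(cuda_available_from_report(sample))  # False
```

In a live Python session, `torch.cuda.is_available()` gives the same information directly.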

lzrobots · 2019-10-10T22:33:50Z

Right, fixed now.

I was following the Colab notebook, which uses pip install -U torch torchvision, so I got the default CPU version.

You can then switch to the CUDA build:
pip install torch==1.3.0+cu92 torchvision==0.4.1+cu92 -f https://download.pytorch.org/whl/torch_stable.html

pip install opencv-python tensorboard is also required and should be added to the step-by-step install instructions.

ppwwyyxx · 2019-10-10T23:00:28Z

pip install -U torch torchvision does install a cuda version. But it's not for cuda 9.2, and that's probably why it fails.

We'll add missing dependencies (probably not opencv-python, since it is an unofficial build of opencv). Thanks for catching this!
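To see which CUDA version a pinned wheel targets, you can read the build tag after the `+` in the version string (for example `cu92` in `torch==1.3.0+cu92`). This is a rough sketch, not an official tool: both helper names are made up for illustration, and it assumes the `cu<major><minor>` tag convention with a single-digit minor version.

```python
from typing import Optional, Tuple


def split_build_tag(version: str) -> Tuple[str, Optional[str]]:
    """Split '1.3.0+cu92' into ('1.3.0', 'cu92'); the tag is None if there is no '+'."""
    release, _, tag = version.partition("+")
    return release, (tag or None)


def cuda_version_from_tag(tag: Optional[str]) -> Optional[str]:
    """Map 'cu92' -> '9.2' and 'cu101' -> '10.1'; return None for 'cpu' or a missing tag.

    Assumption: the last digit of the tag is the CUDA minor version.
    """
    if not tag or not tag.startswith("cu"):
        return None
    digits = tag[2:]
    if len(digits) < 2 or not digits.isdigit():
        return None
    return f"{digits[:-1]}.{digits[-1]}"


release, tag = split_build_tag("1.3.0+cu92")
print(release, cuda_version_from_tag(tag))  # 1.3.0 9.2
```

A version string without a tag tells you nothing this way, so the helper returns `None`. For an installed build, `torch.version.cuda` reports the CUDA version it was compiled against.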


facebook-github-bot closed this as completed in b84b7be Oct 11, 2019

ppwwyyxx added the installation / environment label Oct 11, 2019

BrianPugh mentioned this issue Oct 14, 2019

RuntimeError: CUDA error: invalid device function ROIAlign_forward_cuda #62

Closed

batrlatom mentioned this issue Oct 15, 2019

Core dumped after running demo code #78

Closed

jiushishuai88 mentioned this issue Oct 21, 2019

Not compiled with GPU support #128

Closed

barakhi mentioned this issue Nov 4, 2019

Compiling on K80, executing on P100 #233

Closed

XuanyuanDi mentioned this issue Nov 7, 2019

RuntimeError: Not compiled with GPU support (ROIAlign_forward at /home/hd/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign.h:73) #267

Closed

jiangkansg mentioned this issue Dec 4, 2019

Do you support batch inference? #282

Closed

azuryl mentioned this issue Jan 14, 2020

RuntimeError: CUDA error: no kernel image is available for execution on the device #693

Closed

ghost mentioned this issue Jan 22, 2020

ROIAlign error #740

Closed

Samjith888 mentioned this issue Feb 10, 2020

The checkpoint contains parameters not used by the model #820

Closed

veronikayurchuk mentioned this issue Feb 17, 2020

RuntimeError: CUDA error: no kernel image is available for execution on the device #235

Closed

servercalap mentioned this issue Feb 17, 2020

custom dataset and custom train_net.py runtime error #893

Closed

ShawnNew pushed a commit to ShawnNew/detectron2 that referenced this issue Jul 1, 2020

Merge pull request facebookresearch#3 from zhanghang1989/zhanghang198…

845ffb3

…9-patch-1 Update README.md

abramjos mentioned this issue Jul 7, 2020

Caffe2 to c++ speed and gpu problems #1729

Closed

Julymycin mentioned this issue Aug 12, 2020

When use a exported mask rcnn caffe2 model to infer an image, get error [enforce fail at batch_permutation_op.cu:66] X.dim32(0) > 0. 0 vs 0 #1895

Closed

manmani3 mentioned this issue Nov 25, 2020

How to know if the instance is in the specific area? #2311

Closed

github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2021
