assert bool((rel_pair_idx == pair_idx[vr_indices]).all()) #2

Open
tsamoura opened this issue Jan 17, 2023 · 0 comments

tsamoura commented Jan 17, 2023

Dear authors,

Congratulations on the very nice work! I ran your code for SGDET and got an assertion error. In particular, I ran this command:

CUDA_VISIBLE_DEVICES=6 \
python tools/relation_train_net.py \
 --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
 MODEL.ROI_RELATION_HEAD.USE_GT_BOX False \
 MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False \
 MODEL.ROI_RELATION_HEAD.PREDICTOR RUNetPredictor \
 SOLVER.IMS_PER_BATCH 1 \
 TEST.IMS_PER_BATCH 1 \
 DTYPE "float16" \
 SOLVER.PRE_VAL True \
 SOLVER.BASE_LR 0.0025 \
 MODEL.ROI_RELATION_HEAD.L21_LOSS 0.7 \
 MODEL.PRETRAINED_DETECTOR_CKPT ~/checkpoints/pretrained_faster_rcnn/model_final.pth \
 OUTPUT_DIR ~/checkpoints/runet-sgdet

and I got the exception:

maskrcnn_benchmark INFO: -------------------------------
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 379, in <module>
    main()
  File "tools/relation_train_net.py", line 372, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/relation_train_net.py", line 147, in train
    loss_dict = model(images, targets)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets, logger)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 69, in forward
    x, detections, loss_relation = self.relation(features, detections, targets, logger)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/roi_heads/relation_head/relation_head.py", line 94, in forward
    refine_logits, relation_logits, add_losses = self.predictor(proposals, rel_pair_idxs, full_pair_idxs, rel_labels, rel_binarys, roi_features, union_features, logger)
  File "/anaconda3/envs/ru_net/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ru_net/RU-Net/maskrcnn_benchmark/modeling/roi_heads/relation_head/roi_relation_predictors.py", line 819, in forward
    assert bool((rel_pair_idx == pair_idx[vr_indices]).all())
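To narrow down which rows trip the assertion, a small diagnostic along the following lines could be called just before the assert. This is only a sketch under assumptions: the names `rel_pair_idx`, `pair_idx`, and `vr_indices` come from the traceback, but their meaning is assumed here (rows of index pairs, with `vr_indices` a 1-D list of non-negative integer row indices into `pair_idx`). The helper `describe_pair_mismatch` is hypothetical and not part of the RU-Net code.

```python
def to_rows(x):
    """Convert a tensor-like object (anything with .tolist()) or a sequence to a list."""
    return x.tolist() if hasattr(x, "tolist") else list(x)


def describe_pair_mismatch(rel_pair_idx, pair_idx, vr_indices, max_rows=5):
    """Report where rel_pair_idx differs from pair_idx[vr_indices].

    Assumption: vr_indices holds non-negative integer row indices into pair_idx.
    """
    rel_rows = to_rows(rel_pair_idx)
    all_rows = to_rows(pair_idx)
    picks = to_rows(vr_indices)

    # The element-wise comparison only makes sense if both sides have the same length.
    if len(rel_rows) != len(picks):
        return (f"length mismatch: {len(rel_rows)} relation pairs "
                f"vs {len(picks)} selected indices")

    # Flag indices that fall outside [0, len(pair_idx)).
    out_of_range = [i for i in picks if not 0 <= i < len(all_rows)]
    if out_of_range:
        return f"indices outside [0, {len(all_rows)}): {out_of_range[:max_rows]}"

    # Collect the rows where the two sides disagree.
    bad = [(k, rel_rows[k], all_rows[picks[k]])
           for k in range(len(picks))
           if list(rel_rows[k]) != list(all_rows[picks[k]])]
    if not bad:
        return "no mismatch"

    lines = [f"{len(bad)} of {len(picks)} rows differ"]
    for k, got, expected in bad[:max_rows]:
        lines.append(f"row {k}: rel_pair_idx={got}, pair_idx[vr_indices]={expected}")
    return "\n".join(lines)
```

For example, printing `describe_pair_mismatch(rel_pair_idx, pair_idx, vr_indices)` right before the failing assert would show whether only a few rows disagree or the two tensors are misaligned altogether.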

Note that I got the same assertion error when training on multiple GPUs, i.e., when running this command:

CUDA_VISIBLE_DEVICES=6,7 \
python -m torch.distributed.launch \
 --master_port 15026 \
 --nproc_per_node=2 \
 tools/relation_train_net.py \
 --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
 MODEL.ROI_RELATION_HEAD.USE_GT_BOX False \
 MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False \
 MODEL.ROI_RELATION_HEAD.PREDICTOR RUNetPredictor \
 SOLVER.IMS_PER_BATCH 2 \
 TEST.IMS_PER_BATCH 2 \
 DTYPE "float16" \
 SOLVER.PRE_VAL True \
 SOLVER.BASE_LR 0.0025 \
 MODEL.ROI_RELATION_HEAD.L21_LOSS 0.7 \
 MODEL.PRETRAINED_DETECTOR_CKPT ~/checkpoints/pretrained_faster_rcnn/model_final.pth \
 OUTPUT_DIR ~/checkpoints/runet-sgdet-2gpus

Any suggestions for fixing the issue?

Many thanks!
