
[Detectron2] RuntimeError: No such operator torchvision::nms and RecursionError: maximum recursion depth exceeded #4180



Closed
KastanDay opened this issue Jul 14, 2021 · 3 comments

Comments

@KastanDay

KastanDay commented Jul 14, 2021

🐛 Bug

Running Detectron2's demo.py raises RuntimeError: No such operator torchvision::nms.
So far this is the same as #1405, but it gets worse: the workaround leads to a maximum recursion depth error.

The primary error is resolved by a simple naming change (below; thanks to @feiyuhuahuo). However, that change causes the RecursionError: maximum recursion depth exceeded in comparison issue reported by @vasyllyashkevych.

This fix for the torchvision::nms error causes the RecursionError:

# edit file: `local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py`
# inside def nms(boxes, scores, iou_threshold):

# OLD (bad):
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)

# NEW (better):
import torchvision  # top of file
return torchvision.ops.nms(boxes, scores, iou_threshold)

With this change, the demo fails with RecursionError: maximum recursion depth exceeded:

  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 43, in nms
    return torchvision.ops.nms(boxes, scores, iou_threshold)
  [Previous line repeated 970 more times]
RecursionError: maximum recursion depth exceeded
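A likely explanation for the loop: torchvision/ops/__init__.py re-exports nms from boxes.py, so inside boxes.py the name torchvision.ops.nms refers back to the very function being patched, and every call re-enters it. The sketch below reproduces that self-reference without torchvision installed; the stand-in module called ops is hypothetical and only mimics the re-export.

```python
import types

# Hypothetical stand-in for the torchvision.ops package (no torchvision needed).
ops = types.ModuleType("ops")


def nms(boxes, scores, iou_threshold):
    # Same shape as the patched body in boxes.py: ops.nms is looked up at
    # call time, and it resolves to this very function.
    return ops.nms(boxes, scores, iou_threshold)


# Mimics the re-export in torchvision/ops/__init__.py (from .boxes import nms).
ops.nms = nms


def patched_nms_recurses() -> bool:
    """Return True if calling the patched nms hits the recursion limit."""
    try:
        nms(boxes=[], scores=[], iou_threshold=0.5)
    except RecursionError:
        return True
    return False


print(patched_nms_recurses())  # True
```

In other words, the original torch.ops.torchvision.nms call was not the bug; the RuntimeError means the compiled operator it looks up is missing.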

Full stack trace below 👇👇!

To Reproduce

Steps to reproduce the behavior:

  1. Build Detectron2 from source:
sudo python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
  2. Clone the Detectron2 repo.
  3. Run the demo (from the docs https://detectron2.readthedocs.io/en/latest/tutorials/getting_started.html):
$ sudo python3 demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
  --input kasDemo.png \
  --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

Environment

❗Note: PyTorch was installed via PyTorch for Jetson.

Simple system info:

Host machine: NVIDIA Jetson Xavier (ARM architecture, not x86-64)
Python: python3.6
Detectron2: installed from source from GitHub (July 14, 2021)
torch version: 1.8.0
torchvision version: 0.7.0a0+78ed10c
Cuda version: 10.2

Env collection script:

PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (aarch64)
GCC version: (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jan 26 2021, 15:33:00)  [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-4.9.201-tegra-aarch64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.0.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.0.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.0.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.0.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.0.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.0.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.0.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.8.0
[pip3] torchvision==0.7.0a0+78ed10c
[conda] Could not collect

Full stack trace:

$ sudo python3 demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
  --video-input IMG_3578.MOV \
  --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
  
[07/14 17:23:49 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=None, opts=['MODEL.WEIGHTS', 'detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl'], output=None, video_input='IMG_3578.MOV', webcam=False)
[07/14 17:24:00 fvcore.common.checkpoint]: [Checkpointer] Loading from detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl ...
[07/14 17:24:01 fvcore.common.checkpoint]: Reading a file from 'Detectron2 Model Zoo'
WARNING [07/14 17:24:01 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  proposal_generator.anchor_generator.cell_anchors.{0, 1, 2, 3, 4}
  0%|                                                                                   | 0/221 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "demo.py", line 176, in <module>
    for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/home/zion/detectron2/demo/predictor.py", line 129, in run_on_video
    yield process_predictions(frame, self.predictor(frame))
  File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/defaults.py", line 320, in __call__
    predictions = self.model([inputs])[0]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/detectron2/modeling/meta_arch/rcnn.py", line 146, in forward
    return self.inference(batched_inputs)
  File "/usr/local/lib/python3.6/dist-packages/detectron2/modeling/meta_arch/rcnn.py", line 204, in inference
    proposals, _ = self.proposal_generator(images, features, None)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/detectron2/modeling/proposal_generator/rpn.py", line 478, in forward
    anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
  File "/usr/local/lib/python3.6/dist-packages/detectron2/modeling/proposal_generator/rpn.py", line 511, in predict_proposals
    self.training,
  File "/usr/local/lib/python3.6/dist-packages/detectron2/modeling/proposal_generator/proposal_utils.py", line 116, in find_top_rpn_proposals
    keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
  File "/usr/local/lib/python3.6/dist-packages/detectron2/layers/nms.py", line 21, in batched_nms
    return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 1091, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 85, in batched_nms
    keep = nms(boxes_for_nms, scores, iou_threshold)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 43, in nms
    return torchvision.ops.nms(boxes, scores, iou_threshold)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 43, in nms
    return torchvision.ops.nms(boxes, scores, iou_threshold)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 43, in nms
    return torchvision.ops.nms(boxes, scores, iou_threshold)
  [Previous line repeated 969 more times]
RecursionError: maximum recursion depth exceeded

@KastanDay
Author

This comment on #1405 makes me believe my "fix" above is entirely the wrong idea, so I'm back to having no solution at all.

@NicolasHug
Member

NicolasHug commented Jul 15, 2021

Hi @KastanDay
It looks like you're relying on an old torchvision version; could you try updating to 0.10? As far as I know, we don't officially support Jetson, so you might need to build from source.
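The environment in this issue pairs torch 1.8.0 with torchvision 0.7.0a0, while each torchvision release is built against one specific torch release. A minimal sketch of checking that pairing, assuming a partial table of release pairs taken from the compatibility table in the pytorch/vision README (only four rows reproduced); COMPATIBLE, major_minor and is_matching_pair are hypothetical helper names.

```python
# Partial table: torch major.minor -> matching torchvision major.minor.
COMPATIBLE = {
    "1.6": "0.7",
    "1.7": "0.8",
    "1.8": "0.9",
    "1.9": "0.10",
}


def major_minor(version: str) -> str:
    """Strip local build tags and patch level: '0.7.0a0+78ed10c' -> '0.7'."""
    parts = version.split("+")[0].split(".")
    return ".".join(parts[:2])


def is_matching_pair(torch_version: str, torchvision_version: str):
    """True/False for known torch releases, None if not in the partial table."""
    expected = COMPATIBLE.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchvision_version) == expected


# Versions from the env collection output above.
print(is_matching_pair("1.8.0", "0.7.0a0+78ed10c"))  # False: torch 1.8 pairs with 0.9
```

On a live install, the same check could be fed torch.__version__ and torchvision.__version__.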

@fmassa
Member

fmassa commented Aug 12, 2021

Hi @KastanDay

Your patch is not correct, and that's why you are facing the recursion error.

You need to compile torchvision from source by following the instructions in https://github.com/pytorch/vision#installation in order to have the C++ operators available (as @NicolasHug mentioned, we don't provide binaries for Jetson). Once torchvision is compiled, your issue should go away.
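One way to confirm whether a torchvision build actually registered its C++ operators, as a sketch: look up torch.ops.torchvision.nms directly, which is the same lookup that raised the original RuntimeError. The function name nms_op_status and its return strings are hypothetical; catching AttributeError as well as RuntimeError assumes that newer torch releases report a missing operator as an AttributeError.

```python
import importlib.util


def nms_op_status() -> str:
    """Report whether torchvision's compiled nms operator is registered."""
    if importlib.util.find_spec("torch") is None or importlib.util.find_spec("torchvision") is None:
        return "torch/torchvision not installed"

    import torch
    import torchvision  # noqa: F401  (importing torchvision tries to load its C++ extension)

    try:
        # Same lookup that failed with "No such operator torchvision::nms".
        _ = torch.ops.torchvision.nms
    except (RuntimeError, AttributeError):
        return "missing: build torchvision from source"
    return "available"


print(nms_op_status())
```

On a correctly compiled torchvision this should print "available"; on a build like the one in this issue, it would be expected to report the operator as missing.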

I believe I've answered your question, so I'm closing this issue, but let us know if you have any further problems.
