Bug Description
I trained a detectron2 model (GeneralizedRCNN, as found in the detectron2 repo) and keep running into a segfault when trying to export the trained weights with Torch-TensorRT following the example instructions. I used export_model.py in scripting mode to export a GeneralizedRCNN scripted model with a ResNet-50 backbone.
I built a Docker environment as available in the repo here (#1852) to use PyTorch 2.1.0, Torch-TensorRT 1.5.0, TensorRT 8.6, CUDA 11.8, and cuDNN 8.8.
I have also tried exporting the same model with the stable release version of Torch-TensorRT 1.3.0 and still get the segfault.
Can you provide any guidance or information related to these errors, or let me know whether Torch-TensorRT has been tested with any models from the detectron2 model zoo?
To Reproduce
Steps to reproduce the behavior:
- Get a GeneralizedRCNN model with a ResNet-50 backbone as found in the detectron2 repo
- Add the following snippet, which calls Torch-TRT compile on the scripted model (i.e. the output of torch.jit.script), to export_model.py after L102:
import torch_tensorrt as torchtrt  # added alongside the existing imports in export_model.py

# Build a TorchScript-TRT module for export
trt_ts_model = torchtrt.compile(
    ts_model,
    inputs=[input_tensor],
    enabled_precisions={torch.half},
    min_block_size=3,
    workspace_size=1 << 32,
)
# Serialize the compiled TorchScript module to the output directory
with PathManager.open(os.path.join(output, "model_torch_trt.ts"), "wb") as f:
    torch.jit.save(trt_ts_model, f)
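For reference, input_tensor above is the compile-time input spec. A minimal sketch of how it can be constructed, assuming it is expressed via torchtrt.Input to match the (1, 3, 1344, 1344) half-precision input reported in the compile spec under Additional context:

# Hypothetical construction of input_tensor; shape/dtype taken from the compile spec below
input_tensor = torchtrt.Input(
    shape=(1, 3, 1344, 1344),
    dtype=torch.half,
)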
- Run the export_model.py script to export the model under scripting mode and observe the following debug output, which ends in a segfault:
DEBUG: [Torch-TensorRT] - Setting node %23878 : Tensor = aten::_convolution(%21058, %self.model.backbone.bottom_up.stages.2.5.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23876, %23877, %144, %23876, %23876, %23876, %23876) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21068 : Tensor = aten::add(%out.129, %21038, %144) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:208:8 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %3710 : Tensor = aten::relu(%21068) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:209:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23881 : Tensor = aten::_convolution(%3710, %self.model.backbone.bottom_up.stages.3.0.conv1.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %141, %132, %139, %23879, %23880, %144, %23879, %23879, %23879, %23879) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21156 : Tensor = aten::relu(%out.2) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23884 : Tensor = aten::_convolution(%21156, %self.model.backbone.bottom_up.stages.3.0.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23882, %23883, %144, %23882, %23882, %23882, %23882) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21166 : Tensor = aten::relu(%out.10) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23887 : Tensor = aten::_convolution(%21166, %self.model.backbone.bottom_up.stages.3.0.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23885, %23886, %144, %23885, %23885, %23885, %23885) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23890 : Tensor = aten::_convolution(%3710, %self.model.backbone.bottom_up.stages.3.0.shortcut.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %141, %132, %139, %23888, %23889, %144, %23888, %23888, %23888, %23888) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21196 : Tensor = aten::relu(%out.24) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23896 : Tensor = aten::_convolution(%21196, %self.model.backbone.bottom_up.stages.3.1.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23894, %23895, %144, %23894, %23894, %23894, %23894) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21206 : Tensor = aten::relu(%out.28) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23899 : Tensor = aten::_convolution(%21206, %self.model.backbone.bottom_up.stages.3.1.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23897, %23898, %144, %23897, %23897, %23897, %23897) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21227 : Tensor = aten::relu(%out.1) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23905 : Tensor = aten::_convolution(%21227, %self.model.backbone.bottom_up.stages.3.2.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23903, %23904, %144, %23903, %23903, %23903, %23903) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21237 : Tensor = aten::relu(%out.9) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23908 : Tensor = aten::_convolution(%21237, %self.model.backbone.bottom_up.stages.3.2.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23906, %23907, %144, %23906, %23906, %23906, %23906) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21247 : Tensor = aten::add(%out.17, %21217, %144) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:208:8 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %3722 : Tensor = aten::relu(%21247) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:209:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %1228 : Tensor = aten::max_pool2d(%top_block_in_feature, %139, %141, %132, %139, %182) # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:788:11 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
Segmentation fault (core dumped)
Expected behavior
As shown above, when the Torch-TRT compile is called, the conversion errors out midway with a Segmentation fault (core dumped). I have tried it with different versions and the error persists.
The expected behavior would be that the compilation completes and model_torch_trt.ts is generated for use.
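For completeness, the intended use of the exported file would be roughly the following sketch (hypothetical usage; it assumes the compiled module takes a single half-precision image tensor matching the compile spec below, and importing torch_tensorrt is what registers the TensorRT runtime ops needed to deserialize the module):

import torch
import torch_tensorrt  # registers the TensorRT runtime ops for the compiled module

# Load the exported Torch-TensorRT TorchScript module
trt_model = torch.jit.load("model_torch_trt.ts").cuda()

# Dummy input matching the compile-time spec: (1, 3, 1344, 1344), half precision
image = torch.randn(1, 3, 1344, 1344, dtype=torch.half, device="cuda")
outputs = trt_model(image)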
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
All the packages and versions below come from the Torch-TensorRT Docker container built per the instructions here.
- Torch-TensorRT Version (e.g. 1.0.0): 1.5.0.dev0+ac3ab77a
- PyTorch Version (e.g. 1.0): 2.1.0+dev20230419
- CPU Architecture:
- OS (e.g., Linux): Ubuntu 22.04
- How you installed PyTorch (conda, pip, libtorch, source): Docker image built from the build instructions
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.10
- CUDA version: 11.8
- GPU models and configuration: Nvidia RTX 3050 Ti
- Any other relevant information:
Additional context
Additional Torch-TRT compile spec info
DEBUG: [Torch-TensorRT] - TensorRT Compile Spec: {
"Inputs": [
Input(shape=(1,3,1344,1344,), dtype=Half, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2)) ]
"Enabled Precision": [Half, ]
"TF32 Disabled": 0
"Sparsity": 0
"Refit": 0
"Debug": 0
"Device": {
"device_type": GPU
"allow_gpu_fallback": False
"gpu_id": 0
"dla_core": -1
}
"Engine Capability": Default
"Num Avg Timing Iters": 1
"Workspace Size": 4294967296
"DLA SRAM Size": 1048576
"DLA Local DRAM Size": 1073741824
"DLA Global DRAM Size": 536870912
"Truncate long and double": 0
"Torch Fallback": {
"enabled": True
"min_block_size": 3
"forced_fallback_operators": [
]
"forced_fallback_modules": [
]
}
}