Bug Description
I trained a detectron2 model (GeneralizedRCNN, as found in the detectron2 repo) and keep running into a segfault when trying to export the trained weights with Torch-TensorRT following the example instructions. I used export_model.py in scripting mode to export a GeneralizedRCNN scripted model with a ResNet-50 backbone.
I built a Docker environment as available in the repo here (#1852) to use PyTorch 2.1.0, Torch-TensorRT 1.5.0, TensorRT 8.6, CUDA 11.8, and cuDNN 8.8.
I have also tried exporting the same model with the stable release version of Torch-TensorRT 1.3.0 and still get the segfault.
Can you provide any guidance or information related to these errors, or let me know whether Torch-TensorRT has been tested with any models from the detectron2 model zoo?
To Reproduce
Steps to reproduce the behavior:
- Get a GeneralizedRCNN model with a ResNet-50 backbone as found in the detectron2 repo
- Add the following snippet, which calls Torch-TRT compile on the scripted model (i.e. the output of torch.jit.script), to export_model.py after L102:
import torch_tensorrt as torchtrt  # added alongside the existing imports in export_model.py

# Build a TorchScript-TRT module for export
trt_ts_model = torchtrt.compile(
    ts_model,
    inputs=[input_tensor],
    enabled_precisions={torch.half},
    min_block_size=3,
    workspace_size=1 << 32,
)
# Serialize the compiled TorchScript module to the output directory
with PathManager.open(os.path.join(output, "model_torch_trt.ts"), "wb") as f:
    torch.jit.save(trt_ts_model, f)
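For reference, input_tensor above is the compile-time input spec. A minimal sketch of how it can be constructed, assuming it is expressed via torchtrt.Input to match the (1, 3, 1344, 1344) half-precision input reported in the compile spec under Additional context:

# Hypothetical construction of input_tensor; shape/dtype taken from the compile spec below
input_tensor = torchtrt.Input(
    shape=(1, 3, 1344, 1344),
    dtype=torch.half,
)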
- Run the export_model.py script to export the model under scripting mode and observe the following debug output, which ends in a segfault:
DEBUG: [Torch-TensorRT] - Setting node %23878 : Tensor = aten::_convolution(%21058, %self.model.backbone.bottom_up.stages.2.5.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23876, %23877, %144, %23876, %23876, %23876, %23876) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21068 : Tensor = aten::add(%out.129, %21038, %144) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:208:8 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %3710 : Tensor = aten::relu(%21068) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:209:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23881 : Tensor = aten::_convolution(%3710, %self.model.backbone.bottom_up.stages.3.0.conv1.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %141, %132, %139, %23879, %23880, %144, %23879, %23879, %23879, %23879) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21156 : Tensor = aten::relu(%out.2) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23884 : Tensor = aten::_convolution(%21156, %self.model.backbone.bottom_up.stages.3.0.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23882, %23883, %144, %23882, %23882, %23882, %23882) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21166 : Tensor = aten::relu(%out.10) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23887 : Tensor = aten::_convolution(%21166, %self.model.backbone.bottom_up.stages.3.0.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23885, %23886, %144, %23885, %23885, %23885, %23885) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23890 : Tensor = aten::_convolution(%3710, %self.model.backbone.bottom_up.stages.3.0.shortcut.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %141, %132, %139, %23888, %23889, %144, %23888, %23888, %23888, %23888) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21196 : Tensor = aten::relu(%out.24) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23896 : Tensor = aten::_convolution(%21196, %self.model.backbone.bottom_up.stages.3.1.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23894, %23895, %144, %23894, %23894, %23894, %23894) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21206 : Tensor = aten::relu(%out.28) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23899 : Tensor = aten::_convolution(%21206, %self.model.backbone.bottom_up.stages.3.1.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23897, %23898, %144, %23897, %23897, %23897, %23897) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21227 : Tensor = aten::relu(%out.1) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23905 : Tensor = aten::_convolution(%21227, %self.model.backbone.bottom_up.stages.3.2.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23903, %23904, %144, %23903, %23903, %23903, %23903) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21237 : Tensor = aten::relu(%out.9) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23908 : Tensor = aten::_convolution(%21237, %self.model.backbone.bottom_up.stages.3.2.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23906, %23907, %144, %23906, %23906, %23906, %23906) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21247 : Tensor = aten::add(%out.17, %21217, %144) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:208:8 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %3722 : Tensor = aten::relu(%21247) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:209:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %1228 : Tensor = aten::max_pool2d(%top_block_in_feature, %139, %141, %132, %139, %182) # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:788:11 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
Segmentation fault (core dumped)
Expected behavior
As shown above, when the Torch-TRT compile is called, the conversion errors out midway with a Segmentation fault (core dumped). I have tried it with different versions and the error persists.
The expected behavior would be that the compilation completes and model_torch_trt.ts is generated for use.
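For completeness, the intended use of the exported file would be roughly the following sketch (hypothetical usage; it assumes the compiled module takes a single half-precision image tensor matching the compile spec below, and importing torch_tensorrt is what registers the TensorRT runtime ops needed to deserialize the module):

import torch
import torch_tensorrt  # registers the TensorRT runtime ops for the compiled module

# Load the exported Torch-TensorRT TorchScript module
trt_model = torch.jit.load("model_torch_trt.ts").cuda()

# Dummy input matching the compile-time spec: (1, 3, 1344, 1344), half precision
image = torch.randn(1, 3, 1344, 1344, dtype=torch.half, device="cuda")
outputs = trt_model(image)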
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
All the packages and versions below come from the Torch-TensorRT Docker container built per the instructions here.
- Torch-TensorRT Version (e.g. 1.0.0): 1.5.0.dev0+ac3ab77a
- PyTorch Version (e.g. 1.0): 2.1.0+dev20230419
- CPU Architecture:
- OS (e.g., Linux): Ubuntu 22.04
- How you installed PyTorch (conda, pip, libtorch, source): Docker image built from the build instructions
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.10
- CUDA version: 11.8
- GPU models and configuration: Nvidia RTX 3050 Ti
- Any other relevant information:
Additional context
Additional Torch-TRT compile spec info
DEBUG: [Torch-TensorRT] - TensorRT Compile Spec: {
"Inputs": [
Input(shape=(1,3,1344,1344,), dtype=Half, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2)) ]
"Enabled Precision": [Half, ]
"TF32 Disabled": 0
"Sparsity": 0
"Refit": 0
"Debug": 0
"Device": {
"device_type": GPU
"allow_gpu_fallback": False
"gpu_id": 0
"dla_core": -1
}
"Engine Capability": Default
"Num Avg Timing Iters": 1
"Workspace Size": 4294967296
"DLA SRAM Size": 1048576
"DLA Local DRAM Size": 1073741824
"DLA Global DRAM Size": 536870912
"Truncate long and double": 0
"Torch Fallback": {
"enabled": True
"min_block_size": 3
"forced_fallback_operators": [
]
"forced_fallback_modules": [
]
}
}