Bug Description
I trained a detectron2 model (GeneralizedRCNN) as found on the detectron2 repo and keep running into a segfault when trying to export the trained weights with Torch-TensorRT, following the example instructions.
I used export_model.py in scripting mode to export a scripted GeneralizedRCNN model with a ResNet-50 backbone.
I built a Docker environment as available in the repo here (#1852) to use PyTorch 2.1.0, Torch-TensorRT 1.5.0, TensorRT 8.6, CUDA 11.8, and cuDNN 8.8.
I have also tried exporting the same model with the stable release of Torch-TensorRT 1.3.0 and still get the segfault.
Can you provide any guidance or information on these errors, or say whether Torch-TensorRT has been tested with any models from the detectron2 model zoo?
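For context, the model is built with the standard detectron2 tooling; a minimal sketch of how such a model is constructed is below (the config name and weights path are placeholders, not the exact ones used for this report):

# Sketch only: the config choice and weights path are placeholders.
from detectron2 import model_zoo
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.modeling import build_model

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "output/model_final.pth"  # placeholder path to the trained weights
model = build_model(cfg)  # a GeneralizedRCNN with a ResNet-50 FPN backbone
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)  # load the trained weights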
To Reproduce
Steps to reproduce the behavior:
Get a GeneralizedRCNN model with a ResNet-50 backbone as found on the detectron2 repo.
Add the following snippet to export_model.py after L102 to call the Torch-TRT compile API on the scripted model (i.e. the output of torch.jit.script):
# Build a TorchScript Torch-TRT module for export
# (assumes export_model.py already has `import torch_tensorrt as torchtrt`)
trt_ts_model = torchtrt.compile(
    ts_model,
    inputs=[input_tensor],
    enabled_precisions={torch.half},
    min_block_size=3,
    workspace_size=1 << 32,
)
with PathManager.open(os.path.join(output, "model_torch_trt.ts"), "wb") as f:
    torch.jit.save(trt_ts_model, f)
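For reference, the inputs argument can also be given as an explicit torchtrt.Input spec instead of a concrete tensor; a minimal sketch, assuming a fixed input resolution (the 1x3x800x1344 shape and float32 dtype are placeholders, not the exact values used here):

# Sketch: the same compile call with an explicit Input spec (shape/dtype are placeholders).
trt_ts_model = torchtrt.compile(
    ts_model,
    inputs=[torchtrt.Input(shape=(1, 3, 800, 1344), dtype=torch.float32)],
    enabled_precisions={torch.half},
    min_block_size=3,
    workspace_size=1 << 32,
)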
Run the export_model.py script to export the model in scripting mode and observe the following debug output, which ends in a segfault:
DEBUG: [Torch-TensorRT] - Setting node %23878 : Tensor = aten::_convolution(%21058, %self.model.backbone.bottom_up.stages.2.5.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23876, %23877, %144, %23876, %23876, %23876, %23876) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21068 : Tensor = aten::add(%out.129, %21038, %144) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:208:8 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %3710 : Tensor = aten::relu(%21068) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:209:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23881 : Tensor = aten::_convolution(%3710, %self.model.backbone.bottom_up.stages.3.0.conv1.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %141, %132, %139, %23879, %23880, %144, %23879, %23879, %23879, %23879) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21156 : Tensor = aten::relu(%out.2) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23884 : Tensor = aten::_convolution(%21156, %self.model.backbone.bottom_up.stages.3.0.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23882, %23883, %144, %23882, %23882, %23882, %23882) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21166 : Tensor = aten::relu(%out.10) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23887 : Tensor = aten::_convolution(%21166, %self.model.backbone.bottom_up.stages.3.0.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23885, %23886, %144, %23885, %23885, %23885, %23885) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23890 : Tensor = aten::_convolution(%3710, %self.model.backbone.bottom_up.stages.3.0.shortcut.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %141, %132, %139, %23888, %23889, %144, %23888, %23888, %23888, %23888) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21196 : Tensor = aten::relu(%out.24) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23896 : Tensor = aten::_convolution(%21196, %self.model.backbone.bottom_up.stages.3.1.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23894, %23895, %144, %23894, %23894, %23894, %23894) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21206 : Tensor = aten::relu(%out.28) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23899 : Tensor = aten::_convolution(%21206, %self.model.backbone.bottom_up.stages.3.1.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23897, %23898, %144, %23897, %23897, %23897, %23897) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21227 : Tensor = aten::relu(%out.1) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:196:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23905 : Tensor = aten::_convolution(%21227, %self.model.backbone.bottom_up.stages.3.2.conv2.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %139, %139, %23903, %23904, %144, %23903, %23903, %23903, %23903) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21237 : Tensor = aten::relu(%out.9) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:199:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %23908 : Tensor = aten::_convolution(%21237, %self.model.backbone.bottom_up.stages.3.2.conv3.weight, %self.model.backbone.bottom_up.stem.conv1.bias.443, %139, %132, %139, %23906, %23907, %144, %23906, %23906, %23906, %23906) to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %21247 : Tensor = aten::add(%out.17, %21217, %144) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:208:8 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %3722 : Tensor = aten::relu(%21247) # /usr/local/lib/python3.10/dist-packages/detectron2/modeling/backbone/resnet.py:209:14 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
DEBUG: [Torch-TensorRT] - Setting node %1228 : Tensor = aten::max_pool2d(%top_block_in_feature, %139, %141, %132, %139, %182) # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:788:11 to run torch due owning block not large enough to exceed user specified min_block_size (previously was to run in tensorrt)
Segmentation fault (core dumped)
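For what it is worth, the scripted model can be sanity-checked before the Torch-TRT compile call along these lines (a sketch reusing the same ts_model and input_tensor as in the snippet above; check_method_op_support is the op-support query from the Torch-TensorRT TorchScript frontend):

# Sketch: pre-compile sanity checks on the same ts_model / input_tensor as above.
import torch
import torch_tensorrt as torchtrt

# Confirm the scripted model runs on its own in plain TorchScript.
with torch.no_grad():
    _ = ts_model(input_tensor)

# Ask the TorchScript frontend whether every op in forward() has a TRT converter;
# unsupported ops are normally expected to fall back to Torch rather than crash.
print(torchtrt.ts.check_method_op_support(ts_model, "forward"))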
Expected behavior
As shown above, when the Torch-TRT compile module is called, the conversion errors out midway with a Segmentation fault (core dumped). I have tried it with different versions and the error persists.
The expected behaviour is that the model gets exported and model_torch_trt.ts is generated for use.
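For completeness, a successful export would then be consumed roughly like this (a minimal sketch; importing torch_tensorrt registers the TensorRT runtime ops needed to deserialize the module, and the exact inference call depends on detectron2's scripting adapter, so it is not shown):

# Sketch: loading the exported Torch-TRT TorchScript module.
import torch
import torch_tensorrt  # needed so the TensorRT runtime ops are registered before loading

trt_model = torch.jit.load("model_torch_trt.ts").eval()
# Inference would then use whatever input format detectron2's scripting adapter expects.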
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
All the packages and versions here come from the Torch-TensorRT docker container available in the instructions here
Torch-TensorRT Version (e.g. 1.0.0): 1.5.0.dev0+ac3ab77a
PyTorch Version (e.g. 1.0): 2.1.0+dev20230419
CPU Architecture:
OS (e.g., Linux): Ubuntu 22.04
How you installed PyTorch (conda, pip, libtorch, source): docker installed from the build instructions
Build command you used (if compiling from source):
Are you using local sources or building from archives:
Additional context
Additional Torch-TRT compile spec info