Bug Description
I get an error, followed by a segmentation fault, when converting a Conformer Transducer encoder to TensorRT with Torch-TensorRT (ASR task).
To Reproduce
CODE:
import nemo.collections.asr as nemo_asr
import torch
import torch_tensorrt as torchtrt

# Export the pretrained Conformer Transducer to TorchScript.
nemo_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="stt_en_conformer_transducer_large")
nemo_model.freeze()
nemo_model.export(output="temp_rnnt.ts", check_trace=True)

with torchtrt.logging.debug():
    # Encoder TorchScript file loaded for compilation.
    variant = "encoder-temp_rnnt.ts"
    precisions = [torch.float, torch.half]
    batch_size = 1
    model = torch.jit.load(variant)
    inputs = [
        # Audio features: [batch, mel bins, frames].
        torchtrt.Input(shape=[batch_size, 80, 8269]),  # 8269 from mel spectr for 1min wav with resample
        # Length input, one value per batch item.
        torchtrt.Input(shape=[1]),
    ]
    # Compile and save one module per precision.
    for precision in precisions:
        compile_settings = {
            "inputs": inputs,
            "enabled_precisions": {precision},
            "workspace_size": 2000000000,
            "truncate_long_and_double": True,
        }
        print(f"Generating Torchscript-TensorRT module for batchsize {batch_size} precision {precision}")
        trt_ts_module = torchtrt.compile(model, **compile_settings)
        torch.jit.save(trt_ts_module, f"{variant.replace('.ts','')}_bs{batch_size}_{precision}.torch-tensorrt")
CONSOLE:
Generating Torchscript-TensorRT module for batchsize 1 precision torch.float32
WARNING: [Torch-TensorRT] - Data types for input tensors have been modified by inserting aten::to operations which cast INT64 inputs to INT32. To disable this, please recompile using INT32 inputs
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT TorchScript Conversion Context] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - Trying to record the value lengths1.1 with the ITensor (Unnamed Layer* 13) [Unary]_output again.
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - Truncating aten::to output type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Trying to record the value padding_length.1 with the ITensor (Unnamed Layer* 26) [Identity]_output again.
WARNING: [Torch-TensorRT] - Truncating aten::to output type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Trying to record the value 28 with the ITensor (Unnamed Layer* 26) [Identity]_output again.
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - Unable to process input type of at::kLong, truncate type to at::kInt in scalar_to_tensor_util
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [Torch-TensorRT TorchScript Conversion Context] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer %103 : Tensor = aten::add(%matrix_ac.1, %matrix_bd0.1, %124) # /usr/local/lib/python3.8/dist-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py:243:0: broadcast dimensions must be conformable)
Segmentation fault (core dumped)
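For context on the "broadcast dimensions must be conformable" message: an elementwise add requires the two operand shapes to be broadcast-compatible, meaning that, aligned from the trailing dimension, each pair of sizes is either equal or one of them is 1. Below is a minimal sketch of that rule in plain Python. The function name `broadcast_shape` and the example shapes are hypothetical and are not taken from NeMo or from the failing graph; the sketch follows NumPy/PyTorch-style broadcasting, whereas TensorRT's elementwise layer, as far as I understand, also expects both inputs to have the same number of dimensions.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or None if they are not conformable."""
    result = []
    # Walk both shapes from the trailing dimension; a missing leading
    # dimension is treated as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            # Sizes differ and neither is 1: the shapes cannot be broadcast.
            return None
    return tuple(reversed(result))


# Hypothetical attention-score shapes, for illustration only.
print(broadcast_shape((1, 8, 100, 100), (1, 8, 100, 100)))  # (1, 8, 100, 100)
print(broadcast_shape((1, 8, 100, 100), (1, 8, 100, 199)))  # None
```

If the two operands of the failing `aten::add` (`matrix_ac` and `matrix_bd`) end up with shapes like the second pair during conversion, a check of this kind would report them as non-conformable.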
Expected behavior
I expect the conversion to finish without errors and produce a TensorRT module file as output.
Environment
- Torch-TensorRT Version (e.g. 1.0.0): 1.4.0
- PyTorch Version (e.g. 1.0): 2.0.1+cu118
- CPU Architecture: AMD EPYC 7763 64-Core Processor
- OS (e.g., Linux): Ubuntu 20.04.5 LTS
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Python version: 3.8.10
- CUDA version: release 11.8, V11.8.89
- GPU models and configuration: NVIDIA A100 80GB
- image: nvcr.io/nvidia/tensorrt:22.12-py3
Additional context
I want to convert both the encoder and the decoder of the Conformer Transducer model from TorchScript to TensorRT.