Conformer-Transducer beam search decoding with ONNX models #6811
-
Whoops, misclicked while typing.

If you look at the exported model inference wrapper - https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py#L1026-L1196 - and compare it with the PyTorch greedy batch implementation - https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py#L479-L814 - you'll see that the vast majority of the code overlaps, with only some NumPy or ONNX changes where the decoder-joint is called. That's because the ONNX one is a direct copy-paste of the PyTorch one, with just those minor parts modified plus some casts to make everything play nice. Perhaps a few things are hard-coded, such as the abstract methods of RNNTDecoder, since the exported model no longer supports those utility functions.

Note that the concrete implementations that actually call the exported model (say, via the ONNX implementation) are implemented in […]

So you would basically have to follow the same steps for the beam search implementations here - https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/parts/submodules/rnnt_beam_decoding.py#L81-L1481 (you probably don't need all 1400 lines if you want to support just one of the beam algorithms).

Do note that this is not a trivial effort. The beam search implementation is actually quite a bit more complicated than greedy (plus it doesn't support batching).
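To make the "same loop, different decoder-joint call" point concrete, here is a minimal pure-Python sketch. It is not NeMo code: `greedy_transducer_decode`, `DecoderJointFn`, and `toy_decoder_joint` are hypothetical names made up for illustration, and the loop is a simplified single-utterance greedy decode rather than the batched version linked above. The assumption is that the only backend-specific piece is the callable that takes an encoder frame, the last emitted token, and the decoder state, and returns logits plus the new state.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical seam: (encoder_frame, last_token, state) -> (logits, new_state).
# A PyTorch backend would call the decoder and joint modules here; an ONNX
# backend would run the exported decoder-joint session and cast the outputs.
DecoderJointFn = Callable[[Sequence[float], int, object], Tuple[List[float], object]]


def greedy_transducer_decode(
    encoder_frames: Sequence[Sequence[float]],
    decoder_joint: DecoderJointFn,
    blank_id: int,
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Simplified greedy transducer decoding for one utterance (no batching)."""
    hypothesis: List[int] = []
    last_token = blank_id  # assumption: blank doubles as the start symbol
    state = None
    for frame in encoder_frames:
        # Emit symbols for this frame until blank wins or the cap is reached.
        for _ in range(max_symbols_per_frame):
            logits, new_state = decoder_joint(frame, last_token, state)
            best = max(range(len(logits)), key=logits.__getitem__)
            if best == blank_id:
                break  # blank: advance to the next frame, keep the old state
            hypothesis.append(best)
            last_token = best
            state = new_state  # only commit the state on a non-blank emission
    return hypothesis


def toy_decoder_joint(frame, last_token, state):
    """Stand-in backend: emits the frame's top token once, then blank."""
    blank_id = len(frame)  # blank is appended as the last index
    target = max(range(len(frame)), key=frame.__getitem__)
    blank_score = 10.0 if last_token == target else -10.0
    return list(frame) + [blank_score], state


frames = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.0, 0.2, 0.7]]
print(greedy_transducer_decode(frames, toy_decoder_joint, blank_id=3))
```

The decoding loop never touches the backend directly, so swapping PyTorch for ONNX only changes what is passed as `decoder_joint`. A beam search port would presumably keep the same seam, but it has to carry one decoder state per hypothesis and score many candidates per frame, which is where most of the extra complexity comes from.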
-
There is a great ONNXGreedyBatchedRNNTInfer class for performing greedy decoding with .onnx models. I was wondering how feasible it would be to implement something similar for beam search, specifically for the Conformer-Transducer, which inherently supports beam search.
My initial thought was to swap out all inference portions of the example script with ONNXRuntime sessions, but this led me down a rabbit hole.
Has this been attempted by anyone yet? I thought I'd check before using up too much of my time taking the wrong approaches.
Thanks!