New recipe: tiny_transducer_ctc #848
Conversation
I get noticeable WER improvements just by enabling TF32 (see https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices). I will update the WERs once all models' results are ready.
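For reference, the TF32 setting linked above is a two-flag configuration fragment in PyTorch (assuming PyTorch 1.7+ running on an Ampere-or-newer GPU):

```python
import torch

# Allow TF32 in fp32 matmuls and cuDNN convolutions (Ampere+ GPUs).
# Trades a small amount of mantissa precision for a large speedup.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```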
Thanks. Could you also support streaming decoding with cached left context for the causal convolutional modules, like
Looks like there's some work involved; I will get to it when I have the time!
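Not part of this PR, but the cached-left-context idea can be sketched in a few lines: a causal Conv1d pads only on the left, so streaming inference just needs to carry the last `kernel_size - 1` input frames from one chunk to the next. A minimal pure-Python illustration (hypothetical names, single channel, not the recipe's actual code):

```python
def causal_conv1d(x, weights, cache):
    """Causal 1-D convolution over one chunk.

    x:       list of input frames for this chunk
    weights: kernel taps, oldest first (len = kernel size)
    cache:   last len(weights) - 1 frames of the previous chunk
    Returns (outputs, new_cache) so decoding can continue chunk by chunk.
    """
    k = len(weights)
    context = cache + x                      # prepend cached left context
    out = [
        sum(w * context[i + j] for j, w in enumerate(weights))
        for i in range(len(x))
    ]
    return out, context[-(k - 1):]           # carry new left context forward

# Streaming in two chunks matches one full left-padded pass:
w = [0.25, 0.5, 1.0]
full, _ = causal_conv1d([1, 2, 3, 4], w, [0.0, 0.0])
y1, c = causal_conv1d([1, 2], w, [0.0, 0.0])
y2, _ = causal_conv1d([3, 4], w, c)
assert full == y1 + y2
```

The same bookkeeping generalizes to multi-channel Conv1d: the cache is simply a `(channels, kernel_size - 1)` slice of the previous chunk's input.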
Hi wangtiance, it looks like some of the files need to be formatted with black to match the requirements; could you please look into that? Thank you!
Reformatted. All checks have passed now.
LGTM
This recipe is intended for streaming ASR on very low-cost devices, with 1-2M model parameters and fewer than 0.1 GOPS. It uses a small convolutional network as the encoder, is trained with combined transducer and CTC losses, and supports both phone and BPE lexicons. The encoder consists of 2 subsampling layers followed by a stack of Conv1d-batchnorm-activation-causal-squeeze-excite blocks, with an optional skip-add. It's somewhat similar to CitriNet and ContextNet, but even smaller.
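The "causal" squeeze-excite in those blocks replaces the usual global average pooling with a running, past-only average, so the gate at time t depends only on frames up to t and the module stays streamable. A toy single-channel sketch of that idea (a hypothetical simplification, not the recipe's actual module, which would use small FC layers per channel):

```python
import math

def causal_squeeze_excite(x, w=1.0, b=0.0):
    """Gate each frame by a sigmoid of the running mean of past frames.

    x:    list of single-channel activations
    w, b: toy gate parameters (a real module uses two small FC layers)
    """
    out, running_sum = [], 0.0
    for t, v in enumerate(x):
        running_sum += v
        pooled = running_sum / (t + 1)        # causal average pooling
        gate = 1.0 / (1.0 + math.exp(-(w * pooled + b)))
        out.append(v * gate)                  # re-scale the frame
    return out

y = causal_squeeze_excite([1.0, -1.0, 2.0])
```

Because the pooled statistic only accumulates past frames, each output can be emitted as soon as its input frame arrives, which is what makes the block usable for streaming.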
For WERs and more details, see README.md. Note that transducer decoding does NOT use an external LM, so its WER looks higher than CTC's.