add conformer configs for hat model #6372
Conversation
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
It looks fine. I wonder if we want to create another directory for it, such as asr/conf/conformer_hat, since it is technically a different model.
I am fine with this approach too, but would like @VahidooX to review and comment on what his preferences are for Conformer HAT.
It might be preferable to have a subdirectory: conf/conformer/hat/conformer_hat_*.yaml?
Looks good to me as long as they have their own separate folder under conformer.
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
@@ -75,7 +75,7 @@ Key Features
 * Speech processing
 * `HuggingFace Space for Audio Transcription (File, Microphone and YouTube) <https://huggingface.co/spaces/smajumdar/nemo_multilingual_language_id>`_
 * `Automatic Speech Recognition (ASR) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/intro.html>`_
-* Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC, FastConformer-CTC, FastConformer-Transducer...
+* Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC, FastConformer-CTC, FastConformer-Transducer, Conformer-HAT...
Would you please also update the following statement to include Hybrid ASR:
Supports CTC, Transducer/RNNT and Hybrid losses/decoders
Done
LGTM! Just left two minor comments.
docs/source/asr/models.rst
Outdated
.. _Conformer-HAT_model:

Conformer-HAT (Hybrid Autoregressive Transducer)
--------------------------------------
The underline should be the same length as the title.
Done
The Conformer HAT model (not to be confused with Hybrid-Transducer-CTC) is a modification of the Conformer-Transducer model based on the `Google paper <https://arxiv.org/abs/2003.07705>`_.
The main idea is to separate the label and blank score predictions, which makes it possible to estimate the internal LM probabilities during decoding.
When an external LM is available for inference, the internal LM can be subtracted from the HAT model prediction during beam search decoding to improve the effectiveness of the external LM.
This can be helpful for text-only adaptation to new domains.
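As a rough illustration of how that subtraction can combine with an external LM score during beam search (a minimal sketch, not the NeMo implementation; `ilm_weight` and `lm_weight` are hypothetical interpolation weights chosen only for the example):

```python
import math

def hat_beam_score(log_p_hat: float, log_p_ilm: float, log_p_ext_lm: float,
                   ilm_weight: float = 0.3, lm_weight: float = 0.5) -> float:
    """Score one candidate label expansion during beam search.

    log_p_hat    -- label log-probability from the HAT model
    log_p_ilm    -- internal LM log-probability estimated from the HAT decoder/joint
    log_p_ext_lm -- log-probability from the external (e.g. n-gram) LM
    """
    # Subtract the internal LM estimate and add the external LM score,
    # as described in the paragraph above; the weights are illustrative only.
    return log_p_hat - ilm_weight * log_p_ilm + lm_weight * log_p_ext_lm

# Example: a candidate with strong external-LM support gets a higher score.
print(hat_beam_score(math.log(0.4), math.log(0.3), math.log(0.6)))
```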
How can users use this feature?
Do the current LM scripts support it?
By default, the Conformer HAT model works at decoding time as a standard Transducer model with the same interface. However, if you have an external n-gram LM, you can use the scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py script. An updated version of that script is under review in #6370.
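For plain inference without an external LM, a HAT checkpoint can be used like any other Transducer model; a minimal sketch, assuming a trained checkpoint and an audio file (both file names are hypothetical, not from this PR):

```python
import nemo.collections.asr as nemo_asr

# Restore a trained Conformer-HAT checkpoint (path is hypothetical).
model = nemo_asr.models.EncDecRNNTBPEModel.restore_from("conformer_hat_bpe.nemo")

# The standard Transducer transcription interface applies unchanged.
transcripts = model.transcribe(["sample_audio.wav"])
print(transcripts)
```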
@VahidooX -- could you approve the PR if everything is OK?
* add conformer configs for hat model

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
What does this PR do?
Add conformer char and bpe configs for hat model (https://arxiv.org/abs/2003.07705)
Collection: [ASR]
Before your PR is "Ready for review"
Pre checks:
PR Type: