alignment using ctc_segmentation fails with Hybrid RNNT-CTC models #8750
Comments
@erastorgueva-nv can you look at this issue pls.
Hi @Jesteinbe, thanks for bringing this to our attention. You are correct: unfortunately, NeMo CTC Segmentation does not currently support Hybrid models. Your fix of "
Thanks @erastorgueva-nv! I figured that out too. I'm also seeing that I get the same results no matter what window size I use when calling
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
@erastorgueva-nv could we close this issue?
Following the steps outlined in CTC_Segmentation_Tutorial.ipynb, I'm trying to align text and audio. If I use a CTC-only model like stt_en_fastconformer_ctc_large, things work fine. However, if I try to use an EncDecHybridRNNTCTCBPEModel model like stt_en_fastconformer_hybrid_large_pc, things break. As far as I can tell, there are at least two problems.
The first issue is that hybrid models' vocabularies are in mdl.cfg.aux_ctc.decoder.vocabulary, not mdl.cfg.decoder.vocabulary, which is what prepare_data.py and run_ctc_segmentation.py use. Once I fixed this, prepare_data.py seems to work fine, but the segmentation still fails.
I'm not sure what the second issue is yet, but the alignment fails and I just get a generic error like:
I'm guessing the call to the forward pass of the ASR decoder is different for the hybrid RNNT-CTC models.
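For the first issue, the vocabulary lookup could be made model-agnostic along the following lines. This is only a minimal sketch, not what prepare_data.py or run_ctc_segmentation.py actually do: the helper name get_ctc_vocabulary is hypothetical, and the configs are plain stand-in objects with made-up vocabularies rather than real NeMo configs. It assumes only the two attribute paths mentioned above.

```python
from types import SimpleNamespace


def get_ctc_vocabulary(cfg):
    """Return the CTC vocabulary from a model config (hypothetical helper).

    Assumes hybrid RNNT-CTC configs expose cfg.aux_ctc.decoder.vocabulary
    and CTC-only configs expose cfg.decoder.vocabulary, as described above.
    """
    aux_ctc = getattr(cfg, "aux_ctc", None)
    if aux_ctc is not None:
        # Hybrid model: the CTC head lives under the aux_ctc section.
        return list(aux_ctc.decoder.vocabulary)
    # CTC-only model: the vocabulary sits directly under decoder.
    return list(cfg.decoder.vocabulary)


# Stand-in configs with made-up vocabularies, for illustration only.
ctc_cfg = SimpleNamespace(decoder=SimpleNamespace(vocabulary=["a", "b"]))
hybrid_cfg = SimpleNamespace(
    decoder=SimpleNamespace(),  # RNNT decoder section (contents omitted)
    aux_ctc=SimpleNamespace(decoder=SimpleNamespace(vocabulary=["a", "b", "c"])),
)

ctc_vocab = get_ctc_vocabulary(ctc_cfg)
hybrid_vocab = get_ctc_vocabulary(hybrid_cfg)
```

With a real model, the call would presumably be get_ctc_vocabulary(mdl.cfg). Checking for aux_ctc first keeps a single code path for both model types, but as noted above it does not address the second failure in the segmentation step.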
As for expected behavior, ctc_segmentation should work regardless of which CTC-based model you use. In this case, the bug shouldn't have anything to do with my environment, but I'm running on bare metal (Ubuntu 20.04) using a conda environment in which I installed NeMo v1.22.0 via pip.