alignment using ctc_segmentation fails with Hybrid RNNT-CTC models #8750

Closed
Jesteinbe opened this issue Mar 26, 2024 · 5 comments
Labels: bug, stale

Comments

@Jesteinbe

Following the steps outlined in CTC_Segmentation_Tutorial.ipynb, I'm trying to align text and audio. If I use a CTC-only model like stt_en_fastconformer_ctc_large, things work fine. However, if I try to use an EncDecHybridRNNTCTCBPEModel like stt_en_fastconformer_hybrid_large_pc, things break. As far as I can tell, there are at least two problems.

The first issue is that a hybrid model's CTC vocabulary is stored at mdl.cfg.aux_ctc.decoder.vocabulary, not at mdl.cfg.decoder.vocabulary, which is what prepare_data.py and run_ctc_segmentation.py read. Once I fixed this, prepare_data.py seems to work fine, but the segmentation still fails.
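A minimal sketch of the kind of lookup I mean, assuming only the two attribute paths mentioned above. The helper name get_ctc_vocabulary is hypothetical (it is not part of NeMo), and the stand-in configs below just mimic the attribute layout rather than loading a real model:

```python
from types import SimpleNamespace


def get_ctc_vocabulary(cfg):
    """Return the CTC vocabulary from a model config.

    Hypothetical helper: prefers cfg.aux_ctc.decoder.vocabulary (hybrid
    RNNT-CTC models) and falls back to cfg.decoder.vocabulary (CTC-only).
    """
    aux_ctc = getattr(cfg, "aux_ctc", None)
    if aux_ctc is not None:
        return list(aux_ctc.decoder.vocabulary)
    return list(cfg.decoder.vocabulary)


# Stand-in configs that only mimic the attribute layout described above.
ctc_only_cfg = SimpleNamespace(decoder=SimpleNamespace(vocabulary=["a", "b"]))
hybrid_cfg = SimpleNamespace(
    decoder=SimpleNamespace(),  # RNNT decoder section (contents not modeled here)
    aux_ctc=SimpleNamespace(decoder=SimpleNamespace(vocabulary=["c", "d"])),
)

print(get_ctc_vocabulary(ctc_only_cfg))
print(get_ctc_vocabulary(hybrid_cfg))
```

With a real model, the same function would presumably be called as get_ctc_vocabulary(mdl.cfg), but I have only checked the two models named above.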

I'm not sure what the second issue is yet, but the alignment fails and I just get a generic error like:

INFO:root:Processing 1st_call.wav...
Transcribing: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.21s/it]
ERROR:root:list indices must be integers or slices, not tuple
ERROR:root:Skipping 1st_call.wav

I'm guessing the call to the forward pass of the ASR decoder is different for the hybrid RNNT-CTC models.
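For reference, that exact message is what Python raises when a plain list is indexed NumPy-style with a tuple. The snippet below is only a generic reproduction of the error text, not code taken from run_ctc_segmentation.py, and the variable name is made up:

```python
# A nested Python list standing in for per-frame scores (made-up values).
nested_scores = [[0.1, 0.9], [0.8, 0.2]]

try:
    # NumPy-style 2-D indexing passes the tuple (slice(None), -1) to
    # list.__getitem__, which only accepts integers or slices.
    nested_scores[:, -1]
except TypeError as err:
    message = str(err)

print(message)
```

So one possibility is that the script receives a list where it expects an array, but I have not confirmed that this is what happens here.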

As for expected behavior, ctc_segmentation should work regardless of which CTC-based model you use. In this case, the bug shouldn't have anything to do with my environment, but I'm running on bare metal (Ubuntu 20.04) using a conda environment in which I installed NeMo v1.22.0 via pip.

Jesteinbe added the bug label Mar 26, 2024
@nithinraok
Collaborator

@erastorgueva-nv can you look at this issue, please?

@erastorgueva-nv
Collaborator

Hi @Jesteinbe, thanks for bringing this to our attention. You are correct: unfortunately, NeMo CTC Segmentation does not currently support Hybrid models.

Your fix of using "mdl.cfg.aux_ctc.decoder.vocabulary" is a reasonable partial fix for this. The remaining issue, "list indices must be integers or slices, not tuple", can be fixed by adding the following lines in run_ctc_segmentation.py after asr_model is instantiated:

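    # Hybrid RNNT-CTC models decode with the RNNT head by default; switch to the
    # auxiliary CTC head, which is what CTC segmentation needs.
    # (EncDecHybridRNNTCTCBPEModel is a subclass, so this check covers it too.)
    if isinstance(asr_model, nemo_asr.models.EncDecHybridRNNTCTCModel):
        asr_model.change_decoding_strategy(decoder_type="ctc")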

@Jesteinbe
Author

Thanks @erastorgueva-nv ! I figured that out too.

I'm also seeing that I get the same results no matter what window size I use when calling run_ctc_segmentation.py. I've tried 10 (which I thought should result in bad alignment, if not break it outright), 8000, 16000, 32000, and 48000. I am using single-channel audio with limited amounts of silence, so I didn't expect a bigger window size to be necessary, but I was surprised to find identical results. I saw this behavior on both recordings I tried, one English and the other Spanish, each about 5 minutes long. It seems like the window size isn't getting updated properly, even though the log and segment file names all show the expected values. I haven't tested on a larger amount of audio yet, so this could be an anomaly, but I thought it was worth mentioning.
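One quick way to confirm that the outputs are byte-for-byte identical, rather than just similar, is to hash the segment file written for each window size. A minimal sketch follows; the helper name group_identical_files and the file names in the usage comment are hypothetical, not names the script actually produces:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_files(paths):
    """Group file paths by the SHA-256 digest of their contents.

    Returns a list of groups; files in the same group are byte-identical.
    """
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return list(groups.values())


# Hypothetical usage, one segment file per window size:
# groups = group_identical_files([
#     "segments_window_8000.txt",
#     "segments_window_16000.txt",
#     "segments_window_48000.txt",
# ])
# A single group containing every path means the window size had no effect.
```

If everything lands in a single group, the outputs are the same, which would be consistent with the window size not being applied (or with the default window already being large enough for these short recordings).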

Contributor

github-actions bot commented May 6, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label May 6, 2024
@nithinraok
Collaborator

@erastorgueva-nv could we close this issue?
