-
This is interesting; normally we find RNNT to be slower but more accurate. Which languages are you training on? If they include Chinese, Japanese, or Korean, the vocabulary size might cause slower learning.
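If it helps, the tokenizer size is quick to inspect in NeMo; a minimal sketch, assuming a saved `.nemo` RNNT checkpoint (the path is hypothetical):

```python
import nemo.collections.asr as nemo_asr

# Path is hypothetical; point this at your trained checkpoint.
model = nemo_asr.models.EncDecRNNTBPEModel.restore_from("conformer_rnnt.nemo")

# TokenizerSpec exposes vocab_size; a very large vocabulary (e.g. one that
# covers CJK characters) enlarges the joint network's output layer and can
# slow convergence.
print("vocab size:", model.tokenizer.vocab_size)
```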
-
So I tried two models, Conformer-CTC and Conformer-RNNT, on the same dataset. Many NVIDIA docs say that the RNNT model is better than CTC, but across all my experiments I am getting almost the same results for both.
Also, training RNNT costs me extra time, and RNNT takes more time in inference than CTC, even though RNNT is designed for streaming tasks.
Attaching some observations:
Dataset-1:
[val_WER curves; purple is RNNT]
[inference time in seconds; purple is RNNT]
Also, one thing I noticed is that the inference-time gap varies with the dataset:
Dataset-2:
[val_WER curves; purple is RNNT]
[inference time in seconds]
Both datasets contain multilingual audio with corresponding transcripts in romanized form.
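For anyone who wants to reproduce the timing comparison, a minimal sketch, assuming public NeMo checkpoints (the model names and wav paths are illustrative, and the file list is passed positionally because the keyword changed from `paths2audio_files` to `audio` across NeMo versions):

```python
import time
import nemo.collections.asr as nemo_asr

# Public English checkpoints used purely for illustration; substitute the
# multilingual models actually being compared.
ctc = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")
rnnt = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_transducer_large")

audio_files = ["clip1.wav", "clip2.wav"]  # hypothetical test clips

for name, model in [("CTC", ctc), ("RNNT", rnnt)]:
    model.eval()
    model.transcribe(audio_files)  # warm-up pass so GPU init is not timed
    start = time.perf_counter()
    model.transcribe(audio_files)
    print(f"{name}: {time.perf_counter() - start:.2f}s for {len(audio_files)} files")
```

Note that RNNT greedy decoding is autoregressive (the prediction network runs once per emitted token), while CTC decoding is a single pass over the encoder output, so the gap would be expected to grow with transcript length; that could be one reason it varies between the two datasets.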