Replies: 2 comments
-
I'm looking to produce a verbatim transcription, word for word, including filler words. Any pointers on a recipe for either training or fine-tuning a Conformer model would be appreciated. Is this not currently supported in Conformer models, and would it be better to try other models such as Whisper?
-
Following my post, I had some success adapting the model. If you want to try fine-tuning the model, there are also multiple resources available in the repo. This would allow you to change the tokenizer, enabling the model to learn additional filler words.
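Roughly, the tokenizer-swap + fine-tuning route looks like this in NeMo. This is only a sketch, not an official recipe: the paths, batch sizes, and epoch count are placeholders, and it assumes you have already built a new SentencePiece/BPE tokenizer offline on verbatim text (e.g. with NeMo's process_asr_text_tokenizer.py script) so that filler words are covered by the vocabulary.

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Load the pretrained Conformer-Transducer (RNNT, BPE) checkpoint.
model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    "stt_en_conformer_transducer_xlarge"
)

# Swap in a tokenizer built on verbatim transcripts (directory is a placeholder).
# Note: this rebuilds the decoder/joint output layers for the new vocabulary,
# so they need to be retrained on your data.
model.change_vocabulary(
    new_tokenizer_dir="tokenizers/verbatim_bpe",   # placeholder path
    new_tokenizer_type="bpe",
)

# JSON-lines manifests with verbatim (disfluency-preserving) transcripts;
# each line holds audio_filepath, duration, and text.
model.setup_training_data(
    train_data_config={
        "manifest_filepath": "manifests/train_verbatim.json",  # placeholder
        "sample_rate": 16000,
        "batch_size": 8,
        "shuffle": True,
    }
)
model.setup_validation_data(
    val_data_config={
        "manifest_filepath": "manifests/dev_verbatim.json",    # placeholder
        "sample_rate": 16000,
        "batch_size": 8,
        "shuffle": False,
    }
)

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=50)
model.set_trainer(trainer)
trainer.fit(model)
```

As far as I understand, the trade-off versus adapters is that changing the vocabulary replaces the output layers, so you need more data and training time, but you gain the ability to add dedicated tokens for fillers.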
-
Hi,
I'm very new to ASR, so please bear with me :)

For a project, we are using the stt_en_conformer_transducer_xlarge model. In general, we are quite happy with the model performance, but we have noticed that it does not seem to output speech disfluencies, such as filler words and stutters [at least most of the time]. I would like to adapt / fine-tune the model to make it better at transcribing disfluencies. I can make use of around 50 hours of audio, paired with transcriptions of the stt_en_conformer_transducer_xlarge model, post-edited by humans to include speech disfluencies in the cases where they were present in the audio. Not all audio files actually contain disfluencies, but it's important for us to transcribe them when present.

I'm looking for general pointers / advice you might have for this specific use-case. I'm planning to start my experiments using the same model and adapters, but I'm not sure if this is the best approach, or if another model [size] might make more sense in this context.
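To make the adapter idea concrete, here is roughly what I have in mind, loosely following NeMo's ASR adapter tutorial. It is an illustration, not a tested configuration: the adapter name, bottleneck dimension, manifest paths, and trainer settings are placeholders.

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig

model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    "stt_en_conformer_transducer_xlarge"
)

# Add a small linear adapter to the encoder; sizes are illustrative, not tuned.
adapter_cfg = LinearAdapterConfig(
    in_features=model.cfg.encoder.d_model,  # Conformer hidden size
    dim=64,                                 # adapter bottleneck width
)
model.add_adapter(name="disfluency_adapter", cfg=adapter_cfg)
model.set_enabled_adapters(enabled=False)                         # disable all adapters
model.set_enabled_adapters(name="disfluency_adapter", enabled=True)

# Train only the adapter parameters; keep the pretrained weights frozen.
model.freeze()
model.unfreeze_enabled_adapters()

# ~50 h of audio with human-corrected verbatim transcripts
# (JSON-lines manifest: audio_filepath / duration / text per line).
model.setup_training_data(
    train_data_config={
        "manifest_filepath": "manifests/train_disfluent.json",  # placeholder
        "sample_rate": 16000,
        "batch_size": 8,
        "shuffle": True,
    }
)

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=20)
model.set_trainer(trainer)
trainer.fit(model)
```

The appeal of adapters here is that the original tokenizer is kept and only a small number of parameters are trained, which seems like a better match for ~50 hours of data than full fine-tuning, but I'd appreciate confirmation.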