We're looking to fine-tune a streaming zipformer model on a custom dataset of roughly 100 hours that we are about to have manually annotated. The speech in that dataset may contain disfluencies. In this case, is it better to include the disfluencies in the annotations, or should we omit them from the transcripts?
From the CSJ experiments in #892, we infer that the model trained and tested on fluent transcripts performs slightly better. Is this inference correct? Should we expect similar results with zipformer, or is training on disfluent transcriptions worth a shot? If so, what would be the ideal format for annotating disfluent speech for zipformer?
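One option we're considering, in case it helps frame the question: annotate everything verbatim with explicit disfluency tags, so that fluent transcripts can be derived automatically later and both training setups can be tried from a single annotation pass. A minimal sketch, assuming a hypothetical square-bracket tagging convention (not an icefall/zipformer standard):

```python
import re

# Hypothetical convention (an assumption, not an icefall requirement):
# disfluencies stay in the verbatim transcript wrapped in square brackets,
# e.g. "[uh]" for fillers and "[fli-]" for partial words.
VERBATIM = "i want to [uh] book a [fli-] flight to boston"

def to_fluent(text: str) -> str:
    """Strip bracketed disfluency tags to recover a fluent transcript."""
    cleaned = re.sub(r"\[[^\]]*\]", " ", text)
    # Collapse the whitespace left behind by removed tags.
    return " ".join(cleaned.split())

print(to_fluent(VERBATIM))  # i want to book a flight to boston
```

This way the disfluent-vs-fluent decision can be deferred to training time rather than baked into the annotation guidelines.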
Any advice on this would be of great help.
Thanks!