[recipe] AMI Zipformer transducer #698
Conversation
So the results are better than that of and https://github.com/espnet/espnet/tree/master/egs2/ami/asr1
```python
    add_model_arguments(parser)

    return parser
```
Could you use the decode.py from librispeech?
"--use-averaged-model", |
the above option is crucial for getting better results in decoding.
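For reference, a decode invocation with this flag might look like the following. This is only a sketch assuming the usual icefall recipe layout; the script path, experiment directory, and epoch/avg values here are illustrative, not necessarily those used in this PR:

```bash
# Sketch: decoding with checkpoint averaging enabled. The recipe and
# exp-dir paths are assumptions; adjust epoch/avg to your experiment.
./pruned_transducer_stateless7/decode.py \
    --epoch 14 \
    --avg 9 \
    --use-averaged-model True \
    --exp-dir pruned_transducer_stateless7/exp \
    --decoding-method fast_beam_search
```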
I see. Let me try decoding with this option.
Using this gives me slightly worse WERs.
What --avg value are you using?
I used --iter 105000, --avg 9.
> Using this gives me slightly worse WERs.

This is the first time I have seen it cause a worse WER, I think.
I will try more combinations and report here today. I tried that one because it was giving me the best WERs without --use-averaged-model, but it may not be optimal when used with this option.
I tried using different numbers of epoch checkpoints for averaging; here are the results using fast beam search (a sketch of the sweep follows the table). The first row is without --use-averaged-model. It seems averaging the last 7, 8, or 9 epochs gives improvements.
| Decoding method | Dev IHM | Test IHM | Dev SDM | Test SDM | Dev GSS | Test GSS |
|---|---|---|---|---|---|---|
| FBS (ckpt=105000, avg=10) | 19.46 | 18.35 | 31.14 | 32.52 | 22.45 | 23.38 |
| FBS (epoch=14, avg=5, use-avg-model) | 19.75 | 18.63 | 31.27 | 32.26 | 22.70 | 23.58 |
| FBS (epoch=14, avg=6, use-avg-model) | 19.52 | 18.43 | 31.13 | 32.07 | 22.47 | 23.24 |
| FBS (epoch=14, avg=7, use-avg-model) | 19.42 | 18.17 | 31.01 | 32.04 | 22.30 | 22.98 |
| FBS (epoch=14, avg=8, use-avg-model) | 19.44 | 18.04 | 31.11 | 32.10 | 22.21 | 22.83 |
| FBS (epoch=14, avg=9, use-avg-model) | 19.49 | 17.99 | 31.34 | 32.35 | 22.22 | 22.78 |
| FBS (epoch=14, avg=10, use-avg-model) | 19.55 | 18.04 | 31.85 | 32.88 | 22.41 | 23.03 |
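A sweep along these lines can produce a table like the one above; a minimal sketch, assuming the same hypothetical recipe paths as before:

```bash
# Sweep the averaging window; flag names follow icefall conventions,
# but the script and exp-dir paths are assumptions.
for avg in 5 6 7 8 9 10; do
    ./pruned_transducer_stateless7/decode.py \
        --epoch 14 \
        --avg $avg \
        --use-averaged-model True \
        --exp-dir pruned_transducer_stateless7/exp \
        --decoding-method fast_beam_search
done
```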
I will update the results to use --epoch 14 --avg 8 --use-averaged-model, since it seems to provide consistent improvements across all sets.
Done.
Yes, that's right. It's not directly comparable, though, since all of those train for specific conditions, e.g., for IHM or for SDM. I was going for a more general-purpose model. The ESPNet2 model without LM is ~1% better (16.99%) on IHM. The best Kaldi numbers on the test set are 19.3% (IHM) and 35.1% (SDM). N.B.: I had also tried training a model using only the IHM data, and it gave me 17.51% WER on the test set using modified beam search.
Please mark this PR with the label
Perhaps we can get this merged if there are no further changes required?
Thanks!
@desh2608 Also, could you upload some test waves to the above huggingface repo, just like the following one?
Done!
This is a Zipformer-based pruned transducer recipe for the AMI dataset. We train a single model that works well on all conditions (IHM, SDM, enhanced MDM). WER results using modified beam search (--epoch 14 --avg 8 --use-averaged-model True):

The GSS enhancement is based on this package.