
[recipe] AMI Zipformer transducer #698

Merged 9 commits into k2-fsa:master on Nov 26, 2022

Conversation

desh2608 (Collaborator) commented Nov 20, 2022

This is a Zipformer-based pruned transducer recipe for the AMI dataset. We train a single model that works well across all conditions (IHM, SDM, enhanced MDM). WER results using modified beam search (--epoch 14 --avg 8 --use-averaged-model True):

| Evaluation set       | dev WER | test WER |
|----------------------|---------|----------|
| IHM                  | 18.92   | 17.40    |
| SDM                  | 31.25   | 32.21    |
| MDM (GSS-enhanced)   | 21.67   | 22.43    |

The GSS enhancement is based on this package.
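The reported numbers come from a decoding run along these lines. This is only a sketch following icefall's usual recipe layout; the script path is an assumption based on the recipe's name (pruned_transducer_stateless7), not copied from this PR:

```shell
# Sketch of the decode invocation (icefall conventions; the script path
# is assumed from the recipe name, not taken verbatim from this PR).
epoch=14
avg=8
cmd="./pruned_transducer_stateless7/decode.py \
  --epoch $epoch \
  --avg $avg \
  --use-averaged-model True \
  --decoding-method modified_beam_search"
echo "$cmd"
```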


```python
add_model_arguments(parser)

return parser
```
Collaborator:

Could you use the decode.py from librispeech?


The above option is crucial for getting better results in decoding.

desh2608 (Author):

I see. Let me try decoding with this option.

desh2608 (Author):

Using this gives me slightly worse WERs.

Collaborator:

What --avg value are you using?

desh2608 (Author):

I used --iter 105000, --avg 9.

Collaborator:

> Using this gives me slightly worse WERs.

I think this is the first time I have seen it cause a worse WER.

desh2608 (Author):

I will try more combinations and report back here today. I tried that one because it gave me the best WERs without --use-averaged-model, but it may not be optimal when combined with this option.

desh2608 (Author):

I tried averaging different numbers of epoch checkpoints; here are the results using fast beam search. The first row is without --use-averaged-model. Averaging the last 7, 8, or 9 epochs gives improvements.

| Decoding method                        | Dev IHM | Test IHM | Dev SDM | Test SDM | Dev GSS | Test GSS |
|----------------------------------------|---------|----------|---------|----------|---------|----------|
| FBS (ckpt=105000, avg=10)              | 19.46   | 18.35    | 31.14   | 32.52    | 22.45   | 23.38    |
| FBS (epoch=14, avg=5, use-avg-model)   | 19.75   | 18.63    | 31.27   | 32.26    | 22.70   | 23.58    |
| FBS (epoch=14, avg=6, use-avg-model)   | 19.52   | 18.43    | 31.13   | 32.07    | 22.47   | 23.24    |
| FBS (epoch=14, avg=7, use-avg-model)   | 19.42   | 18.17    | 31.01   | 32.04    | 22.30   | 22.98    |
| FBS (epoch=14, avg=8, use-avg-model)   | 19.44   | 18.04    | 31.11   | 32.10    | 22.21   | 22.83    |
| FBS (epoch=14, avg=9, use-avg-model)   | 19.49   | 17.99    | 31.34   | 32.35    | 22.22   | 22.78    |
| FBS (epoch=14, avg=10, use-avg-model)  | 19.55   | 18.04    | 31.85   | 32.88    | 22.41   | 23.03    |
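Conceptually, --avg N with --use-averaged-model averages model parameters over the last N checkpoints before decoding. A minimal sketch of plain state-dict averaging (with dicts of floats standing in for tensors; icefall's actual implementation is more involved and averages over training, so this is illustrative only):

```python
def average_checkpoints(state_dicts):
    """Average parameters across checkpoints (simplified sketch).

    Each element of `state_dicts` maps parameter names to values.
    Real checkpoints hold torch tensors; plain floats are used here.
    """
    n = len(state_dicts)
    return {
        name: sum(sd[name] for sd in state_dicts) / n
        for name in state_dicts[0]
    }

# Example: average the last 3 of 5 "epoch checkpoints".
ckpts = [{"w": float(i), "b": 0.5 * i} for i in range(1, 6)]
avg = average_checkpoints(ckpts[-3:])
print(avg)  # {'w': 4.0, 'b': 2.0}
```

Averaging several adjacent checkpoints smooths out per-epoch noise in the weights, which is why moderate values of --avg (7 to 9 here) help, while averaging too many checkpoints pulls in less-trained weights and hurts.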

desh2608 (Author):

I will update the results to use --epoch 14 --avg 8 --use-averaged-model since it seems to provide consistent improvements across all sets.

desh2608 (Author):

Done.

desh2608 (Author) commented Nov 21, 2022

> So the results are better than that of
>
> and
>
> https://github.com/espnet/espnet/tree/master/egs2/ami/asr1 has a lower WER on the IHM dev/test sets.

Yes, that's right. It's not directly comparable, though, since all of those train for specific conditions, e.g., for IHM or for SDM. I was going for a more general-purpose model. The ESPNet2 model without LM is ~1% better (16.99%) on IHM. The best Kaldi numbers on the test set are 19.3% (IHM) and 35.1% (SDM).

N.B.: I had also tried training a model using only the IHM data, and it gave me 17.51% WER on the test set using modified beam search.

csukuangfj (Collaborator):

Please mark this PR with the label ready when you think it is ready to merge.
Thanks!

@desh2608 desh2608 added the ready label Nov 22, 2022
desh2608 (Author):

> Please mark this PR with the label ready when you think it is ready to merge. Thanks!

Perhaps we can get this merged if there are no further changes required?

csukuangfj (Collaborator):

> Please mark this PR with the label ready when you think it is ready to merge. Thanks!
>
> Perhaps we can get this merged if there are no further changes required?

Thanks!

@csukuangfj csukuangfj merged commit db75627 into k2-fsa:master Nov 26, 2022
csukuangfj (Collaborator):

@desh2608 Could you also upload tokens.txt to https://huggingface.co/desh2608/icefall-asr-ami-pruned-transducer-stateless7/tree/main/data/lang_bpe_500?

Also, could you upload some test waves to the above Hugging Face repo, just like the following one? https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/tree/main/test_wavs

desh2608 (Author):

Done!
