
[recipe] AMI Zipformer transducer #698

Merged 9 commits into k2-fsa:master on Nov 26, 2022

Conversation

desh2608 (Collaborator) commented Nov 20, 2022

This is a Zipformer-based pruned transducer recipe for the AMI dataset. We train a single model that works well across all conditions (IHM, SDM, enhanced MDM). WER results using modified beam search (--epoch 14 --avg 8 --use-averaged-model True):

| Evaluation set       | dev WER | test WER |
|----------------------|---------|----------|
| IHM                  | 18.92   | 17.40    |
| SDM                  | 31.25   | 32.21    |
| MDM (GSS-enhanced)   | 21.67   | 22.43    |

The GSS enhancement is based on this package.
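The reported numbers come from a decoding run along these lines. This is only a sketch following icefall's usual recipe layout; the script path is an assumption based on the recipe's name (pruned_transducer_stateless7), not copied from this PR:

```shell
# Sketch of the decode invocation (icefall conventions; the script path
# is assumed from the recipe name, not taken verbatim from this PR).
epoch=14
avg=8
cmd="./pruned_transducer_stateless7/decode.py \
  --epoch $epoch \
  --avg $avg \
  --use-averaged-model True \
  --decoding-method modified_beam_search"
echo "$cmd"
```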


```python
add_model_arguments(parser)

return parser
```
Collaborator:

Could you use the decode.py from librispeech?


The above option is crucial for getting better results in decoding.

desh2608 (Author):

I see. Let me try decoding with this option.

desh2608 (Author):

Using this gives me slightly worse WERs.

Collaborator:

What --avg value are you using?

desh2608 (Author):

I used --iter 105000, --avg 9.

Collaborator:

> Using this gives me slightly worse WERs.

I think this is the first time I have seen it cause a worse WER.

desh2608 (Author):

I will try more combinations and report back here today. I tried that one because it gave me the best WERs without --use-averaged-model, but it may not be optimal when combined with this option.

desh2608 (Author):

I tried averaging different numbers of epoch checkpoints; here are the results using fast beam search. The first row is without --use-averaged-model. Averaging the last 7, 8, or 9 epochs gives improvements.

| Decoding method                        | Dev IHM | Test IHM | Dev SDM | Test SDM | Dev GSS | Test GSS |
|----------------------------------------|---------|----------|---------|----------|---------|----------|
| FBS (ckpt=105000, avg=10)              | 19.46   | 18.35    | 31.14   | 32.52    | 22.45   | 23.38    |
| FBS (epoch=14, avg=5, use-avg-model)   | 19.75   | 18.63    | 31.27   | 32.26    | 22.70   | 23.58    |
| FBS (epoch=14, avg=6, use-avg-model)   | 19.52   | 18.43    | 31.13   | 32.07    | 22.47   | 23.24    |
| FBS (epoch=14, avg=7, use-avg-model)   | 19.42   | 18.17    | 31.01   | 32.04    | 22.30   | 22.98    |
| FBS (epoch=14, avg=8, use-avg-model)   | 19.44   | 18.04    | 31.11   | 32.10    | 22.21   | 22.83    |
| FBS (epoch=14, avg=9, use-avg-model)   | 19.49   | 17.99    | 31.34   | 32.35    | 22.22   | 22.78    |
| FBS (epoch=14, avg=10, use-avg-model)  | 19.55   | 18.04    | 31.85   | 32.88    | 22.41   | 23.03    |
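Conceptually, --avg N with --use-averaged-model averages model parameters over the last N checkpoints before decoding. A minimal sketch of plain state-dict averaging (with dicts of floats standing in for tensors; icefall's actual implementation is more involved and averages over training, so this is illustrative only):

```python
def average_checkpoints(state_dicts):
    """Average parameters across checkpoints (simplified sketch).

    Each element of `state_dicts` maps parameter names to values.
    Real checkpoints hold torch tensors; plain floats are used here.
    """
    n = len(state_dicts)
    return {
        name: sum(sd[name] for sd in state_dicts) / n
        for name in state_dicts[0]
    }

# Example: average the last 3 of 5 "epoch checkpoints".
ckpts = [{"w": float(i), "b": 0.5 * i} for i in range(1, 6)]
avg = average_checkpoints(ckpts[-3:])
print(avg)  # {'w': 4.0, 'b': 2.0}
```

Averaging several adjacent checkpoints smooths out per-epoch noise in the weights, which is why moderate values of --avg (7 to 9 here) help, while averaging too many checkpoints pulls in less-trained weights and hurts.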

desh2608 (Author):

I will update the results to use --epoch 14 --avg 8 --use-averaged-model since it seems to provide consistent improvements across all sets.

desh2608 (Author):

Done.

desh2608 (Author) commented Nov 21, 2022

> So the results are better than that of
>
> and
>
> https://github.com/espnet/espnet/tree/master/egs2/ami/asr1 has a lower WER on the IHM dev/test sets.

Yes, that's right. It's not directly comparable, though, since all of those train for specific conditions, e.g., for IHM or for SDM. I was going for a more general-purpose model. The ESPNet2 model without LM is ~1% better (16.99%) on IHM. The best Kaldi numbers on the test set are 19.3% (IHM) and 35.1% (SDM).

N.B.: I had also tried training a model using only the IHM data, and it gave me 17.51% WER on the test set using modified beam search.

csukuangfj (Collaborator):

Please mark this PR with the label ready when you think it is ready to merge.
Thanks!

@desh2608 desh2608 added the ready label Nov 22, 2022
desh2608 (Author):

> Please mark this PR with the label ready when you think it is ready to merge. Thanks!

Perhaps we can get this merged if there are no further changes required?

csukuangfj (Collaborator):

> Please mark this PR with the label ready when you think it is ready to merge. Thanks!
>
> Perhaps we can get this merged if there are no further changes required?

Thanks!

@csukuangfj csukuangfj merged commit db75627 into k2-fsa:master Nov 26, 2022
csukuangfj (Collaborator):

@desh2608 Could you also upload tokens.txt to https://huggingface.co/desh2608/icefall-asr-ami-pruned-transducer-stateless7/tree/main/data/lang_bpe_500?

Also, could you upload some test waves to the above Hugging Face repo, just like the following one? https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/tree/main/test_wavs

desh2608 (Author):

Done!
