
Whisper large fine-tuning on wenetspeech, multi-hans-zh #1483

Merged
50 commits merged into k2-fsa:master on Mar 7, 2024

Conversation

@yuekaizhang (Collaborator) commented Jan 31, 2024

This PR adds:

  • Whisper fine-tuning recipe on wenetspeech, multi-hans-zh

Results:

| Model | SpeechIO 001-026 Avg WER | Comment |
| --- | --- | --- |
| zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24 | 7.48% | Zipformer trained on 14k hours of ZH data |
| whisper-large-v2-wenetspeech | 8.01% | large-v2 fine-tuned on 10k hours of WenetSpeech; suffers from deletion errors (see wenet-e2e/WenetSpeech#54) |
| whisper-large-v2-wenetspeech + zipformer | 6.93% | Uses Zipformer to reduce deletion errors (see here) |

TODOs for follow-up PRs:

  • wenetspeech/whisper RESULTS.md update
  • multi-hans-zh/whisper RESULTS.md update

@yuekaizhang (Collaborator, Author)

> Not sure about the deletion errors, but a key aspect of Whisper is that it is multi-objective: it does both transcription and translation. The translation task is not just for fun; it helps the model factor out grammar and semantics, and it improves ASR accuracy.
>
> If you want to get good numbers on a high-resource language, you also need a multi-objective setup, at least a translation task, and ideally a speaker-ID objective too.

Good point. It would be great if you know of any experimental results or papers that use a multi-objective setup to fine-tune Whisper.
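To make the suggestion above concrete, here is a toy sketch of what a joint transcription + translation fine-tuning objective could look like. This is purely illustrative and is not what this PR implements: the `model(audio, task=...)` call, the batch fields, and the `0.3` weight are all hypothetical, not real Whisper or icefall APIs.

```python
import torch.nn.functional as F

def multi_objective_loss(model, batch, lambda_translate=0.3):
    """Hypothetical joint objective: transcription loss plus a weighted
    translation loss, in the spirit of Whisper's multitask training."""
    # Cross-entropy on the transcription task.
    # `model` and its `task` argument are assumed interfaces, not a real API.
    asr_logits = model(batch["audio"], task="transcribe")  # (N, T, vocab)
    asr_loss = F.cross_entropy(
        asr_logits.transpose(1, 2), batch["transcript_tokens"]
    )
    # Cross-entropy on the translation task; assumes the batch also carries
    # paired translation labels, which WenetSpeech does not provide.
    mt_logits = model(batch["audio"], task="translate")
    mt_loss = F.cross_entropy(
        mt_logits.transpose(1, 2), batch["translation_tokens"]
    )
    # Fixed interpolation weight; in practice this would need tuning.
    return asr_loss + lambda_translate * mt_loss
```

The practical catch is the second branch: this only works if the fine-tuning corpus has paired translations, which is exactly why pointers to prior results or papers would help.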

@marcoyang1998 (Collaborator)

Have other people reported similar observations when fine-tuning Whisper on WenetSpeech?

Similarly, we experienced severe deletion errors when training Zipformer on WenetSpeech (see #1130); could this be a problem with the dataset?

@yuekaizhang changed the title from "[WIP] whisper large fine-tuning on wenetspeech, multi-hans-zh" to "Whisper large fine-tuning on wenetspeech, multi-hans-zh" on Mar 7, 2024
@yuekaizhang requested a review from JinZr on Mar 7, 2024 at 07:10
@JinZr (Collaborator) commented Mar 7, 2024

thanks! i'll look into it.

@yuekaizhang (Collaborator, Author)

> Have other people reported similar observations when fine-tuning Whisper on WenetSpeech?
>
> Similarly, we experienced severe deletion errors when training Zipformer on WenetSpeech (see #1130); could this be a problem with the dataset?

@marcoyang1998 You're correct, see wenet-e2e/WenetSpeech#54.

One solution is to retrain with the new labels provided in wenet-e2e/WenetSpeech#54, but for such colloquial speech there may be a better way to evaluate, such as down-weighting errors on modal particles.

When people use ASR to add subtitles to their videos, it would clearly be more helpful if the model automatically omitted these colloquial words.
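A toy sketch of that evaluation idea: a character-level error rate where edits involving Mandarin modal particles count at a reduced weight. The particle list, the `0.2` weight, and the character-level granularity are all illustrative assumptions; nothing like this is implemented in this PR.

```python
# Hypothetical set of modal particles to down-weight; illustrative only.
MODAL_PARTICLES = set("啊吧呢嘛啦呀哦嘞")

def weighted_cer(ref: str, hyp: str, particle_weight: float = 0.2) -> float:
    """Character-level error rate where edits touching a modal
    particle cost `particle_weight` instead of 1."""
    def cost(ch: str) -> float:
        return particle_weight if ch in MODAL_PARTICLES else 1.0

    m, n = len(ref), len(hyp)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(ref[i - 1])      # deletions
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost(hyp[j - 1])      # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Substitutions involving a real character keep full cost.
            sub = 0.0 if ref[i - 1] == hyp[j - 1] else max(
                cost(ref[i - 1]), cost(hyp[j - 1])
            )
            d[i][j] = min(
                d[i - 1][j] + cost(ref[i - 1]),       # delete ref char
                d[i][j - 1] + cost(hyp[j - 1]),       # insert hyp char
                d[i - 1][j - 1] + sub,                # substitute / match
            )
    # Normalize by the weighted reference length.
    return d[m][n] / max(sum(cost(c) for c in ref), 1e-9)
```

For example, `weighted_cer("他说啊这个好", "他说这个好")` charges the deleted 啊 only 0.2 instead of 1, so a model that drops filler particles is penalized far less than one that drops content characters.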

@JinZr (Collaborator) commented Mar 7, 2024

hi, thank you for your work!

i went through the pr and left a comment and a few modifications; if those look proper to you, i think this pr is ready to merge.

@JinZr (Collaborator) left a review comment:

LGTM, waiting for CI tests to be done

thanks!

@JinZr merged commit 5df24c1 into k2-fsa:master on Mar 7, 2024
108 checks passed