Unable to transcribe audio when using a fine-tuned whisper medium model #129
Hi All,
We have been using whisper for a while. Recently we started generating our own fine-tuned models by adding customized audio and transcription data. We can use these new fine-tuned models with the standard whisper inference scripts without problems.
However ...
Recently we also wanted word-by-word timestamps in the results, so we decided to use whisper-timestamped.
With whisper-timestamped, we can transcribe audio using the stock pretrained whisper models (e.g. the medium model) without problems.
However, whenever we try to use our own fine-tuned models, it throws exceptions.
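For reference, here is roughly the call pattern that works for us with a stock pretrained checkpoint (the audio path is a placeholder, and the word-level fields are as I understand whisper-timestamped's output); with the fine-tuned checkpoint the same calls fail:

import whisper_timestamped as whisper

# Works fine with a stock pretrained checkpoint (audio path is a placeholder)
model = whisper.load_model("medium", device="cpu")
audio = whisper.load_audio("example.wav")
result = whisper.transcribe(model, audio)

# The word-level timestamps are what we are after
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(word["text"], word["start"], word["end"])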
If I use the load_model() method (model = whisper.load_model("service/models/medium-v4/pytorch_model.bin", device="cpu")), it raises the following exception:
File "/home/tekrom/components/whisper/service/init.py", line 29, in create_app
model = load_model("service/models/medium-v4/pytorch_model.bin", device="cpu")
File "/home/tekrom/components/whisper/service/transcribe.py", line 2191, in load_model
whisper_model.load_state_dict(hf_state_dict)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
Unexpected key(s) in state_dict: "proj_out.weight".
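As far as I understand (this is only my assumption), proj_out in Hugging Face's WhisperForConditionalGeneration is an output projection tied to the decoder token embedding, so it has no separate counterpart in the original openai-whisper state dict. The error itself is just PyTorch's strict state_dict check; a small self-contained toy example (not Whisper itself) shows the same kind of failure:

import torch
import torch.nn as nn

# Toy module standing in for the Whisper model (illustration only)
model = nn.Linear(4, 4)

state = model.state_dict()
state["proj_out.weight"] = torch.zeros(4, 4)  # extra key, like in the fine-tuned checkpoint

try:
    model.load_state_dict(state)  # strict=True by default -> RuntimeError "Unexpected key(s)"
except RuntimeError as err:
    print(err)

# With strict=False the extra key is only reported, not fatal
missing, unexpected = model.load_state_dict(state, strict=False)
print(unexpected)  # ['proj_out.weight']

I am not sure whether it would be safe to simply drop or ignore that key when loading, which is part of what I am asking here.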
If I instead use the WhisperForConditionalGeneration class's from_pretrained() method (model = WhisperForConditionalGeneration.from_pretrained("service/models/medium-v4")), it raises the following exception:
File "/home/tekrom/components/whisper/service/init.py", line 138, in asr3
left_converted_result = processChannel(get_audio_tensor(audio_left))
File "/home/tekrom/components/whisper/service/init.py", line 176, in processChannel
wResult = transcribe(model,
File "/home/tekrom/components/whisper/service/transcribe.py", line 226, in transcribe_timestamped
input_stride = N_FRAMES // model.dims.n_audio_ctx
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'WhisperForConditionalGeneration' object has no attribute 'dims'
So, it looks like the model format that whisper-timestamped expects and the one that our fine-tuning produces are different; as a result, it cannot find some attributes and fails.
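To illustrate what I mean about the two formats (a sketch based on my understanding, using the public checkpoints rather than our fine-tuned one): the openai-whisper model exposes its geometry through model.dims, while the transformers model exposes it through model.config, which seems to be exactly the attribute whisper-timestamped cannot find.

import whisper  # openai-whisper
from transformers import WhisperForConditionalGeneration

# openai-whisper format: geometry lives in model.dims
ow_model = whisper.load_model("tiny", device="cpu")
print(ow_model.dims.n_audio_ctx)  # e.g. 1500

# Hugging Face format: geometry lives in model.config
hf_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
print(hf_model.config.max_source_positions)  # also 1500, but under a different name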
I would appreciate it if someone could guide me on how to resolve this issue.
Thanks in advance.
Y. Ay
Replies: 2 comments
- As an additional comment to this topic:
- See discussion in issue #130