Unable to transcribe audio when using a fine-tuned whisper medium model #129
Hi All,
We have been using whisper for a while. Recently we started generating our own fine-tuned models by adding customized audio and transcription data. We can use these new fine-tuned models with the standard whisper inference scripts without problems.
However ...
Recently we also wanted word-by-word timestamps in the results, so we decided to use whisper-timestamped.
With whisper-timestamped, we can transcribe audio using the stock pretrained whisper models (e.g. the medium model) without problems.
However, whenever we try to use our own fine-tuned models, it throws exceptions.
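For reference, here is roughly the call pattern that works for us with a stock pretrained checkpoint (the audio path is a placeholder, and the word-level fields are as I understand whisper-timestamped's output); with the fine-tuned checkpoint the same calls fail:

import whisper_timestamped as whisper

# Works fine with a stock pretrained checkpoint (audio path is a placeholder)
model = whisper.load_model("medium", device="cpu")
audio = whisper.load_audio("example.wav")
result = whisper.transcribe(model, audio)

# The word-level timestamps are what we are after
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(word["text"], word["start"], word["end"])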
If I use the load_model() method (model = whisper.load_model("service/models/medium-v4/pytorch_model.bin", device="cpu")), it raises the following exception:
File "/home/tekrom/components/whisper/service/init.py", line 29, in create_app
model = load_model("service/models/medium-v4/pytorch_model.bin", device="cpu")
File "/home/tekrom/components/whisper/service/transcribe.py", line 2191, in load_model
whisper_model.load_state_dict(hf_state_dict)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
Unexpected key(s) in state_dict: "proj_out.weight".
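As far as I understand (this is only my assumption), proj_out in Hugging Face's WhisperForConditionalGeneration is an output projection tied to the decoder token embedding, so it has no separate counterpart in the original openai-whisper state dict. The error itself is just PyTorch's strict state_dict check; a small self-contained toy example (not Whisper itself) shows the same kind of failure:

import torch
import torch.nn as nn

# Toy module standing in for the Whisper model (illustration only)
model = nn.Linear(4, 4)

state = model.state_dict()
state["proj_out.weight"] = torch.zeros(4, 4)  # extra key, like in the fine-tuned checkpoint

try:
    model.load_state_dict(state)  # strict=True by default -> RuntimeError "Unexpected key(s)"
except RuntimeError as err:
    print(err)

# With strict=False the extra key is only reported, not fatal
missing, unexpected = model.load_state_dict(state, strict=False)
print(unexpected)  # ['proj_out.weight']

I am not sure whether it would be safe to simply drop or ignore that key when loading, which is part of what I am asking here.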
If I instead use the WhisperForConditionalGeneration class's from_pretrained() method (model = WhisperForConditionalGeneration.from_pretrained("service/models/medium-v4")), it raises the following exception:
File "/home/tekrom/components/whisper/service/init.py", line 138, in asr3
left_converted_result = processChannel(get_audio_tensor(audio_left))
File "/home/tekrom/components/whisper/service/init.py", line 176, in processChannel
wResult = transcribe(model,
File "/home/tekrom/components/whisper/service/transcribe.py", line 226, in transcribe_timestamped
input_stride = N_FRAMES // model.dims.n_audio_ctx
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'WhisperForConditionalGeneration' object has no attribute 'dims'
So, it looks like the model format that whisper-timestamped expects and the one that our fine-tuning produces are different; as a result, it cannot find some attributes and fails.
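To illustrate what I mean about the two formats (a sketch based on my understanding, using the public checkpoints rather than our fine-tuned one): the openai-whisper model exposes its geometry through model.dims, while the transformers model exposes it through model.config, which seems to be exactly the attribute whisper-timestamped cannot find.

import whisper  # openai-whisper
from transformers import WhisperForConditionalGeneration

# openai-whisper format: geometry lives in model.dims
ow_model = whisper.load_model("tiny", device="cpu")
print(ow_model.dims.n_audio_ctx)  # e.g. 1500

# Hugging Face format: geometry lives in model.config
hf_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
print(hf_model.config.max_source_positions)  # also 1500, but under a different name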
I would appreciate it if someone could guide me on how to resolve this issue.
Thanks in advance.
Y. Ay
Replies: 2 comments
- As an additional comment to this topic:
- See discussion in issue #130