Tool is super slow / runs forever #30

Open
03l54rd1n3 opened this issue Jan 9, 2024 · 10 comments
Comments

@03l54rd1n3

I'm trying to transcribe a 45 s mp3 containing the audio of a YouTube Short.
I'm doing it like this:

from pywhispercpp.model import Model
model = Model('base.en', print_realtime=False, print_progress=True, n_threads=6)
segments = model.transcribe(short_audio_file, speed_up=True, new_segment_callback=print)

It never finishes, and the log below is all the output I get. After that it just keeps running with no further output, with the CPU at 100%:

[2024-01-09 23:28:50,941] {utils.py:38} INFO - No download directory was provided, models will be downloaded to /home/marius/.local/share/pywhispercpp/models
[2024-01-09 23:28:50,943] {utils.py:46} INFO - Model base.en already exists in /home/marius/.local/share/pywhispercpp/models
[2024-01-09 23:28:50,944] {model.py:221} INFO - Initializing the model ...
whisper_init_from_file_no_state: loading model from '/home/marius/.local/share/pywhispercpp/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
[2024-01-09 23:28:52,186] {model.py:130} INFO - Transcribing ...

Any ideas what could be wrong or how to improve the speed? Thanks for any help. I appreciate it. This is the most promising of the python bindings for whisper.cpp as the others don't even build anymore...

@absadiki
Owner

The model seems to have loaded successfully, so it's strange that it runs forever.
Does the short_audio_file variable hold the path to your mp3 file as a str? Have you tried other files, and do you always run into the same issue?

@03l54rd1n3
Author

Hi, thanks for your reply. Yes, it's a str that holds the path. I haven't tried another file, but honestly it's a pretty basic mp3 of just spoken text with no additional sounds.

You can try it yourself; I attached the file (zipped, so that I could upload it to GitHub). It's not from my own video, just a random Short from YouTube, so enjoy some Dragonball content.
input_short.zip

@absadiki
Owner

Hi @03l54rd1n3,
Thanks for providing the file. It took less than 4 s on my machine to generate the results:

{model.py:133} INFO - Inference time: 3.481 s
[t0=0, t1=242, text=Why does Vegeta always hold his left arm?, t0=242, t1=528, text=Vegeta has multiple poses that are very distinctive of him,, t0=528, t1=778, text=for instance the infamous self-pointing thumb., t0=778, t1=1194, text=A very different one however, is that in which he holds his left arm in pain., t0=1194, t1=1604, text=Vegeta has gone through a lot of different battles and has sustained a crazy amount of injuries., t0=1604, t1=2158, text=But for some reason most of the time he always ends up holding his left arm as if he had some sort of chronic pain., t0=2158, t1=2504, text=A lot of people thought back then that this was because of Andrew at 18,, t0=2504, t1=2690, text=who really did a number on his left arm., t0=2690, t1=2784, text=Nevertheless,, t0=2784, t1=3168, text=it is possible to see Vegeta holding his left arm already in the namics saga., t0=3168, t1=3578, text=This implies that if Vegeta really does have some sort of chronic injury in his left arm,, t0=3578, t1=3800, text=then it must be previous to the android saga., t0=3800, t1=4268, text=Also, this is an injury that no sends a beans or dragon ball resurrection has been able to heal,, t0=4268, t1=4564, text=so whatever it is, it must be deeply rooted within his body.]

So there is something wrong with your installation.

Do you have ffmpeg installed ?
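
For reading the output above: the t0 and t1 values look like times in units of 10 ms, which is how whisper.cpp reports timestamps and which matches the last t1=4564 for a clip of roughly 45 s. A minimal sketch under that assumption (the helper name centiseconds_to_timestamp is made up for illustration and is not part of pywhispercpp):

```python
def centiseconds_to_timestamp(t):
    """Format a time given in 10 ms units as HH:MM:SS.mmm."""
    total_ms = t * 10  # assumption: one unit equals 10 ms
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


print(centiseconds_to_timestamp(242))   # 00:00:02.420
print(centiseconds_to_timestamp(4564))  # 00:00:45.640
```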

@03l54rd1n3
Author

Hi, thank you for your reply. Yes it's installed through apt. And I installed your tool through pip. Wonder what it is then... I'll check the whisper.cpp requirements as well...
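
The two checks asked about so far (the path is a str pointing at an existing file, and ffmpeg is reachable) can be sketched with the standard library alone. The function name preflight is hypothetical, and the idea that ffmpeg is what decodes mp3 input is an assumption drawn from the question above, not something confirmed in this thread:

```python
import os
import shutil


def preflight(audio_path):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not isinstance(audio_path, str):
        problems.append(f"expected a str path, got {type(audio_path).__name__}")
    elif not os.path.isfile(audio_path):
        problems.append(f"file not found: {audio_path}")
    # Assumption: ffmpeg is needed to decode non-WAV input such as mp3.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling preflight(short_audio_file) before model.transcribe(...) would rule out these two causes, although it says nothing about a hang inside the library itself.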

@absadiki
Owner

Yes, try to compile and run whisper.cpp first and let me know if that works.

@03l54rd1n3
Author

OK, your tool works fine from the CLI (pwcpp). Original whisper.cpp also works. Seems like the unexpected behavior is just in python (script file or notebook). Any idea why it only happens there?

@03l54rd1n3
Author

Correction: it happens in Python when using the n_threads argument. Without it, everything works. The tool seems to deadlock. I'm on Linux, if that is relevant for you.

@absadiki
Owner

I only use Linux as well, and this has never happened to me.
How many threads does your CPU support?

@03l54rd1n3
Author

Good question. I have 4 cores. It's some 7th gen Intel i7, not the best, but with 16GB of RAM the laptop still manages most tasks pretty well.

I just tried a couple more times. In the Python script it now actually works with n_threads set to 2 or 4. In the notebook, with it set to 1 or 2, it sometimes reaches the "Transcribing ..." line but produces no results, and sometimes it locks up before that, at the "kv cross size" line.
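
Since the script works while the notebook hangs, one possible workaround is to launch the working script as a separate process from the notebook and give up after a timeout instead of waiting forever. This is only a sketch: run_with_timeout is a made-up helper, and "transcribe.py" is a placeholder script name.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run a command; return its stdout, or None if it exceeds timeout_s seconds."""
    try:
        done = subprocess.run(args, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        return None
    return done.stdout


# Hypothetical usage from a notebook cell ("transcribe.py" is a placeholder):
# out = run_with_timeout([sys.executable, "transcribe.py"], timeout_s=60)
```

A return value of None then signals a hang, which makes it easier to compare different n_threads settings without restarting the kernel.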

@absadiki
Owner

Yeah, that's a good machine, but you cannot go above your resources, so n_threads should not exceed 4 (which is the default, by the way).
As long as it runs in a script, everything is fine, so you will have to check your Jupyter notebook environment. I have also just re-checked in Colab notebooks, and it works there without any problem.
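
The advice not to exceed the machine's resources can be sketched as a small guard around the requested thread count. The name clamp_threads is hypothetical; note that os.cpu_count() reports logical CPUs, which may be more than the number of physical cores.

```python
import os


def clamp_threads(requested, available=None):
    """Limit a requested thread count to between 1 and the available CPU count."""
    if available is None:
        # os.cpu_count() counts logical CPUs and may return None
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


print(clamp_threads(6, available=4))  # 4

# Possible use with the call from the first post:
# model = Model('base.en', n_threads=clamp_threads(6))
```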
