Significant performance penalty w.r.t. whisper.cpp #74
Comments
I was able to reproduce this, and am working on it with @tazz4843 |
At this point the only thing I have that could be the difference is that by default whisper-rs is using v1.4.2 of whisper.cpp, while you're likely using git master upstream for these tests. Seems that upgrading whisper-rs's version to git master speeds it up somewhat, but I don't have Apple Silicon myself to test on so I can't do much myself in terms of poking around. |
My test isn't the same but runs 2 seconds slower after updating whisper-rs to use whisper.cpp master |
I've also pulled main for whisper.cpp. I had to edit 2 function signatures in
/// Get the ID of the translate task token.
///
/// # C++ equivalent
/// `whisper_token whisper_token_translate ()`
pub fn token_translate(ctx: *mut whisper_context) -> WhisperToken {
    unsafe { whisper_rs_sys::whisper_token_translate(ctx) }
}

/// Get the ID of the transcribe task token.
///
/// # C++ equivalent
/// `whisper_token whisper_token_transcribe()`
pub fn token_transcribe(ctx: *mut whisper_context) -> WhisperToken {
    unsafe { whisper_rs_sys::whisper_token_transcribe(ctx) }
}
Only slight improvements:
|
@wdoppenberg do you have any guesses as to why there's such a big performance discrepancy? |
Given I don't have Apple Silicon myself to test on, I can't do much to help with this besides suggest x86 instead. Hopefully someone else can figure it out. |
I have an M1 and an M2 and can try it out if someone provides a repo with a test case that can be run. Right now, for the canonical "your country" JFK test case, whisper-rs and whisper.cpp are the same. I put a timer around whisper.cpp. C code changes:
whisper-rs:
I'm using the set_mel API, but there's no reason to think PCM audio will be slower in Rust than in C. If someone can provide a test case that can be run easily, I can try it out. |
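For anyone wanting to reproduce the "timer around the call" comparison on the Rust side, here is a minimal sketch using only the standard library. `time_it` and `fake_transcribe` are hypothetical names introduced for illustration; `fake_transcribe` is a stand-in workload, not a whisper-rs call, and would be swapped for the real transcription step.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Hypothetical stand-in for the transcription call (NOT part of whisper-rs).
/// It just counts samples whose absolute value exceeds 0.5.
fn fake_transcribe(samples: &[f32]) -> usize {
    samples.iter().filter(|s| s.abs() > 0.5).count()
}

fn main() {
    // One second of synthetic 16 kHz audio so the example is self-contained.
    let samples: Vec<f32> = (0..16_000).map(|i| (i as f32 * 0.01).sin()).collect();

    let (loud, elapsed) = time_it(|| fake_transcribe(&samples));
    println!("loud samples: {loud}, took {elapsed:?}");
}
```

Timing only the transcription call, rather than the whole process, keeps model loading out of the measurement on both sides.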
Still haven't been able to find the issue. When examining the call tree I find that, as one might expect, almost all calls are part of the
It almost feels like the whisper.cpp lib is not compiled with optimization flags enabled, which is not the case of course. |
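One cheap sanity check for the "not compiled with optimization flags" suspicion is to have the benchmark binary report its own build profile. The sketch below relies on `cfg!(debug_assertions)`, which is on by default in Cargo's dev profile and off in the release profile. It only reflects how the Rust crate was built; whether the bundled C library follows the same profile is an assumption that would need to be checked separately. `is_optimized_build` and `build_label` are hypothetical helper names.

```rust
/// True when debug assertions are disabled, which is the default for
/// `cargo build --release`. This says nothing about the C library's flags.
fn is_optimized_build() -> bool {
    !cfg!(debug_assertions)
}

/// Human-readable label for the detected profile.
fn build_label(optimized: bool) -> &'static str {
    if optimized {
        "release-like"
    } else {
        "debug-like"
    }
}

fn main() {
    let optimized = is_optimized_build();
    println!("build profile: {}", build_label(optimized));
    if !optimized {
        // Benchmark numbers from an unoptimized build are not comparable.
        eprintln!("warning: debug assertions are enabled; rebuild with --release before timing");
    }
}
```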
think it's your machine? |
I have found the issue: In my script, I used Using |
First of all, thank you for your work creating a safe wrapper around whisper.cpp.
As mentioned in #73, the performance of whisper-rs is quite poor compared to the reference implementation. I'll attempt to demonstrate below.
Setup
I'm using an M2 Max MacBook Pro with 64GB of (shared) memory. My goal is to run a web server with CoreML enabled, but if necessary I can run the tests with CPU only later. I'll attach output generated by flamegraph.
Rust script
Data
For testing, I've converted a short JFK speech to WAV. See this link. Converting to WAV is done using ffmpeg:
Results
I won't do averages of iterations since the differences are quite clear. Furthermore, I've run both scripts beforehand to ensure that the CoreML model is properly compiled for my architecture. I can confirm that the model loading step is not the issue. The chosen model is ggml-medium.bin. The commands used, given that you have compiled whisper.cpp & whisper-rs and are in the root of each repository, are as follows:
sudo time flamegraph -- ./main -m models/ggml-medium.bin -t 8 -f jfk.wav
sudo time flamegraph -- target/release/whisper-rs-cli
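The whisper.cpp command above passes `-t 8`, so a like-for-like comparison needs the Rust binary to request the same number of threads. The sketch below only shows how such a count might be chosen with the standard library; `pick_threads` is a hypothetical helper, and handing the result to whisper-rs (I believe through a thread-count setter on its parameters struct) is an assumption to verify against the crate's documentation.

```rust
use std::thread;

/// Clamps the requested thread count to the range 1..=available.
/// `available` is treated as at least 1 so the range is never empty.
fn pick_threads(requested: usize, available: usize) -> usize {
    requested.clamp(1, available.max(1))
}

fn main() {
    // Number of threads the OS reports as usable; fall back to 1 on error.
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    // 8 mirrors the `-t 8` flag used for the whisper.cpp run.
    let n_threads = pick_threads(8, available);
    println!("available: {available}, using: {n_threads}");
}
```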
CoreML enabled
whisper-rs
whisper.cpp
Please let me know what you think and where I can help out. Admittedly I'm a bit inexperienced with Rust but I'd love to learn, especially solving such an issue.