I saw the demo with command recognition. How do you trigger events after a command? Is there an intent handler model built in?
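A minimal sketch of one way to wire this up with the C API, assuming an already loaded `whisper_context`: transcribe a chunk of audio, then match the output text against a fixed command table. There is no intent model involved; the command list and the `on_command` callback below are hypothetical placeholders.

```c
#include <stdio.h>
#include <string.h>
#include "whisper.h"

// Hypothetical event handler: replace with your own logic.
static void on_command(const char * cmd) {
    printf("triggered: %s\n", cmd);
}

// Transcribe a chunk of PCM audio and fire an event when a known
// command appears in the output.
static void handle_audio(struct whisper_context * ctx,
                         const float * pcm, int n_samples) {
    static const char * commands[] = { "lights on", "lights off", "stop" };

    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    if (whisper_full(ctx, params, pcm, n_samples) != 0) {
        return;
    }

    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        const char * text = whisper_full_get_segment_text(ctx, i);
        for (size_t j = 0; j < sizeof(commands)/sizeof(commands[0]); ++j) {
            if (strstr(text, commands[j]) != NULL) {
                on_command(commands[j]);
            }
        }
    }
}
```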
Hi! I built the M1 version following the instructions, but my stream is being translated from the original language into English and I cannot switch it off. Any ideas how to do that?
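If the stream is being run with translation enabled (e.g. a `--translate` flag), dropping that flag should keep the original language. Programmatically, the switch is the `translate` field of `whisper_full_params`; a minimal sketch:

```c
#include "whisper.h"

// Build parameters that keep the output in the source language.
static struct whisper_full_params make_params(void) {
    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    params.translate = false;   // do NOT translate to English
    params.language  = "auto";  // or an explicit code such as "de"

    return params;
}
```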
Can you provide an example of word-level timestamps in SwiftUI? I'm totally lost in C++.
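The code samples here stay in C, since that is the project's language, but the same calls can be bridged into a SwiftUI app. Per-token timestamps come from the `token_timestamps` flag plus `whisper_full_get_token_data`; a minimal sketch, assuming a loaded `whisper_context`:

```c
#include <stdio.h>
#include "whisper.h"

// Print each token with its start/end time.
// With token_timestamps enabled, t0/t1 are in units of 10 ms.
static void print_word_timestamps(struct whisper_context * ctx,
                                  const float * pcm, int n_samples) {
    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    params.token_timestamps = true; // per-token timestamps
    params.max_len          = 1;    // short (roughly word-level) segments

    if (whisper_full(ctx, params, pcm, n_samples) != 0) {
        return;
    }

    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
            const whisper_token_data d = whisper_full_get_token_data(ctx, i, j);
            printf("[%6lld -> %6lld] %s\n",
                   (long long) d.t0, (long long) d.t1,
                   whisper_full_get_token_text(ctx, i, j));
        }
    }
}
```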
Is there any current or planned method of setting up a file to use as an additional dictionary? |
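As far as I know there is no custom-dictionary file; the closest existing mechanism is biasing the decoder with an initial prompt that contains the domain terms. A sketch, assuming a build of `whisper.h` that exposes `initial_prompt` (the terms below are hypothetical):

```c
#include "whisper.h"

// Bias decoding toward domain-specific vocabulary via the initial prompt.
// This is not a hard dictionary: it only nudges the decoder.
static struct whisper_full_params make_biased_params(void) {
    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // Hypothetical domain terms; they could just as well be read from a file.
    params.initial_prompt = "ggml, cuBLAS, quantization, diarization";

    return params;
}
```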
The updated roadmap is available as a GitHub Project:
https://github.com/users/ggerganov/projects/7
Roadmap (old)

In decreasing priority:

- Decoding strategies
  - Try to achieve at least parity with the OpenAI implementation
  - Target release: v1.1.0
- Memory usage reduction
  - This will allow wider application on low-memory devices. It should be possible to cut memory usage in half with a few simple changes in `ggml`
  - Target release: v1.2.0
- Core ML support
  - This will allow utilizing the Apple Neural Engine for very efficient inference of the model
  - Target release: v1.3.0
- Q4 / Q5 / Q8 integer quantization
  - Added via Integer quantisation support #540
  - Target release: v1.4.0
- GPU support
  - Partial CUDA support via cuBLAS
  - Target release: v1.4.0
- Documentation of `ggml`
  - Hopefully this leads to more contributions
- Diarization
  - A highly requested feature, but very difficult to achieve. Interesting to explore and experiment with
- Low-power mode
  - This also has the potential for some performance improvements
F.A.Q.
Is `whisper.cpp` faster or slower than PyTorch on CPU?

The performance should be comparable. At the time of writing, the performance on Apple Silicon when using `whisper.cpp` is better, since we utilise FP16 + the Accelerate framework, while PyTorch does not yet. But this will soon change. In general, it is not very easy to make a proper benchmark between the two implementations. For more information, read the following comment: Benchmark results #89 (comment)
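For measuring the `whisper.cpp` side of such a comparison, the built-in timing report is the easiest reference point. A minimal sketch, assuming a model file and a PCM buffer are already available:

```c
#include "whisper.h"

// Run one transcription and print whisper.cpp's own timing breakdown
// (load / mel / encode / decode), useful as a reference when comparing.
static void bench_once(const char * model_path,
                       const float * pcm, int n_samples) {
    struct whisper_context * ctx = whisper_init_from_file(model_path);
    if (ctx == NULL) {
        return;
    }

    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    whisper_full(ctx, params, pcm, n_samples);
    whisper_print_timings(ctx);

    whisper_free(ctx);
}
```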
Should I use `whisper.cpp` in my project?

`whisper.cpp` is a hobby project. It does not strive to provide a production-ready implementation. The main goals of the implementation are to be educational, minimalistic, portable, hackable and performant. There are no guarantees that the implementation is correct and bug-free, and things can break at any point in the future. Support and updates will depend mostly on contributions, since with time I will move on and won't dedicate too much time to the project.

If you plan to use `whisper.cpp` in your own project, keep the above in mind. My advice is to not put all your eggs into the `whisper.cpp` basket.

How can I contribute?
Will `ggml` / `whisper.cpp` support CUDA / GPU?

One of the main goals of this implementation is to be very minimalistic and able to run on a large spectrum of hardware. The existing CPU-only implementation achieves this goal: it is bloat-free and very simple. I think it also has some educational value. Of course, not taking advantage of modern GPU hardware is a huge drawback in terms of performance. However, adding a dependency on a certain GPU framework would tie the project to the corresponding hardware and introduce some extra complexity.

With that said, adding GPU support to the project is low priority.

In any case, it would not be too difficult to add initial support. The main thing that needs to be offloaded to the GPU is the `GGML_OP_MUL_MAT` operator (see `whisper.cpp/ggml.c`, lines 6231 to 6234 at commit c71363f). This is where more than 90% of the computation time is currently spent. Also, I don't think it's necessary to offload the entire model to the GPU. For example, the two convolution layers at the start of the Encoder can easily remain on the CPU, as they are not very computationally heavy. Not uploading the full model to VRAM will make it require less memory and thus make it compatible with more video cards.
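For a sense of what that operator looks like from the outside, here is a small standalone use of `ggml_mul_mat`, based on the early-2023 `ggml` API (later versions changed some of these signatures):

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // Scratch arena for tensors and the compute graph.
    struct ggml_init_params ip = {
        .mem_size   = 16 * 1024 * 1024,
        .mem_buffer = NULL,
    };
    struct ggml_context * ctx = ggml_init(ip);

    // Both operands share ne[0] = 4 (the row length).
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);

    ggml_set_f32(a, 1.0f); // fill with dummy data
    ggml_set_f32(b, 2.0f);

    // c[i][j] = dot(row i of a, row j of b) -> shape [2, 3]
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    struct ggml_cgraph gf = ggml_build_forward(c);
    ggml_graph_compute(ctx, &gf);

    printf("c[0][0] = %f\n", ((float *) c->data)[0]); // 4 * 1 * 2 = 8

    ggml_free(ctx);
    return 0;
}
```

Since every matrix multiplication in the model funnels through this one node type, a GPU backend only has to intercept `GGML_OP_MUL_MAT` during graph evaluation to capture the bulk of the compute.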
Another candidate GPU framework that will likely be supported in the future is Apple's Metal Performance Shaders (MPS). Currently, `ggml` supports Apple's Accelerate framework, and I really like how seamlessly it integrates into the project, both on macOS and iOS. It does not feel like a third-party dependency at all, and that is why I think MPS support can be added in a similar way. In theory, it will help utilize the GPU on Apple devices and could potentially lead to some additional performance improvement. Also, the unified memory model of modern Apple Silicon devices allows the model weights and the data embeddings to be shared seamlessly between the CPU and the GPU, which is not the case for CUDA. So far, my initial experiments haven't shown any benefit of using MPS for the transformer inference, but maybe some more work is needed.

Edit: Sample Metal support has been demonstrated here: Metal support #127
Edit 2: There is promising CUDA support in the works through NVBLAS: Experiments with GPU CUDA acceleration...sort of #220
Edit 3: CUDA support via cuBLAS has been added: Add CUDA support via cuBLAS #834
Notable `whisper.cpp` discussions