Description
Hi, fun fact: I got an M3 Pro MacBook to run whisper.cpp super speedily on.
I keep getting this error whenever I use the GPU:
ggml_metal_graph_compute: command buffer 3 failed with status 3
GGML_ASSERT: ggml-metal.m:1611: false
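For what it's worth, going by Apple's MTLCommandBufferStatus enum (my reading of the Metal docs, so treat it as an assumption, not something ggml prints), status 3 would be Scheduled rather than Error:

// MTLCommandBufferStatus values as documented for Apple's Metal framework.
// If this mapping is right, "status 3" means the buffer had not finished
// executing when ggml_metal_graph_compute checked it, rather than a plain
// GPU error (status 5).
enum MTLCommandBufferStatus_ref {
    NotEnqueued = 0,
    Enqueued    = 1,
    Committed   = 2,
    Scheduled   = 3,
    Completed   = 4, // the status ggml-metal.m asserts on
    Error       = 5,
};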
I may be doing something wrong; as far as I am aware, this means the GPU could not handle the load? Or could it be the code I am running? Here is the output of the model loading:
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/ronangrant/Workshop/whisper_cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 28991.03 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 156.68 MB, ( 162.66 / 28991.03)
whisper_model_load: Metal buffer size = 156.67 MB
whisper_model_load: model size = 156.58 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/ronangrant/Workshop/whisper_cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 28991.03 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 16.52 MB, ( 179.18 / 28991.03)
whisper_init_state: kv self size = 16.52 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 18.43 MB, ( 197.61 / 28991.03)
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.62 / 28991.03)
whisper_init_state: compute buffer (conv) = 5.67 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.64 / 28991.03)
whisper_init_state: compute buffer (cross) = 4.71 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.66 / 28991.03)
whisper_init_state: compute buffer (decode) = 96.41 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 4.05 MB, ( 201.70 / 28991.03)
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 3.08 MB, ( 204.78 / 28991.03)
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 94.78 MB, ( 299.57 / 28991.03)
Model initialized successfully.
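One sanity check I can run (a minimal sketch below, using only the same API calls my code already makes) is to flip use_gpu off and re-run: if the assert disappears on the CPU backend, that would confirm the problem is specific to the Metal path rather than my surrounding code.

// Sketch: same init as my code below, but with the GPU disabled, to see
// whether the GGML_ASSERT only fires on the Metal backend.
whisper_context_params cparams = whisper_context_default_params();
cparams.use_gpu = false; // force the CPU backend

whisper_context* cpu_ctx = whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
if (cpu_ctx != nullptr) {
    // ... run the same transcription calls here ...
    whisper_free(cpu_ctx);
}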
And just for reference, so you can see how I am using whisper.cpp, here is a small piece of code showing how I use it, to help with troubleshooting:
#include "whisper.h"
#include "ggml.h"
#include <iostream>
#include <vector>
#include <csignal>
#include <climits>
#include <cstring>
// Global context for Whisper model
whisper_context* ctx;
// Initialization of Whisper Model
whisper_context* initialize_model() {
whisper_context_params cparams = whisper_context_default_params();
cparams.use_gpu = true; // Use GPU if available
ctx = whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
if (ctx == nullptr) {
std::cerr << "Failed to initialize Whisper context" << std::endl;
return nullptr;
}
std::cout << "Model initialized successfully." << std::endl;
return ctx;
}
// Unloading the Whisper Model
void unload_model(whisper_context* ctx) {
if (ctx != nullptr) {
whisper_free(ctx);
std::cout << "Model unloaded successfully." << std::endl;
}
}
// Function to transcribe audio chunks using Whisper
void transcribe_chunk(whisper_context* ctx, const std::vector<int16_t>& int_samples, bool use_context, TranscriptionCallback callback) {
std::vector<float> float_samples(int_samples.size());
for(size_t i = 0; i < int_samples.size(); ++i) {
float_samples[i] = int_samples[i] / static_cast<float>(INT16_MAX);
}
whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
int result = whisper_full(ctx, wparams, float_samples.data(), int_samples.size());
if (result != 0) {
std::cerr << "Failed to process audio data. Error code: " << result << std::endl;
return;
}
int n_segments = whisper_full_n_segments(ctx);
for (int i = 0; i < n_segments; ++i) {
const char* text = whisper_full_get_segment_text(ctx, i);
callback(text);
}
}
// Main function demonstrating the use of Whisper
int main() {
ctx = initialize_model();
if (!ctx) {
std::cerr << "Failed to initialize Whisper model" << std::endl;
return -1;
}
// .....
unload_model(ctx);
return 0;
}
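For completeness, the elided part of main drives transcribe_chunk roughly like this (a hypothetical sketch: the one-second silent buffer and the printing lambda are stand-ins for my real audio capture, not the actual code; whisper expects 16 kHz mono PCM):

// Hypothetical driver: feed one second of 16 kHz silence and print each segment.
std::vector<int16_t> samples(16000, 0); // 1 s at 16 kHz, stand-in for captured audio
transcribe_chunk(ctx, samples, /*use_context=*/false,
                 [](const char* text) { std::cout << text << std::endl; });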