M3 GPU overloaded? #1630

@ronangrant

Hi, fun fact: I got an M3 Pro MacBook specifically to run whisper.cpp super fast.

I keep getting this error whenever I use the GPU:

ggml_metal_graph_compute: command buffer 3 failed with status 3
GGML_ASSERT: ggml-metal.m:1611: false

I may be doing something wrong, but as far as I am aware this means the GPU could not handle the load? Could it be the code I am running? Here is the output from loading the model:
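
As far as I can tell, the status value in that log line is Metal's raw MTLCommandBufferStatus enum (an assumption on my part, based on how ggml-metal.m reports command buffer failures). For reference, the values Metal defines are:

MTLCommandBufferStatusNotEnqueued = 0
MTLCommandBufferStatusEnqueued    = 1
MTLCommandBufferStatusCommitted   = 2
MTLCommandBufferStatusScheduled   = 3
MTLCommandBufferStatusCompleted   = 4
MTLCommandBufferStatusError       = 5

So if that reading is right, status 3 would mean the buffer was still only "scheduled" rather than "completed", and the explicit error state would be 5.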

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/ronangrant/Workshop/whisper_cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 28991.03 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 156.68 MB, ( 162.66 / 28991.03)
whisper_model_load: Metal buffer size = 156.67 MB
whisper_model_load: model size = 156.58 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/ronangrant/Workshop/whisper_cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 28991.03 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 16.52 MB, ( 179.18 / 28991.03)
whisper_init_state: kv self size = 16.52 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 18.43 MB, ( 197.61 / 28991.03)
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.62 / 28991.03)
whisper_init_state: compute buffer (conv) = 5.67 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.64 / 28991.03)
whisper_init_state: compute buffer (cross) = 4.71 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.66 / 28991.03)
whisper_init_state: compute buffer (decode) = 96.41 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 4.05 MB, ( 201.70 / 28991.03)
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 3.08 MB, ( 204.78 / 28991.03)
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 94.78 MB, ( 299.57 / 28991.03)
Model initialized successfully.

And just for reference, here is a small piece of code showing how I use whisper.cpp, to help with troubleshooting:

#include "whisper.h"
#include "ggml.h"
#include <iostream>
#include <vector>
#include <csignal>
#include <climits>
#include <cstring>

// Global context for Whisper model
whisper_context* ctx;

// Initialization of Whisper Model
whisper_context* initialize_model() {
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true; // Use GPU if available
    ctx = whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (ctx == nullptr) {
        std::cerr << "Failed to initialize Whisper context" << std::endl;
        return nullptr;
    }
    std::cout << "Model initialized successfully." << std::endl;
    return ctx;
}

// Unloading the Whisper Model
void unload_model(whisper_context* ctx) {
    if (ctx != nullptr) {
        whisper_free(ctx);
        std::cout << "Model unloaded successfully." << std::endl;
    }
}

// Function to transcribe an audio chunk using Whisper
void transcribe_chunk(whisper_context* ctx, const std::vector<int16_t>& int_samples, bool use_context, TranscriptionCallback callback) {
    // Convert 16-bit PCM to the normalized float samples whisper_full expects
    std::vector<float> float_samples(int_samples.size());
    for (size_t i = 0; i < int_samples.size(); ++i) {
        float_samples[i] = int_samples[i] / static_cast<float>(INT16_MAX);
    }

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.no_context = !use_context; // wire up the previously unused use_context flag
    int result = whisper_full(ctx, wparams, float_samples.data(), static_cast<int>(float_samples.size()));
    if (result != 0) {
        std::cerr << "Failed to process audio data. Error code: " << result << std::endl;
        return;
    }

    int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        const char* text = whisper_full_get_segment_text(ctx, i);
        callback(text);
    }
}

// Main function demonstrating the use of Whisper
int main() {
    ctx = initialize_model();
    if (!ctx) {
        std::cerr << "Failed to initialize Whisper model" << std::endl;
        return -1;
    }

    // .....

    unload_model(ctx);
    return 0;
}
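
For completeness, a minimal sketch of how transcribe_chunk gets called (load_audio_chunk is a hypothetical placeholder for the real audio capture, just to show the callback wiring):

// Hypothetical helper: returns one chunk of 16 kHz mono PCM samples
std::vector<int16_t> samples = load_audio_chunk();

// Print each transcribed segment as it arrives
transcribe_chunk(ctx, samples, /*use_context=*/false, [](const char* text) {
    std::cout << text << std::endl;
});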
