M3 GPU overloaded? #1630

@ronangrant

Hi, fun fact: I got an M3 Pro MacBook specifically to run whisper.cpp super fast.

I keep getting this error whenever I use the GPU:

ggml_metal_graph_compute: command buffer 3 failed with status 3
GGML_ASSERT: ggml-metal.m:1611: false

I may be doing something wrong, but as far as I am aware this means the GPU could not handle the load? Could it be the code I am running? Here is the output from loading the model:
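
As far as I can tell, the status value in that log line is Metal's raw MTLCommandBufferStatus enum (an assumption on my part, based on how ggml-metal.m reports command buffer failures). For reference, the values Metal defines are:

MTLCommandBufferStatusNotEnqueued = 0
MTLCommandBufferStatusEnqueued    = 1
MTLCommandBufferStatusCommitted   = 2
MTLCommandBufferStatusScheduled   = 3
MTLCommandBufferStatusCompleted   = 4
MTLCommandBufferStatusError       = 5

So if that reading is right, status 3 would mean the buffer was still only "scheduled" rather than "completed", and the explicit error state would be 5.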

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/ronangrant/Workshop/whisper_cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 28991.03 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 156.68 MB, ( 162.66 / 28991.03)
whisper_model_load: Metal buffer size = 156.67 MB
whisper_model_load: model size = 156.58 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/ronangrant/Workshop/whisper_cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 28991.03 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 16.52 MB, ( 179.18 / 28991.03)
whisper_init_state: kv self size = 16.52 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 18.43 MB, ( 197.61 / 28991.03)
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.62 / 28991.03)
whisper_init_state: compute buffer (conv) = 5.67 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.64 / 28991.03)
whisper_init_state: compute buffer (cross) = 4.71 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 0.02 MB, ( 197.66 / 28991.03)
whisper_init_state: compute buffer (decode) = 96.41 MB
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 4.05 MB, ( 201.70 / 28991.03)
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 3.08 MB, ( 204.78 / 28991.03)
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 94.78 MB, ( 299.57 / 28991.03)
Model initialized successfully.

And just for reference, here is a small piece of code showing how I use whisper.cpp, to help with troubleshooting:

#include "whisper.h"
#include "ggml.h"
#include <iostream>
#include <vector>
#include <csignal>
#include <climits>
#include <cstring>

// Global context for Whisper model
whisper_context* ctx;

// Initialization of Whisper Model
whisper_context* initialize_model() {
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true; // Use GPU if available
    ctx = whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (ctx == nullptr) {
        std::cerr << "Failed to initialize Whisper context" << std::endl;
        return nullptr;
    }
    std::cout << "Model initialized successfully." << std::endl;
    return ctx;
}

// Unloading the Whisper Model
void unload_model(whisper_context* ctx) {
    if (ctx != nullptr) {
        whisper_free(ctx);
        std::cout << "Model unloaded successfully." << std::endl;
    }
}

// Function to transcribe an audio chunk using Whisper
void transcribe_chunk(whisper_context* ctx, const std::vector<int16_t>& int_samples, bool use_context, TranscriptionCallback callback) {
    // Convert 16-bit PCM to the normalized float samples whisper_full expects
    std::vector<float> float_samples(int_samples.size());
    for (size_t i = 0; i < int_samples.size(); ++i) {
        float_samples[i] = int_samples[i] / static_cast<float>(INT16_MAX);
    }

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.no_context = !use_context; // wire up the previously unused use_context flag
    int result = whisper_full(ctx, wparams, float_samples.data(), static_cast<int>(float_samples.size()));
    if (result != 0) {
        std::cerr << "Failed to process audio data. Error code: " << result << std::endl;
        return;
    }

    int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        const char* text = whisper_full_get_segment_text(ctx, i);
        callback(text);
    }
}

// Main function demonstrating the use of Whisper
int main() {
    ctx = initialize_model();
    if (!ctx) {
        std::cerr << "Failed to initialize Whisper model" << std::endl;
        return -1;
    }

    // .....

    unload_model(ctx);
    return 0;
}
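
For completeness, a minimal sketch of how transcribe_chunk gets called (load_audio_chunk is a hypothetical placeholder for the real audio capture, just to show the callback wiring):

// Hypothetical helper: returns one chunk of 16 kHz mono PCM samples
std::vector<int16_t> samples = load_audio_chunk();

// Print each transcribed segment as it arrives
transcribe_chunk(ctx, samples, /*use_context=*/false, [](const char* text) {
    std::cout << text << std::endl;
});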
