Bug: ggml_metal_init error: zero-length arrays are not permitted in C++ float4x4 lo[D16/NW4]; #10208

Closed
a1ix2 opened this issue Nov 7, 2024 · 10 comments · Fixed by #10229
Labels
bug-unconfirmed, critical severity (used to report critical severity bugs in llama.cpp, e.g. Crashing, Corrupted, Dataloss)

Comments


a1ix2 commented Nov 7, 2024

What happened?

I'm trying to run llama-server on an Apple Silicon M2 running macOS Ventura. I get the same error with both the latest release and a build from source. The model is Llama-3.2-3B-Instruct F16 from Meta; I created the GGUF file using convert_hf_to_gguf.py.

$ ./llama-server -m Llama-3.2-3B-Instruct-F16.gguf --verbose

Name and Version

From source

$ ./llama-cli --version
version: 4048 (a71d81c)
built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin22.6.0

From the release

$ ./llama-cli --version
version: 4044 (97404c4)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
" UserInfo={NSLocalizedDescription=program_source:5134:17: error: zero-length arrays are not permitted in C++
    float4x4 lo[D16/NW4];
                ^~~~~~~
program_source:5391:18: note: in instantiation of function template specialization 'kernel_flash_attn_ext_vec<metal::matrix<half, 4, 4, void>, 1, &dequantize_f16, 64, 1, 32>' requested here
typedef decltype(kernel_flash_attn_ext_vec<half4x4, 1, dequantize_f16, 64>) flash_attn_ext_vec_t;
                 ^
program_source:5186:21: error: zero-length arrays are not permitted in C++
        float4x4 mq[D16/NW4];
                    ^~~~~~~
}
ggml_backend_metal_device_init: error: failed to allocate context
llama_new_context_with_model: failed to initialize Metal backend
common_init_from_params: failed to create context with model '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
srv    load_model: failed to load model, '/Users/username/dev/RAG/Llama-3.2-3B-Instruct-F16.gguf'
main: exiting due to model loading error
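
A note on why the array length comes out as zero. The sketch below is not the kernel source. It assumes, from the identifier names in the log, that D16 means D/16 and NW4 means NW/4, that NW is a SIMD-group width of 32, and that the 64 in the template instantiation is the head size D. Under those assumptions, integer division gives 4/8 = 0, which matches the "zero-length arrays" error. The `checked_len` guard is a hypothetical illustration of catching this at compile time, not the fix that was merged.

```cpp
// Sketch only. Assumptions (not confirmed by the log): D16 = D/16, NW4 = NW/4,
// and NW = 32. Integer division truncates, so a small D yields a length of 0.
constexpr int array_len(int D, int NW) {
    return (D / 16) / (NW / 4);  // D16 / NW4
}

// Hypothetical guard: fail with a readable message instead of declaring
// an array of length zero.
template <int D, int NW = 32>
struct checked_len {
    static_assert(array_len(D, NW) > 0,
                  "D16/NW4 is zero: head size too small for this layout");
    static constexpr int value = array_len(D, NW);
};

// D = 64 reproduces the zero length; D = 128 does not.
static_assert(array_len(64, 32) == 0, "64/16 = 4, 32/4 = 8, 4/8 truncates to 0");
static_assert(array_len(128, 32) == 1, "128/16 = 8, 8/8 = 1");
```

Instantiating `checked_len<64>` would stop compilation with the message above, while `checked_len<128>::value` evaluates to 1.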
a1ix2 added the bug-unconfirmed and critical severity labels Nov 7, 2024


stefanb commented Nov 8, 2024

According to llama.cpp pull requests in Homebrew the problem started appearing in Homebrew/homebrew-core#196827, between tags b4034 and b4038

Diff: b4034...b4038

Author

a1ix2 commented Nov 8, 2024

I can confirm that b4034 works, but b4036 throws the same error.


stefanb commented Nov 8, 2024

> I can confirm that b4034 works, but b4036 throws the same error.

Which narrows down the problematic diff to b4034...b4036

Author

a1ix2 commented Nov 8, 2024

The bug was introduced in a1eaf6. Previous commit b8deef works.


stefanb commented Nov 9, 2024

Commit a1eaf6a was from merging

cc @ggerganov, any clues?

@ggerganov
Owner

@stefanb Should be fixed now. Let me know if the issue persists.


stefanb commented Nov 9, 2024

Thanks @ggerganov, waiting for the next tag (>b4056) containing the fix.


stefanb commented Nov 9, 2024

@ggerganov, thanks, seems to be fixed 🎉 in

Author

a1ix2 commented Nov 9, 2024

Can confirm, works on my M2 Air! Thank you so much! Still impressed how fast Metal is even for reasonably sized models on a rather low-end laptop.

@eugeniosegala
Contributor

It looks like this problem is back?

I'm experiencing it as well, but I also noticed someone else opening a new issue: #10696
