How to run on an AMD GPU with macOS (with MPS)? #2965
I'm getting the same error. AMD Radeon Pro 5500M 4 GB.
It seems to only work for Apple-silicon GPUs, i.e. M1/M2. I hope there will be a fallback mode like PyTorch has, or some workaround.
I am having the same issue, also on the Radeon Pro 5500M, but with 8 GB of VRAM (Chipset Model: AMD Radeon Pro 5500M). I was able to successfully use the Metal backend on an earlier version of llama.cpp (albeit not with clearly improved performance over using the CPU, if I recall correctly).
As mentioned in #2407, a handful of kernels seem to be the culprits, as they don't compile. In other words, if I comment them out, the remaining kernels compile on my AMD Radeon Pro 5500M (8 GB VRAM). (The offending kernels are the `mul_mm_*` ones, shown commented out in the diff further down the thread.)
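For reference, a quick way to locate those kernel registrations in the source is a plain grep (a minimal sketch; exact line numbers vary by llama.cpp revision):

```sh
# List the mul_mm_* kernel registrations in ggml-metal.m; these are the
# ones reported as failing to compile on several AMD GPUs in this thread.
grep -n "GGML_METAL_ADD_KERNEL(mul_mm_" ggml-metal.m
```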
There’s now also #3000.
Hearted. In the same boat. Trying to get Metal to work with llama.cpp on an AMD Radeon Pro 5600M (8 GB). No idea where to even begin to search on forums. I've seen people have success with M1/M2 chips, but literally nothing with AMD. I wish we could just compile the shaders, cache them, and then have the shaders handle all the work that needs to be done on the GPU. I come from the Unreal Engine side of things: we have a material editor that lets us easily design functions so that the GPU does what we want, and it doesn't matter whether it's M1 vs. M2, AMD or NVIDIA. A universal pipeline with all the instructions --> some universal shader code --> GPU-specific code should be doable.
I have an AMD Radeon Pro 5500M with only 4 GB on a MacBook Pro (2.3 GHz, 16 GB RAM, Intel 8-core i9 with UHD 630). It will just run `.build/bin/simple ./models/llama-2-7b/ggml-model-q4_0.gguf`, but the required GPU memory exceeds what is available, so it runs incredibly slowly and eventually reports a segmentation fault, even though it produces some output, apparently by using CPU RAM. Anything bigger than the q4_0 and it refuses even to try. For some reason it reverts to German halfway through the response!
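Not mentioned in the thread, but one way to reduce GPU memory pressure in this situation is llama.cpp's `-ngl` / `--n-gpu-layers` option, which offloads only part of the model to the GPU. A minimal sketch (the layer count is an illustrative value to tune, not a recommendation from the thread):

```sh
# Offload only a few layers to the 4 GB GPU and keep the rest on the CPU,
# which avoids exhausting VRAM at the cost of some speed.
./main -m ./models/llama-2-7b/ggml-model-q4_0.gguf -n 64 -ngl 8
```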
I have an older iMac (Retina 5K, 27-inch, 2017) with a Radeon Pro 570 4 GB inside. I tried to run whisper from main and it failed with the "SC compilation failure". Removing all the kernel-registration lines with the "mul_mm_" prefix and recompiling solved the issue:

```diff
diff --git a/ggml-metal.m b/ggml-metal.m
index 1139ee3..06a436c 100644
--- a/ggml-metal.m
+++ b/ggml-metal.m
@@ -251,16 +251,16 @@ @implementation GGMLMetalClass
GGML_METAL_ADD_KERNEL(mul_mat_q4_K_f32);
GGML_METAL_ADD_KERNEL(mul_mat_q5_K_f32);
GGML_METAL_ADD_KERNEL(mul_mat_q6_K_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
- GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
+ // GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
GGML_METAL_ADD_KERNEL(rope);
GGML_METAL_ADD_KERNEL(alibi_f32);
GGML_METAL_ADD_KERNEL(cpy_f32_f16);
```
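For completeness, a rebuild-and-test sequence after a change like the above might look like this (a sketch; the model path is only an example, and `LLAMA_METAL=1 make` matches the build command used elsewhere in this thread):

```sh
# Rebuild with the Metal backend enabled, then run a short generation
# with one layer offloaded to confirm the remaining kernels compile.
make clean
LLAMA_METAL=1 make
./main -m ./models/llama-2-7b/ggml-model-q4_0.gguf -n 64 -ngl 1
```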
This issue was closed because it has been inactive for 14 days since being marked as stale.
My RX 560 is actually supported in macOS (mine is a Hackintosh running macOS Ventura 13.4), but when I try to run llama.cpp it can't utilize MPS. I already compiled it with `LLAMA_METAL=1 make`, but when I run this command:

```sh
./main -m ./models/falcon-7b-Q4_0-GGUF.gguf -n 128 -ngl 1
```

it fails with an error.

Please make it compatible, because I can run Stable Diffusion on macOS with this GPU (MPS works there).
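If the Metal backend will not initialize on this GPU, a possible interim workaround (my suggestion, not something confirmed in the thread) is to keep all layers on the CPU by not offloading anything:

```sh
# Run entirely on the CPU: pass -ngl 0 (or omit -ngl) so no layers are
# offloaded to the Metal backend.
./main -m ./models/falcon-7b-Q4_0-GGUF.gguf -n 128 -ngl 0
```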