CUDA: an illegal memory access was encountered #1502
Comments
try using one of the smaller models.
btw I am running a GTX 1080
Are you using the latest version of this repo?
yes, I updated the repo a few minutes ago, did a make clean and a fresh build... same issue :/
I am having exactly the same issue with a GTX 1060 while using ggml-large-v3-q5_0.bin
Would need a sample audio and the exact command that reproduces the issue.
If you apply the following patch, does it work?

diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index b420330..9da239a 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -96,7 +96,7 @@
 // - 7B quantum model: +100-200 MB
 // - 13B quantum model: +200-400 MB
 //
-//#define GGML_CUDA_FORCE_MMQ
+#define GGML_CUDA_FORCE_MMQ
 // TODO: improve this to be correct for more hardware
 // for example, currently fails for GeForce GTX 1660 which is TURING arch (> VOLTA) but does not have tensor cores
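For context, GGML_CUDA_FORCE_MMQ appears to force the quantized matmul (MMQ) kernels instead of the cuBLAS/tensor-core path that newer architectures would otherwise take. A rough, illustrative sketch of how such a build-time flag can gate that choice (the function name and the CC_VOLTA threshold are assumptions for illustration, not the actual ggml-cuda.cu code):

// Illustrative sketch only, not the real ggml-cuda.cu logic.
static const int CC_VOLTA = 70;

static bool should_use_mmq(int compute_capability) {
#ifdef GGML_CUDA_FORCE_MMQ
    (void) compute_capability;
    return true;   // the patch above: always take the MMQ path, bypassing cuBLAS
#else
    // default heuristic: GPUs without tensor cores (pre-Volta) use MMQ
    return compute_capability < CC_VOLTA;
#endif
}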
I'm using ggml-large-v3.bin too, and also have this issue with a Quadro P6000. After applying the patch above, still no luck.
Yeah, I'm sorry. Without being able to reproduce it, I won't be able to fix it. It works on my old GTX 1660 and I can't rent an older GPU to test.
Try #1548, but it's unlikely that it will resolve the problem.
Hm, interesting. Could the rest of the people that have issues confirm that it is only v3 that does not work? |
I am hitting this with large-v1 in f16 and q5_1:
I do not hit this with q8_0, but I get gibberish. This model works fine on 4774d2f (with -arch=native for modern CUDA). Will bisect. edit:
Before b050283 there was just naive matrix multiplication with a host-device-host copy running on the GPU. This was the first commit to introduce llama.cpp-like GPU offloading of entire graphs. But it seems that we have some bug in b050283: on a P40, even the text output with the medium model is garbage. Something is broken in ggml-cuda.cu.
Testing on Kaggle with an Nvidia P100, latest commit as of today, i.e. f0efd02. The issue probably affects all models (tested large-v3, large-v2, medium, base, tiny), in different ways: the large models give a CUDA 700 error, and the smaller models give gibberish output. I tested on Kaggle so that others without access to older-gen GPUs can also reproduce the issue. For model large-v3:
Same result for large-v2.
The error is not present with the medium, base and tiny models, but the output is gibberish in all cases. Medium model:
Base model:
Tiny model:
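The CUDA 700 error mentioned above is cudaErrorIllegalAddress, the same "illegal memory access" from the issue title. Since it is reported asynchronously, it often surfaces at a later, unrelated API call; a generic debugging aid (an assumption / general technique, not whisper.cpp code) is to check after every launch and synchronize:

// Generic CUDA error-checking helper, not part of whisper.cpp.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(expr)                                                       \
    do {                                                                       \
        cudaError_t err_ = (expr);                                             \
        if (err_ != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %d (%s) at %s:%d\n",                   \
                    (int) err_, cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(1);                                                           \
        }                                                                      \
    } while (0)

// Usage after a kernel launch:
//   my_kernel<<<grid, block>>>(args);
//   CUDA_CHECK(cudaGetLastError());       // launch-configuration errors
//   CUDA_CHECK(cudaDeviceSynchronize());  // asynchronous faults such as 700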
|
From the reports so far, this appears to happen with devices of compute capability 6.x (the GTX 1080, GTX 1060, Quadro P6000, P40 and P100 reported above are all Pascal-class).
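To check which bucket a given card falls into, a small standalone program (a verification aid I am assuming here, not part of this repo) can print the compute capability; the Pascal cards above should report 6.0 or 6.1, while the working GTX 1660 is 7.5:

// Standalone check, not whisper.cpp code: print the compute capability
// of the default CUDA device.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "failed to query CUDA device 0\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}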
If you apply the following patch, does it fix the issue?

diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index e80b7a7..caafbd5 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -7522,7 +7522,7 @@ static void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor * src0, const
     const half alpha_f16 = 1.0f;
     const half beta_f16 = 0.0f;
-#if 0
+#if 1
     // use cublasGemmEx
     {
         for (int i13 = 0; i13 < ne13; ++i13) {
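For reference, the "#if 1" branch switches the batched matmul over to cublasGemmEx with FP16 operands. A minimal sketch of what such a call looks like, with illustrative dimensions and leading dimensions rather than the actual ggml tensors:

// Minimal FP16 GEMM via cublasGemmEx (illustrative, not the ggml-cuda.cu code).
// Computes C = A * B for column-major m x k and k x n half-precision matrices.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gemm_f16(cublasHandle_t handle,
              const half * A, const half * B, half * C,
              int m, int n, int k) {
    const half alpha = __float2half(1.0f);
    const half beta  = __float2half(0.0f);
    cublasGemmEx(handle,
                 CUBLAS_OP_N, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 A, CUDA_R_16F, m,   // lda
                 B, CUDA_R_16F, k,   // ldb
                 &beta,
                 C, CUDA_R_16F, m,   // ldc
                 CUBLAS_COMPUTE_16F,
                 CUBLAS_GEMM_DEFAULT);
}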
Nope, still getting the same error.
Any luck with the latest version on master?
That fixed the issue for my GTX 1060.
When I try to run any large model I get this error:
My GPU is a 1080 Ti (11 GB VRAM).
The base.en model works.
openai-whisper with large-v2 works without problems on the GPU.
I also tried running quantized models... q5_0 has the same problem... q4_0 starts but I don't get any output :/
In dmesg:
Example run: