WSL: CUDA error 2 at ggml-cuda.cu:359: out of memory (Fix found) #1230
Comments
Looks like it's failing to allocate host pinned memory. I will add a patch to revert to normal pageable memory when this happens. In the meantime, removing the
Okay, I did both things:
That's weird; it looks like it is failing to allocate any amount of host pinned memory. It should still be solved by reverting to normal memory when the host pinned malloc fails, but you will lose some performance.
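As a rough illustration of that fallback (a minimal sketch, not the actual patch that landed): try the pinned allocation first and drop to ordinary pageable memory if it fails.

```cpp
// Minimal sketch of the fallback described above (not the real llama.cpp code):
// try to get pinned (page-locked) host memory, and fall back to ordinary
// pageable memory if the pinned allocation fails.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static void * host_alloc(size_t size, bool * pinned) {
    void * ptr = nullptr;
    cudaError_t err = cudaMallocHost(&ptr, size);   // pinned allocation
    if (err == cudaSuccess) {
        *pinned = true;
        return ptr;
    }
    fprintf(stderr, "warning: cudaMallocHost failed (%s), using pageable memory\n",
            cudaGetErrorString(err));
    cudaGetLastError();          // clear the sticky error state
    *pinned = false;
    return malloc(size);         // plain pageable fallback
}

static void host_free(void * ptr, bool pinned) {
    if (pinned) cudaFreeHost(ptr);
    else        free(ptr);
}
```

The cost of the fallback is that host-to-device copies from pageable memory cannot be fully asynchronous, which is the performance loss mentioned above.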
Fixed with #1233
Okay, I found some more info on this topic: somehow, this is a common issue.
In case this is useful, I am using Ubuntu 22.04 in WSL2 under Windows 11, with an RTX 3080 and the latest drivers. That works for me.
No idea why... it's a fresh Win 11 install with Ubuntu 22.04...
Tried 20.04, same result. Tried CUDA 11.8, same again. No idea what combination, in what order, can give me the ability to pin memory under WSL.
NVIDIA is very vague about the limits, but they suggest that on Windows pinned memory is usually limited to about 50% of total system memory, and this limit is likely to be lower under WSL2.
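One way to check that claim empirically (a sketch added for illustration, not something from the thread): keep pinning 1 GB chunks until cudaMallocHost refuses, and see where the limit actually sits under WSL2.

```cpp
// Probe the effective pinned-memory limit: allocate page-locked host memory
// in 1 GB chunks until cudaMallocHost fails.
// Build with: nvcc pin_limit.cu -o pin_limit
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t chunk = 1ull << 30;            // 1 GB per allocation
    std::vector<void *> chunks;

    for (int i = 0; i < 256; ++i) {
        void * p = nullptr;
        if (cudaMallocHost(&p, chunk) != cudaSuccess) {
            printf("pinned allocation failed after %zu GB\n", chunks.size());
            break;
        }
        chunks.push_back(p);
    }

    for (void * p : chunks) cudaFreeHost(p);
    return 0;
}
```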
I believe 64GB should be enough in general for 13B T_T. |
Okay, I tried this:
then I did
And the result is:
So it seems I successfully allocated 100 MB to the GPU. Good start (I guess?), I'll try a larger size.
At this point I wonder if it really does it, but okay.
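For anyone who wants to reproduce that kind of check, here is a minimal standalone sketch (not the commenter's exact code) that pins 100 MB of host memory and copies it to the GPU; on a broken WSL2 setup the cudaMallocHost call is the one that fails.

```cpp
// Standalone pinned-memory test: allocate 100 MB of page-locked host memory
// and copy it to the GPU. Build with: nvcc pinned_test.cu -o pinned_test
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { \
    cudaError_t e = (call); \
    if (e != cudaSuccess) { \
        fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e)); \
        return 1; \
    } \
} while (0)

int main() {
    const size_t size = 100u * 1024u * 1024u;    // 100 MB

    void * host = nullptr;
    CHECK(cudaMallocHost(&host, size));          // fails here if pinning is broken

    void * dev = nullptr;
    CHECK(cudaMalloc(&dev, size));
    CHECK(cudaMemcpy(dev, host, size, cudaMemcpyHostToDevice));

    printf("pinned 100 MB and copied it to the GPU successfully\n");

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```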
AAAAAAAAAAAAAAAAAAAAAA
I believe the thing that did the trick for me is:
Also, I installed CUDA via
and I did it before installing Miniconda. Some info for anyone who may fight this in the future:
I have a similar problem. More: abetlen/llama-cpp-python#229. llama_model_load_internal: format = ggjt v2 (latest)
I honestly can't get this to work. I tried everything you did and reinstalled WSL and CUDA like twice, but I still get the same error. Here's my nvcc and nvidia-smi:
Any clue as to what could be wrong? I'm literally loading the smallest model, 20 layers on the 7B model, and no luck:
(PytorchEnv) yuicchi@DESKTOP-DJ3R5OF:/mnt/d/Yuicchi Text Model/llama.cpp$ ./main -ngl 20 --ctx_size 2048 -n 2048 -c 2048 --temp 0.7 --top_k 40 --top_p 0.5 --repeat_last_n 256 --batch_size 512 --repeat_penalty 1.17647 --seed 1685501956 --model "./models/7B/ggml-model-q4_0.bin" --threads 8 --n_predict 4096 --color --prompt
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Write out a detailed step by step method on how to create a website
CUDA error 2 at ggml-cuda.cu:565: out of memory
Due to the current CUDA bug, I think you need to disable pinned memory via an environment variable. Command for it: "export GGML_CUDA_NO_PINNED=1"
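For context, the flag most likely works along these lines (an illustrative sketch, not the real ggml-cuda.cu source): when GGML_CUDA_NO_PINNED is set, the pinned allocation is skipped entirely and plain host memory is used instead.

```cpp
// Sketch of how an environment-variable opt-out like GGML_CUDA_NO_PINNED can work
// (illustrative only, not the actual ggml-cuda.cu implementation).
#include <cstdlib>
#include <cuda_runtime.h>

static void * host_buffer_alloc(size_t size) {
    // If the user exported GGML_CUDA_NO_PINNED, never even try pinned memory.
    if (getenv("GGML_CUDA_NO_PINNED") != nullptr) {
        return malloc(size);
    }
    void * ptr = nullptr;
    if (cudaMallocHost(&ptr, size) != cudaSuccess) {
        cudaGetLastError();      // clear the error and fall back to pageable memory
        return malloc(size);
    }
    return ptr;
}
```

Note that the variable has to be exported in the same WSL shell session that launches ./main, so the process inherits it.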
Thank you so much for that! It worked. Is there any place I can look into for this bug? What exactly might be going wrong here?
@Celppu how do you do that? Where do you execute that command?
Solution: #1230 (comment)
UPD:
Confirmed working just fine on Windows.
The issue below happened only on WSL.
#1207
First I pull and clean
Build fresh with cuBLAS
Trying to load a model that worked before the update
I haven't updated my libllama.so for llama-cpp-python yet, so it uses the previous version and works with this very model just fine. Something happened.
RTX 3050 8GB
UPD 2:
The issue persists on WSL. I did a full clean, and yet it doesn't work after being built with the current version.
UPD 3:
I found some old version of llama.cpp and did exactly the same thing, and everything worked fine. So I guess it's not me being especially dumb today.