CUDA error 12 : invalid pitch argument #664
Comments
Hi, thanks for your feedback. You may want to take a look at ggml-org/llama.cpp#1388
Your link does not contain the proper solution or information. That issue says the problem went away on its own with the latest llama.cpp commit, back in May. I do not believe LocalAI is using a llama.cpp commit older than May.
I should make it clearer: I mean you can follow the comments in that issue and try to get more current information. Or you can try loading the model with
You are being misleading. As I mentioned above, the link does not contain the information you just described. I can create a new ticket in llama.cpp instead, using your link, but let's keep this ticket open until the issue is resolved here or there.
Try increasing your context size to 2048 to see if that works?
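For reference, a minimal sketch of how the context size could be raised in a LocalAI model definition YAML. This is an assumption for illustration only: the file name, model name, and model file are placeholders, and field names such as `context_size`, `f16`, and `gpu_layers` may differ between LocalAI versions.

```yaml
# models/gpt-3.5-turbo.yaml — hypothetical model definition, for illustration
name: gpt-3.5-turbo              # name the API will expose (assumed)
parameters:
  model: ggml-model-q4_0.bin     # placeholder model file in the models directory
context_size: 2048               # raise the context window as suggested above
f16: true                        # half precision, commonly used with CUDA builds
gpu_layers: 35                   # layers offloaded to the GPU (assumed field name)
```

After editing the YAML, the container would need to be restarted so the model definition is reloaded.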
LocalAI version:
quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
Environment, CPU architecture, OS, and Version:
GPU is enabled in the docker-compose file (a compose sketch follows this list).
GPU is enabled in .env.
Docker image with CUDA 12.
RTX 4090
96 GB RAM
13700K CPU
The host OS should not matter.
CUDA is installed with the full developer setup.
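For context, a sketch of the docker-compose GPU wiring assumed here. The image tag mirrors this report; the service name, ports, volumes, and .env contents are assumptions, not the reporter's actual files.

```yaml
# docker-compose.yaml — illustrative sketch, not the reporter's actual file
version: "3.9"
services:
  api:
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
    env_file:
      - .env                     # GPU/threads settings referenced above (assumed)
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia     # standard Compose GPU reservation
              count: 1
              capabilities: [gpu]
```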
Describe the bug
Total Used RAM: 20%
Total Used VRAM: 50%
In addition, there is another bug: llama.cpp loads the model file very slowly.
ex)
To Reproduce
Expected behavior
Logs
Additional context