Issue running Mixtral #4502
Rebuilding from source with LLAMA_CUDA_FORCE_MMQ=on will probably fix it. |
I removed my build directory and ran the commands below, but I am still getting the same error. Did I do something wrong?
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_FORCE_MMQ=on
cmake --build . --config Release
|
Looks right to me. Make sure that when you run it, the flag it prints out is correct.
|
I assume that the flag was on the same line as cmake, and it just got folded by the formatting.
|
I can reproduce the issue. I have the same error, but not on the same line of the CUDA file. I have a 2x P40 configuration and I built with the instructions from @phalexo. The flag is on in the cmake command, and the error only occurs when I add some context to the prompt (if I run with just 1 or 2 sentences, it works fine). This error only occurs with the Mixtral model, not with the Llama 2 models. |
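Since the crash above only appears past a certain context length, the failing threshold can be narrowed down mechanically. A minimal bisection sketch; the `works` probe is a hypothetical caller-supplied function (e.g. one that runs `bin/main` with an n-token prompt and reports whether it survived), since the exact failing command depends on the local setup:

```python
def max_working_context(works, lo=1, hi=4096):
    """Binary-search the largest context length n for which works(n) is True.

    Assumes works() is monotone: True up to some threshold, False beyond it.
    Returns None if even the smallest prompt fails.
    """
    if not works(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            lo = mid  # mid still works; the threshold is at or above mid
        else:
            hi = mid - 1  # mid crashes; the threshold is below mid
    return lo
```

Running this with a probe that feeds prompts of increasing length to the server should pinpoint the token count at which CUDA error 719 first appears.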
Yes, I also have 2x P40s. @phalexo yes, the flag was on the same line; it must have folded when I posted. |
The P40 is a Maxwell architecture, right? I am running a Titan X (also Maxwell). We don't have tensor cores.
When you launch "main", make certain the displayed flags indicate that tensor cores are not being used.
make puts "main" in the llama.cpp folder, while cmake puts it in build/bin. Just check which main you are running.
My total VRAM over 4 GPUs is about 49 GiB. Are your P40s 24 GiB each? You should be able to run it.
Try this command line; maybe there is something with the flags:
bin/main -ngl 33 -m /opt/data/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 --interactive -p "<s>[INST] You have an IQ of 200 and love puzzles. [/INST] "
|
The Tesla P40 is on the Pascal architecture. Mine do have 24 GB each. When I run through the server web interface on port 8000, I can talk to the model. When I run using the API with autogen or other tools, I get the error. |
When I send a long context into the web interface, I get the error as well, so it does seem to be the context size. CUDA error 719 at /home/adam/Downloads/llama.cpp/ggml-cuda.cu:7970: unspecified launch failure |
OK, I think Volta is the first architecture with tensor cores.
|
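For reference on the architecture question: tensor cores first shipped with Volta (compute capability 7.0), while the Tesla P40 is Pascal (6.1) and the Maxwell Titan X is 5.2, so neither card in this thread has them. A tiny sketch of that check; the capability table covers only the cards mentioned here:

```python
# Compute capability of the GPUs discussed in this thread.
COMPUTE_CAPABILITY = {
    "Tesla P40": (6, 1),          # Pascal
    "Titan X (Maxwell)": (5, 2),  # Maxwell
}

def has_tensor_cores(cc):
    """Tensor cores first appeared with Volta, compute capability 7.0."""
    return cc >= (7, 0)
```

This is why forcing MMQ (LLAMA_CUDA_FORCE_MMQ=on) matters on these cards: it steers the CUDA backend away from tensor-core code paths that Pascal and Maxwell cannot execute.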
Just to clarify, I am talking about the Mixtral 8x7B MoE. |
Yes, I got it. But the error was affecting small models too; it was not limited to Mixtral.
|
Got the same error; it seems to be random, though. |
I am getting the same. Shorter prompts do fine, and going from Q6 to Q5 also increases the number of tokens I can input before getting the same error. Also 2x P40. |
2x P40 24 GB, same error: CUDA error 719 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8019: unspecified launch failure. It seems to be something with the context size. In the oobabooga chat tab I can chat with it while the context keeps increasing; there is no problem going over 4000. But when I use the OpenAI API and send a short chat/completion with about 3 roles, it crashes immediately. |
@alienatorZ Can you edit the original post, please? The line with the uname output is generating a link to a still-unclosed issue #101 (from March last year!) even though it's completely unrelated. It's just how the # and the number got parsed at the time of posting. Thanks. |
This issue is stale because it has been open for 30 days with no activity. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
The server should serve the request and provide a response
Current Behavior
....
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_build_graph: non-view tensors processed: 1124/1124
llama_new_context_with_model: compute buffer total size = 117.72 MiB
llama_new_context_with_model: VRAM scratch buffer: 114.54 MiB
llama_new_context_with_model: total VRAM used: 25324.09 MiB (model: 25145.55 MiB, context: 178.54 MiB)
Available slots:
-> Slot 0 - max context: 512
llama server listening at http://0.0.0.0:8080
{"timestamp":1702752484,"level":"INFO","function":"main","line":3093,"message":"HTTP server listening","port":"8080","hostname":"0.0.0.0"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
CUDA error 719 at /home/adam/Downloads/llama.cpp/ggml-cuda.cu:8008: unspecified launch failure
current device: 1
GGML_ASSERT: /home/adam/Downloads/llama.cpp/ggml-cuda.cu:8008: !"CUDA error"
Aborted (core dumped)
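The key line in crash logs like the one above follows a fixed shape. A small hypothetical helper (not part of llama.cpp) to pull out the error code and source location when collecting these reports:

```python
import re

# Matches lines like:
#   CUDA error 719 at /path/to/ggml-cuda.cu:8008: unspecified launch failure
CUDA_ERR = re.compile(r"CUDA error (\d+) at (.+?):(\d+): (.+)")

def parse_cuda_error(line):
    """Return (code, file, line_no, message), or None if the line doesn't match."""
    m = CUDA_ERR.search(line)
    if not m:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3)), m.group(4)
```

Error 719 is CUDA's "unspecified launch failure", so the file and line number in these logs point at where the failure was detected, not necessarily at the kernel that caused it.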
Environment and Context
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7
10:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
16:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
16:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
16:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
20:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
21:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
2b:00.0 Non-Volatile memory controller: Intel Corporation Device f1aa (rev 03)
30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c9)
30:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
30:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
30:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
30:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
30:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
$ uname -a
Linux mojoserver 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux