Issue running Mixtral #4502
Rebuilding from source with LLAMA_CUDA_FORCE_MMQ=on will probably fix it. |
I removed my build directory and ran the commands below, but I am still getting the same error. Did I do something wrong?
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_FORCE_MMQ=on
cmake --build . --config Release
|
Looks right to me. Make sure that when you run it, the flag it prints out is correct.
|
I assume that the flag was on the same line as cmake, and it just got folded by the formatting.
|
I can reproduce the issue. I have the same error, but not on the same line of the CUDA file. I have a 2x P40 configuration and I built with the instructions from @phalexo. The flag is on in the cmake command, and the error only occurs when I add some context to the prompt (if I run with just 1 or 2 sentences, it works fine). This error only occurs with the Mixtral model, not with the Llama 2 models. |
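Since the crash above only appears past a certain context length, the failing threshold can be narrowed down mechanically. A minimal bisection sketch; the `works` probe is a hypothetical caller-supplied function (e.g. one that runs `bin/main` with an n-token prompt and reports whether it survived), since the exact failing command depends on the local setup:

```python
def max_working_context(works, lo=1, hi=4096):
    """Binary-search the largest context length n for which works(n) is True.

    Assumes works() is monotone: True up to some threshold, False beyond it.
    Returns None if even the smallest prompt fails.
    """
    if not works(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            lo = mid  # mid still works; the threshold is at or above mid
        else:
            hi = mid - 1  # mid crashes; the threshold is below mid
    return lo
```

Running this with a probe that feeds prompts of increasing length to the server should pinpoint the token count at which CUDA error 719 first appears.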
Yes, I also have 2x P40s. @phalexo yes, the flag was on the same line; it must have folded when I posted. |
The P40 is a Maxwell architecture, right? I am running a Titan X (also Maxwell). We don't have tensor cores.
When you launch "main", make certain the displayed flags indicate that tensor cores are not being used.
make puts "main" in the llama.cpp folder, while cmake puts it in build/bin. Just check which main you are running.
My total VRAM over 4 GPUs is about 49 GiB. Are your P40s 24 GiB each? You should be able to run it.
Try this command line; maybe there is something with the flags:
bin/main -ngl 33 -m /opt/data/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 --interactive -p "<s>[INST] You have an IQ of 200 and love puzzles. [/INST] "
|
The Tesla P40 is on the Pascal architecture. Mine do have 24 GB each. When I run through the server web interface on port 8000, I can talk to the model. When I run using the API with autogen or other tools, I get the error. |
When I send a long context into the web interface, I get the error as well, so it does seem to be the context size. CUDA error 719 at /home/adam/Downloads/llama.cpp/ggml-cuda.cu:7970: unspecified launch failure |
OK, I think Volta is the first architecture with tensor cores.
|
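For reference on the architecture question: tensor cores first shipped with Volta (compute capability 7.0), while the Tesla P40 is Pascal (6.1) and the Maxwell Titan X is 5.2, so neither card in this thread has them. A tiny sketch of that check; the capability table covers only the cards mentioned here:

```python
# Compute capability of the GPUs discussed in this thread.
COMPUTE_CAPABILITY = {
    "Tesla P40": (6, 1),          # Pascal
    "Titan X (Maxwell)": (5, 2),  # Maxwell
}

def has_tensor_cores(cc):
    """Tensor cores first appeared with Volta, compute capability 7.0."""
    return cc >= (7, 0)
```

This is why forcing MMQ (LLAMA_CUDA_FORCE_MMQ=on) matters on these cards: it steers the CUDA backend away from tensor-core code paths that Pascal and Maxwell cannot execute.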
Just to clarify, I am talking about the Mixtral 8x7B MoE. |
Yes, I got it. But the error was affecting small models too; it was not limited to Mixtral.
|
Got the same error; it seems to be random, though. |
I am getting the same. Shorter prompts do fine, and going from Q6 to Q5 also increases the number of tokens I can input before getting the same error. Also 2x P40. |
2x P40 24 GB, same error: CUDA error 719 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8019: unspecified launch failure. It seems to be something with the context size. In the oobabooga chat tab I can chat with it while the context keeps increasing; there is no problem going over 4000. But when I use the OpenAI API and send a short chat/completion with about 3 roles, it crashes immediately. |
@alienatorZ Can you edit the original post, please? The line with the uname output is generating a link to a still-unclosed issue #101 (from March last year!) even though it's completely unrelated. It's just how the # and the number got parsed at the time of posting. Thanks. |
This issue is stale because it has been open for 30 days with no activity. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
The server should serve the request and provide a response
Current Behavior
....
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_build_graph: non-view tensors processed: 1124/1124
llama_new_context_with_model: compute buffer total size = 117.72 MiB
llama_new_context_with_model: VRAM scratch buffer: 114.54 MiB
llama_new_context_with_model: total VRAM used: 25324.09 MiB (model: 25145.55 MiB, context: 178.54 MiB)
Available slots:
-> Slot 0 - max context: 512
llama server listening at http://0.0.0.0:8080
{"timestamp":1702752484,"level":"INFO","function":"main","line":3093,"message":"HTTP server listening","port":"8080","hostname":"0.0.0.0"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
CUDA error 719 at /home/adam/Downloads/llama.cpp/ggml-cuda.cu:8008: unspecified launch failure
current device: 1
GGML_ASSERT: /home/adam/Downloads/llama.cpp/ggml-cuda.cu:8008: !"CUDA error"
Aborted (core dumped)
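The key line in crash logs like the one above follows a fixed shape. A small hypothetical helper (not part of llama.cpp) to pull out the error code and source location when collecting these reports:

```python
import re

# Matches lines like:
#   CUDA error 719 at /path/to/ggml-cuda.cu:8008: unspecified launch failure
CUDA_ERR = re.compile(r"CUDA error (\d+) at (.+?):(\d+): (.+)")

def parse_cuda_error(line):
    """Return (code, file, line_no, message), or None if the line doesn't match."""
    m = CUDA_ERR.search(line)
    if not m:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3)), m.group(4)
```

Error 719 is CUDA's "unspecified launch failure", so the file and line number in these logs point at where the failure was detected, not necessarily at the kernel that caused it.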
Environment and Context
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7
10:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
16:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
16:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
16:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
20:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
21:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
2b:00.0 Non-Volatile memory controller: Intel Corporation Device f1aa (rev 03)
30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c9)
30:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
30:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
30:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
30:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
30:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
$ uname -a
Linux mojoserver 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux