issue running mixtral #4502

Closed
alienatorZ opened this issue Dec 16, 2023 · 18 comments

Comments

@alienatorZ

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

The server should serve the request and provide a response.

Current Behavior

....
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_build_graph: non-view tensors processed: 1124/1124
llama_new_context_with_model: compute buffer total size = 117.72 MiB
llama_new_context_with_model: VRAM scratch buffer: 114.54 MiB
llama_new_context_with_model: total VRAM used: 25324.09 MiB (model: 25145.55 MiB, context: 178.54 MiB)
Available slots:
-> Slot 0 - max context: 512

llama server listening at http://0.0.0.0:8080

{"timestamp":1702752484,"level":"INFO","function":"main","line":3093,"message":"HTTP server listening","port":"8080","hostname":"0.0.0.0"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)

CUDA error 719 at /home/adam/Downloads/llama.cpp/ggml-cuda.cu:8008: unspecified launch failure
current device: 1
GGML_ASSERT: /home/adam/Downloads/llama.cpp/ggml-cuda.cu:8008: !"CUDA error"
Aborted (core dumped)
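(For scale: the "KV self size = 64.00 MiB" line above is consistent with the n_ctx = 512 setting, assuming Mixtral 8x7B's usual dimensions of 32 layers and a GQA KV width of 1024, i.e. 8 KV heads × 128. A quick check:)

# K + V bytes = 2 * n_layer * n_ctx * n_embd_gqa * 2 bytes (f16)
echo $(( 2 * 32 * 512 * 1024 * 2 / 1024 / 1024 ))   # prints 64 (MiB)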

Environment and Context

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7
10:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
16:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
16:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
16:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
20:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
21:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
2b:00.0 Non-Volatile memory controller: Intel Corporation Device f1aa (rev 03)
30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c9)
30:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
30:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
30:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
30:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
30:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller

  • Operating System, e.g. for Linux:

$ uname -a
Linux mojoserver 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@phalexo

phalexo commented Dec 16, 2023

Rebuilding from source with LLAMA_CUDA_FORCE_MMQ=on will probably fix it.

@alienatorZ
Author

I removed my build directory and ran the commands below, but I am still getting the same error. Did I do something wrong?

mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_FORCE_MMQ=on
cmake --build . --config Release
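One quick way to confirm the flag actually registered during configure is to inspect the CMake cache (generic CMake; this assumes the option name appears verbatim as a cache entry, and is run from inside build/):

grep LLAMA_CUDA_FORCE_MMQ CMakeCache.txt   # should report the option set to ON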

@phalexo

phalexo commented Dec 17, 2023 via email

@phalexo

phalexo commented Dec 17, 2023 via email

@Dirky14

Dirky14 commented Dec 17, 2023

I can reproduce the issue. I get the same error, but not on the same line of the CUDA file.
CUDA error 719 at /home/xxx/llama.cpp/ggml-cuda.cu:7603: unspecified launch failure
current device: 0
GGML_ASSERT: /home/xxx/llama.cpp/ggml-cuda.cu:7603: !"CUDA error"

I have a 2×P40 setup and built with @phalexo's instructions. The flag is on in the cmake command, and the error only occurs when I add some context to the prompt (if I run it with 1 or 2 sentences, it works well). This error only occurs with the Mixtral model, not with the Llama 2 models.
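A rough repro sketch against the bundled server's /completion endpoint (host, port, and prompt length are placeholders; per the reports in this thread, short prompts succeed while longer ones trigger the abort):

curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d "{\"prompt\": \"$(printf 'lorem ipsum %.0s' $(seq 1 400))\", \"n_predict\": 16}"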

@alienatorZ
Author

Yes, I also have 2 P40s.
@phalexo yes it was on the same line. It must have folded when I posted.

@phalexo

phalexo commented Dec 17, 2023 via email

@alienatorZ
Author

alienatorZ commented Dec 17, 2023

The Tesla P40 is on the Pascal architecture. Mine do have 24 GB each. When I run through the server web interface on port 8000, I can talk to the model. When I run through the API with AutoGen or other clients, I get the error.

@alienatorZ
Author

When I send a long context into the web interface I get the error as well, so it does seem to be related to context size.

CUDA error 719 at /home/adam/Downloads/llama.cpp/ggml-cuda.cu:7970: unspecified launch failure
current device: 0
GGML_ASSERT: /home/adam/Downloads/llama.cpp/ggml-cuda.cu:7970: !"CUDA error"
Aborted (core dumped)

@phalexo

phalexo commented Dec 17, 2023 via email

P40 is a Maxwell architecture, right? I am running Titan X (also Maxwell). We don't have tensor cores. When you launch "main", make certain the displayed flags indicate that tensor cores are not being used.

make puts "main" in the llama.cpp folder and cmake puts it in build/bin. Just check which main you are running.

My total VRAM over 4 GPUs is about 49 GiB. Are your P40s 24 GiB each? You should be able to run it. Try this command line; maybe there is something with the flags:

bin/main -ngl 33 -m /opt/data/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 --interactive -p "[INST] You have an IQ of 200 and love puzzles. [/INST] "

@alienatorZ

Author

Just to clarify, I am talking about the Mixtral 8x7B MoE model.
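Regarding the note above about which binary gets launched, a quick check of both candidate paths (assumed relative to the repo root):

# make drops its binary at ./main, the CMake build at ./build/bin/main;
# compare the timestamps to be sure a stale build is not being run.
ls -l ./main ./build/bin/main 2>/dev/null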

@phalexo

phalexo commented Dec 17, 2023 via email

@julianullrich99

Got the same error; it seems to be random, though.
I could only reproduce it with Mixtral, however.

@kkaarrss

kkaarrss commented Dec 21, 2023

I am getting the same. Shorter prompts do fine; going from a Q6 to a Q5 quant also increases the number of tokens I can input before getting the same error. Also 2× P40.

@zwilch

zwilch commented Dec 21, 2023

2× P40 24 GB, similar error:

CUDA error 719 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8019: unspecified launch failure
current device:0

It seems to be something with context size. On the oobabooga chat tab I can chat with it as the context keeps increasing; there is no problem going over 4000. But when I use the OpenAI API and send a short chat completion with about 3 roles, it crashes immediately.
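A minimal request of the kind described, for anyone trying to reproduce (the endpoint shape follows the OpenAI chat API; host, port, and model name are placeholders):

curl -s http://127.0.0.1:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "mixtral", "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi! How can I help?"}
      ]}'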

@cryolite-ai

@alienatorZ Can you edit the original post, please? The line with the uname output is generating a link to an as-yet-unclosed issue # 101 (from March last year!) even though it's completely unrelated; it's just how the "#" and number got parsed at the time of posting. Ta.

Contributor

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024
Contributor

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024