[Bug] DeepSeek V2 H100 x8 Triton failure #913
ref #905

Hardware: 8 x H100 SXM

```shell
# main (with FlashInfer)
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V2 --disable-radix-cache --tp 8 --trust-remote-code

# main (with Triton)
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V2 --disable-radix-cache --tp 8 --trust-remote-code --disable-flashinfer

# mla (with Triton)
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V2 --disable-radix-cache --tp 8 --trust-remote-code --enable-mla

# client
python3 -m sglang.bench_serving --backend sglang
```

Configurations tested (benchmark table not recovered from the page):
- main branch (with FlashInfer)
- main branch (with Triton)
- mla branch (with Triton)
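Beyond `bench_serving`, a quick way to smoke-test whichever configuration is running is to hit the server's native `/generate` HTTP endpoint directly. A minimal sketch follows; the endpoint path, default port `30000`, and payload field names (`text`, `sampling_params`) are assumptions based on the sglang server docs, so verify them against your installed version.

```python
import json

def build_generate_request(prompt: str, max_new_tokens: int = 32) -> dict:
    """Build a minimal request body for sglang's native /generate endpoint.

    Field names are assumed from the sglang docs; adjust if your
    version differs.
    """
    return {
        "text": prompt,
        "sampling_params": {
            "temperature": 0.0,
            "max_new_tokens": max_new_tokens,
        },
    }

payload = build_generate_request("Hello")
# Once the server is up, POST this to the (assumed) default address, e.g.:
#   curl -X POST http://localhost:30000/generate \
#        -H 'Content-Type: application/json' -d '{"text": "Hello", ...}'
print(json.dumps(payload))
```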
Try the Triton nightly build or a later version.

OK.

cc @ispobock

Will merge the PR soon. No worries.

Hi @Jokeren, thanks for the fix. How can I use the latest commit? It seems the nightly build is not the latest.

Just build from source: https://github.com/triton-lang/triton?tab=readme-ov-file#install-from-source
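After building Triton from source, it helps to confirm that the environment actually picked up a new enough build before re-running the benchmark. A small version-guard sketch is below; the minimum version passed in is a placeholder (the original thread does not state which Triton release contains the fix), and the helper names are my own.

```python
from importlib import metadata

def parse_version(v: str) -> tuple:
    """Parse a dotted version string like '3.0.0' (optionally with a
    local suffix such as '3.0.0+git123') into a comparable int tuple."""
    parts = []
    for p in v.split("+")[0].split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def triton_at_least(minimum: str) -> bool:
    """Return True if an installed 'triton' package meets the minimum.

    The minimum you pass is an assumption -- check which release
    actually includes the upstream fix (triton-lang/triton#4418).
    """
    try:
        installed = metadata.version("triton")
    except metadata.PackageNotFoundError:
        return False  # Triton not installed at all
    return parse_version(installed) >= parse_version(minimum)
```

A source build typically reports a local version (e.g. a `+git...` suffix), which `parse_version` strips before comparing.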
Checklist
Describe the bug
ref triton-lang/triton#4418
Reproduction
Using FlashInfer works fine; I just want to test with the Triton backend.
Environment