[script] run with pure TP8 #736
Conversation
Please follow these instructions https://docs.vllm.ai/en/latest/contributing/index.html#linting and add the link in the README:
python3 -m pip install pre-commit
pre-commit run --hook-stage manual markdownlint
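For context, markdownlint appears to run only at pre-commit's manual stage (hence the --hook-stage manual flag above), and pre-commit's --files option can restrict the run to the files you touched; a minimal sketch, with README.md as the example target:
python3 -m pip install pre-commit
pre-commit run --hook-stage manual markdownlint --files README.md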
@gbyu-amd This will improve the performance further.
Could we also enable this in this PR? There are issues on MI300X when using a newer AITER; KF could only test the Qwen3-Coder-PTPC-FP8 model and saw improvements.
Signed-off-by: guanbao <gyu@amd.com>
LGTM. We will fix the other pre-commit issues in another PR.
Signed-off-by: guanbao <gyu@amd.com>
Force-pushed from 403923a to bc177bd.
After adding the compilation config to enable rmsnorm+quant fusion, there seems to be an accuracy issue with DeepSeek PTPC.

Full server cmd:

export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
export VLLM_USE_TRITON_FLASH_ATTN=0
export NCCL_DEBUG=WARN
export VLLM_RPC_TIMEOUT=1800000
export VLLM_ROCM_USE_AITER_ASMMOE=1
export VLLM_ROCM_USE_AITER_MHA=0
export VLLM_ROCM_USE_TRITON_ROPE=1
# original weight https://huggingface.co/EmbeddedLLM/deepseek-r1-FP8-Dynamic
model_path="/mnt/raid0/guanbao/EmbeddedLLM/deepseek-r1-FP8-Dynamic"
vllm serve $model_path \
--tensor-parallel-size 8 \
--max-num-batched-tokens 32768 \
--trust-remote-code \
--no-enable-prefix-caching \
--disable-log-requests \
--compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE", "pass_config": {"enable_fusion": true, "enable_noop": true, "enable_attn_fusion": false}, "custom_ops": ["+rms_norm", "+quant_fp8"]}' \
--gpu_memory_utilization 0.9 \
--block-size 1

lm_eval cmd:

#!/bin/bash
model="/mnt/raid0/guanbao/EmbeddedLLM/deepseek-r1-FP8-Dynamic"
lm_eval \
--model local-completions \
--tasks gsm8k \
--model_args model=${model},base_url=http://127.0.0.1:8000/v1/completions \
--batch_size 100
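For comparison, one way to confirm the fusion pass is the culprit would be to relaunch with enable_fusion set to false and rerun the same lm_eval command; this is a sketch of a baseline launch mirroring the flags above, not a verified repro:

# Same launch as above, but with the rmsnorm+quant fusion pass disabled,
# to check whether the gsm8k accuracy recovers.
vllm serve $model_path \
--tensor-parallel-size 8 \
--max-num-batched-tokens 32768 \
--trust-remote-code \
--no-enable-prefix-caching \
--disable-log-requests \
--compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE", "pass_config": {"enable_fusion": false, "enable_noop": true, "enable_attn_fusion": false}, "custom_ops": ["+rms_norm", "+quant_fp8"]}' \
--gpu_memory_utilization 0.9 \
--block-size 1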
LGTM. The bug will be addressed in another PR.

Change to TP8 since it currently gives better performance than TP8 + EP8 (see the sketch below).
Change 3500 to 3.5 * 1024 (= 3584).
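For reference, the two setups being compared differ only in whether expert parallelism is layered on top of tensor parallelism; a minimal sketch using vLLM's --enable-expert-parallel flag (model path reused from the commands above):

# Pure TP8: every layer, including the MoE experts, is sharded across all 8 GPUs.
vllm serve $model_path --tensor-parallel-size 8

# TP8 + EP8: MoE expert layers are distributed with expert parallelism instead.
vllm serve $model_path --tensor-parallel-size 8 --enable-expert-parallel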
TP8 lm_eval result:
