PerryZhang01 commented Oct 28, 2025

Purpose

Support the EPLB (expert parallelism load balancing) feature on the ROCm platform.
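
For context, EPLB adds redundant replicas of experts and spreads the resulting physical experts across the expert-parallel ranks. The sketch below works out how many physical experts each GPU would hold under the test plan's settings. It rests on two assumptions that this PR does not state: that DeepSeek-V3 has 256 routed experts per MoE layer, and that the expert-parallel size equals the `--tensor-parallel-size` of 8. The helper name `experts_per_rank` is hypothetical and only illustrates the arithmetic.

```shell
#!/usr/bin/env bash
# Assumed values (not stated in the PR): routed experts per MoE layer in
# DeepSeek-V3, and an expert-parallel size equal to --tensor-parallel-size.
num_logical_experts=256
num_redundant_experts=8   # --num-redundant-experts from the test plan
ep_size=8                 # assumed equal to --tensor-parallel-size 8

# Hypothetical helper: physical experts per rank, or an error if the
# physical experts cannot be split evenly across ranks.
experts_per_rank() {
    local logical=$1 redundant=$2 ranks=$3
    local physical=$(( logical + redundant ))
    if (( physical % ranks != 0 )); then
        echo "physical experts ($physical) not divisible by ranks ($ranks)" >&2
        return 1
    fi
    echo $(( physical / ranks ))
}

experts_per_rank "$num_logical_experts" "$num_redundant_experts" "$ep_size"   # prints 33
```

Under these assumptions, 256 + 8 = 264 physical experts divide evenly into 33 per GPU, which is one reason a redundant-expert count of 8 is convenient on an 8-GPU node.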

Test Plan

export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
export VLLM_USE_TRITON_FLASH_ATTN=0
export NCCL_DEBUG=WARN
export VLLM_RPC_TIMEOUT=1800000
export VLLM_ROCM_USE_AITER_MHA=0
export VLLM_ROCM_USE_TRITON_ROPE=1

export VLLM_ROCM_USE_AITER_FAKE_BALANCED_EXPERTS=1
export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0

model_path="/data/pretrained-models/deepseek-ai/DeepSeek-V3"

# export AITER_REBUILD=1
# rm ~/.cache/vllm/torch_compile_cache/ -r
vllm serve $model_path \
--tensor-parallel-size 8 \
--max-num-batched-tokens 32768 \
--trust-remote-code \
--no-enable-prefix-caching \
--disable-log-requests \
--compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
--gpu_memory_utilization 0.9 \
--block-size 1 \
--enable-expert-parallel \
--enable-eplb \
--num-redundant-experts 8 \
--eplb-log-balancedness \
--eplb-window-size 3000 \
--eplb-step-interval 1000

# Benchmark client: run from a second shell once the server is ready.
model_path="/data/pretrained-models/deepseek-ai/DeepSeek-V3"

# python -m vllm.benchmarks.serve
# vllm bench serve \
python -m vllm.entrypoints.cli.main bench serve \
    --host localhost \
    --port 8000 \
    --model ${model_path} \
    --dataset-name random \
    --random-input-len 3584 \
    --random-output-len 1024 \
    --max-concurrency 64 \
    --num-prompts 64 \
    --seed 123 \
    --percentile-metrics ttft,tpot,itl,e2el \
    --ignore-eos

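Loading a model of this size can take a long time, so the benchmark should only start once the server answers requests. A minimal sketch of a readiness check follows, assuming `curl` is installed and that the server exposes a `/health` endpoint on the same host and port used by the benchmark command. The function name `wait_for_server` and the default timeout and polling interval are hypothetical choices, not part of this PR.

```shell
#!/usr/bin/env bash
# Poll a URL until it responds successfully or the timeout expires.
# Usage: wait_for_server URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 once the URL answers with a success status, 1 on timeout.
wait_for_server() {
    local url=$1 timeout=${2:-1800} interval=${3:-10}
    local deadline=$(( SECONDS + timeout ))
    # --fail makes curl exit non-zero on HTTP errors as well as on
    # connection failures, so the loop keeps polling in both cases.
    until curl --silent --fail --output /dev/null "$url"; do
        if (( SECONDS >= deadline )); then
            echo "server at $url not ready after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

In the benchmark shell, this could be called as `wait_for_server http://localhost:8000/health` before the `bench serve` command, so that the run aborts with a clear message if the server never comes up.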
Test Result

[Image: test result]

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
