[Guide][Performance]: vLLM Ascend v0.7.3.post1 benchmark for Qwen3 #1025

@zhanglzu

Benchmark environment

Download the mindie_turbo tar file from the Ascend website and place it in a new directory together with a new Dockerfile.

Dockerfile

FROM quay.io/ascend/vllm-ascend:v0.7.3.post1
COPY ./Ascend-mindie-turbo_2.0.RC1_py310_linux_aarch64.tar.gz /tmp
RUN cd /tmp && \
    tar -xzvf /tmp/Ascend-mindie-turbo_2.0.RC1_py310_linux_aarch64.tar.gz && \
    cd /tmp/Ascend-mindie-turbo_2.0.RC1_py310_linux_aarch64 && \
    pip install --no-deps *.whl  && \
    pip cache purge

Or install mindie-turbo with pip in the Dockerfile instead:

FROM quay.io/ascend/vllm-ascend:v0.7.3.post1
RUN pip install mindie-turbo==2.0rc1 && pip cache purge

Then build the new image, start a container from it, and run the tests.
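
The build-and-run step could look roughly like the sketch below. The image tag `vllm-ascend-turbo:v0.7.3.post1` is a made-up name, and the `docker run` line deliberately leaves out the NPU device and driver mounts, which depend on the host, so add the ones your machine needs. The commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Hypothetical image tag; any name works.
IMAGE="vllm-ascend-turbo:v0.7.3.post1"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_and_start() {
  # Build from the Dockerfile in the current directory.
  run docker build -t "$IMAGE" .
  # NPU device and driver mounts are host-specific and omitted here.
  run docker run --rm -it "$IMAGE" bash
}

build_and_start
```

Run it from the directory that holds the Dockerfile, first as-is to review the commands, then with `DRY_RUN=0`.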

Test steps

Case 1 Qwen3-32B TP4

vllm serve Qwen3-32B --gpu_memory_utilization=0.92 --port 32561 --rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}' --max-model-len 131072 -tp 4

Case 2 DeepSeek-R1-0528-Qwen3-8B TP1

vllm serve DeepSeek-R1-0528-Qwen3-8B --gpu_memory_utilization=0.92 --port 32563 --rope-scaling '{"rope_type":"yarn","factor":2,"original_max_position_embeddings":32768}' --max-model-len 65536
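
In both cases the `--max-model-len` value equals the YaRN `factor` multiplied by `original_max_position_embeddings`. A small sanity check of that arithmetic (the helper name `scaled_len` is only for illustration):

```shell
# Extended context length = factor * original_max_position_embeddings.
scaled_len() {
  echo $(( $1 * $2 ))
}

scaled_len 4 32768   # Case 1: prints 131072
scaled_len 2 32768   # Case 2: prints 65536
```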

Results

For Qwen3-32B with 4 NPUs, inference speed increased from 8 tokens/s (baseline) to 18 tokens/s (v0.7.3.post1 + mindie turbo).
For DeepSeek-R1-0528-Qwen3-8B with 1 NPU, inference speed increased from 20 tokens/s (baseline) to 34 tokens/s (v0.7.3.post1 + mindie turbo).

However, for DeepSeek-R1-0528-Qwen3-8B, I am not sure whether the model supports rope scaling, because when I started the service I received the following message, even though the model runs normally:

Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'attn_factor'}
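
The message seems to say that the `rope_scaling` settings in effect contain an `attn_factor` key that the YaRN validation does not expect. One way to check whether that key comes from the checkpoint itself is to print the `rope_scaling` entry of its `config.json`. This is only a sketch: `MODEL_DIR` is an assumed local path, and it relies on the usual Hugging Face layout with a pretty-printed `config.json` in the model directory.

```shell
# Assumed local checkpoint path; point it at wherever the weights live.
MODEL_DIR="${MODEL_DIR:-./DeepSeek-R1-0528-Qwen3-8B}"

show_rope_scaling() {
  # Show the "rope_scaling" line plus the few lines after it.
  grep -A 6 '"rope_scaling"' "$1/config.json" \
    || echo "no rope_scaling entry in $1/config.json"
}

if [ -f "$MODEL_DIR/config.json" ]; then
  show_rope_scaling "$MODEL_DIR"
else
  echo "config.json not found under $MODEL_DIR"
fi
```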

| Model | TP | baseline | v0.7.3.post1 + mindie turbo |
| --- | --- | --- | --- |
| Qwen3-32B | 4 | 8 tokens/s | 18 tokens/s |
| DeepSeek-R1-0528-Qwen3-8B | 1 | 20 tokens/s | 34 tokens/s |

  • baseline: v0.8.5rc1 without any optimizations
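
For a quick read of the relative gain, the speedup is the optimized throughput divided by the baseline throughput. A minimal sketch using the numbers reported here (the helper name `speedup` is only for illustration):

```shell
# Speedup = optimized tokens/s divided by baseline tokens/s.
speedup() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2fx\n", opt / base }'
}

speedup 8 18    # Qwen3-32B, TP 4: prints 2.25x
speedup 20 34   # DeepSeek-R1-0528-Qwen3-8B, TP 1: prints 1.70x
```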
