

@zhangxinyuehfad zhangxinyuehfad commented Aug 27, 2025

What this PR does / why we need it?

Add e2e ci test for A3

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@zhangxinyuehfad zhangxinyuehfad force-pushed the zxy_a3_yaml branch 2 times, most recently from e732f9e to c6018e2 on August 27, 2025 09:46
image: m.daocloud.io/quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11
env:
  DEBIAN_FRONTEND: noninteractive
  COMPILE_CUSTOM_KERNELS: 1
Collaborator

This is already set to 1 by default.

Contributor Author

Deleted it.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Collaborator

@MengqingCao MengqingCao left a comment

LGTM

- linux-aarch64-310p-4
- ubuntu-24.04-arm
- linux-aarch64-a3-1
- linux-aarch64-a3-2
Collaborator
I think these two runners could also be added to the e2e test, but we can do that in the next PR, WDYT? @wangxiyuan @Yikun
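For context, the runner labels and CANN container image shown in the diff hunks above might combine into an e2e workflow job roughly like this. This is an illustrative sketch only: the job name, step names, and test path are assumptions, not taken from the PR.

```yaml
# Hypothetical sketch of an A3 e2e job; names and paths are placeholders.
jobs:
  e2e-a3:
    strategy:
      matrix:
        runner: [linux-aarch64-a3-1, linux-aarch64-a3-2]
    runs-on: ${{ matrix.runner }}
    container:
      image: m.daocloud.io/quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11
    env:
      DEBIAN_FRONTEND: noninteractive
    steps:
      - uses: actions/checkout@v4
      - name: Run e2e tests
        run: pytest -sv tests/e2e  # test path is a placeholder
```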

@MengqingCao
Collaborator

Let's merge this first to unblock CI for more ST cases

@MengqingCao MengqingCao merged commit e7ad4a6 into vllm-project:main Aug 29, 2025
14 checks passed
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Aug 29, 2025
### What this PR does / why we need it?
Add e2e ci test for A3

### How was this patch tested?

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@11a7faf

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Aug 29, 2025
anon189Ty added a commit to anon189Ty/vllm-ascend that referenced this pull request Aug 29, 2025
Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>

add cann version judgment

update ut

correct spelling errors

Update ut

Support v0.10.1 (vllm-project#2584)

This patch also supports v0.10.1

No

- CI passed
- test 0.10.1: vllm-project#2583
- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@321938e

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

[Fix] Fix DP-related padding logic (vllm-project#2582)

The determination of attention state, padding, and other forward
metadata has been moved to an earlier stage within the input preparation
process. This change enables us to utilize a single all-reduce
operation, maximizing synchronization efficiency as early as possible.

The logic for synchronizing metadata—such as the number of tokens,
prefill status, and DBO status—across data parallel (DP) ranks has now
been unified and simplified.

For performance improvements, the all-reduce operation has been switched
from the `gloo` backend to the `npu` backend, which yields a reduction of
several milliseconds per step (**approximately a 10% performance gain for
TPOT!**).

Additionally, the multi-DP server hang issue has been resolved: no more
hangs occur when `num_requests < dp_size`.

Finally, the miscalculated memory usage issue has been addressed by
removing the unnecessary `DummyCommImpl`, allowing the system to use the
real communication method when determining available memory.
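The unified DP metadata sync described above can be sketched in a single-process simulation. All names here are illustrative, not vLLM-ascend internals: one "all-reduce" over the gathered per-rank values yields the token padding and prefill flag every data-parallel rank must agree on, and idle ranks still run a dummy batch so the collective never hangs.

```python
# Minimal sketch (hypothetical names) of the unified DP forward-metadata sync.
from dataclasses import dataclass

@dataclass
class ForwardMetadata:
    num_tokens: int      # tokens this rank will process in the step
    has_prefill: bool    # whether this rank has any prefill request

def sync_dp_metadata(per_rank: list[ForwardMetadata]) -> tuple[int, bool]:
    """Simulate the single all-reduce across DP ranks.

    Every rank pads its batch to the global max token count so collective
    ops stay shape-aligned, and takes the prefill path if any rank is
    prefilling.
    """
    padded = max(m.num_tokens for m in per_rank)
    # Even idle ranks run a dummy batch of one token, avoiding the hang
    # when num_requests < dp_size.
    padded = max(padded, 1)
    any_prefill = any(m.has_prefill for m in per_rank)
    return padded, any_prefill

ranks = [ForwardMetadata(7, False), ForwardMetadata(0, False),
         ForwardMetadata(12, True), ForwardMetadata(3, False)]
print(sync_dp_metadata(ranks))  # (12, True)
```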

None.

Maybe we should add a test case for the multi-DP online server?
@MengqingCao

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@c5d004a

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

[CI] Add e2e ci test for A3 (vllm-project#2573)

Add e2e ci test for A3

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@11a7faf

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

[Feat]: Add custom lmhead tensor model parallel (vllm-project#2309)

This PR introduces lm_head tensor model parallelism to reduce memory
consumption and improve TPOT performance. It supports both eager mode and
graph mode.

In a DeepSeek R1 W8A8 PD-disaggregated Decode instance using pure DP with
lmhead_tensor_parallel_size = 8, we observed a 1 ms TPOT improvement and
saved 1.48 GB of NPU memory per rank.

performance data:
<img width="1444" height="438" alt="image"
src="https://github.com/user-attachments/assets/3c5ef0d3-a7c7-46fd-9797-4de728eb0cb0"
/>

This PR introduces one new config in `additional_config`.

| Name | Effect | Required | Type | Constraints |
| :--- | :--- | :--- | :--- | :--- |
| lmhead_tensor_parallel_size | Split the lm_head matrix along the column dimension (vocab_size) into lmhead_tensor_parallel_size pieces | No | int | Default is None; once set, the feature is enabled. vocab_size must be divisible by this value. |

example

`--additional_config={"lmhead_tensor_parallel_size": 8}`
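The column-wise (vocab-dimension) split the table describes can be sketched as follows. This is a hedged illustration: `split_lm_head` and the shapes used are assumptions for demonstration, not vLLM internals.

```python
# Hypothetical sketch of splitting an lm_head weight along the vocab axis.
import numpy as np

def split_lm_head(weight: np.ndarray, lmhead_tp_size: int) -> list[np.ndarray]:
    """Split an lm_head weight of shape (hidden, vocab_size) into
    lmhead_tp_size column shards; vocab_size must be divisible."""
    _, vocab_size = weight.shape
    if vocab_size % lmhead_tp_size != 0:
        raise ValueError(
            "vocab_size must be divisible by lmhead_tensor_parallel_size")
    return np.split(weight, lmhead_tp_size, axis=1)

# Small toy shapes; a real lm_head is (hidden_size, vocab_size).
w = np.zeros((8, 32), dtype=np.float16)
shards = split_lm_head(w, 8)
print(len(shards), shards[0].shape)  # 8 (8, 4)
```

Each rank then computes logits only for its vocab shard, which is where the per-rank memory saving comes from.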

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@de533ab

---------

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: zhangzihang <zzh_201018@outlook.com>

Fix import bug

Remove whitespace
anon189Ty pushed a commit to anon189Ty/vllm-ascend that referenced this pull request Aug 29, 2025
wenba0 pushed a commit to wenba0/vllm-ascend that referenced this pull request Sep 5, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025