@ttanzhiqiang
Contributor

What this PR does / why we need it?

W_UV and W_UK_T should not be converted to NZ format: this position is fused into transposebatchmatmul, which does not support NZ, so NZ weights end up being converted back to ND on every run.
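
The idea can be sketched roughly as follows. This is only an illustration in plain Python, not the code in this PR: `Weight`, `choose_format`, `prepare_weights`, the `OPS_WITHOUT_NZ_SUPPORT` set and the `o_proj` entry are all hypothetical names, and the format is modelled as a string instead of a real device layout.

```python
from dataclasses import dataclass

ND = "ND"
NZ = "NZ"

# Assumption for this sketch: ops that cannot read NZ-formatted weights.
OPS_WITHOUT_NZ_SUPPORT = {"transposebatchmatmul"}


@dataclass
class Weight:
    name: str
    consumer_op: str  # op this weight is fused into at run time
    fmt: str = ND


def choose_format(weight: Weight) -> str:
    """Pick NZ only when the consuming op can read it directly."""
    if weight.consumer_op in OPS_WITHOUT_NZ_SUPPORT:
        # Casting to NZ here would only force a cast back to ND on every run.
        return ND
    return NZ


def prepare_weights(weights):
    """Decide each weight's format once, at load time."""
    for w in weights:
        w.fmt = choose_format(w)
    return weights


weights = prepare_weights([
    Weight("W_UV", "transposebatchmatmul"),
    Weight("W_UK_T", "transposebatchmatmul"),
    Weight("o_proj", "matmul"),  # hypothetical layer whose op accepts NZ
])
formats = {w.name: w.fmt for w in weights}
print(formats)
```

Deciding the format once at load time keeps the per-run path free of layout conversions for the two weights that feed the fused op.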

Does this PR introduce any user-facing change?

With #1098 as the baseline, p90 TPOT drops from 90.79 ms to 88.58 ms, an improvement of about 2 ms.
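
For reference, the absolute and relative change can be worked out from the two p90 TPOT figures reported here. The variable names are illustrative only.

```python
baseline_ms = 90.79  # p90 TPOT with the baseline
patched_ms = 88.58   # p90 TPOT with this PR

delta_ms = round(baseline_ms - patched_ms, 2)
relative_pct = round(100 * delta_ms / baseline_ms, 2)

print(f"p90 TPOT: -{delta_ms} ms ({relative_pct}% lower)")
```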

How was this patch tested?

Tested using #1101.

ttanzhiqiang and others added 3 commits May 21, 2025 23:23
@ApsarasX ApsarasX added the ready read for review label Jun 9, 2025
@ttanzhiqiang
Contributor Author

@wangxiyuan @Yikun

Signed-off-by: ttanzhiqiang <389825161@qq.com>
Signed-off-by: ttanzhiqiang <389825161@qq.com>
@ttanzhiqiang ttanzhiqiang force-pushed the use_attention_linear_nz branch from 2d4b564 to e795fab Compare June 12, 2025 04:03
Signed-off-by: ttanzhiqiang <389825161@qq.com>
@jianzs jianzs merged commit 4270682 into vllm-project:main Jun 15, 2025
18 checks passed
@Yikun Yikun added this to the v0.9.1 milestone Jun 23, 2025
shiyuan680 pushed a commit to raindaywhu/vllm-ascend that referenced this pull request Jul 7, 2025
…t#1131)

W_UV/W_UK_T should not be converted to NZ, because this position is
fused into transposebatchmatmul, which does not support NZ. NZ weights
end up being converted back to ND on every run.

Use vllm-project#1098 as the baseline: p90 TPOT 90.79ms->88.58ms, improving TPOT by about 2ms

Tested using vllm-project#1101

---------

Signed-off-by: ttanzhiqiang <389825161@qq.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
…t#1131)

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…t#1131)
