
Conversation

@rjg-lyh (Collaborator) commented Jun 16, 2025

What this PR does / why we need it?

Optimizes the performance of the Qwen3 model by registering a custom model implementation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with the existing tests.

@ttanzhiqiang (Contributor)

Can you add e2e tests? I'd like to try it out.

@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.

Signed-off-by: rjg-lyh <1318825571@qq.com>


class AddRMSNormQuant(RMSNorm):
    """Root mean square normalization.
Review comment (Collaborator):

Please update the comment — this docstring is inherited from RMSNorm and no longer describes what AddRMSNormQuant does.

    self.post_attention_layernorm = RMSNorm(config.hidden_size,
                                            eps=config.rms_norm_eps)
else:
    from vllm_ascend.quantization.quant_config import AscendQuantConfig
Review comment (Collaborator):

It seems the main change in CustomQwen3DecoderLayer is the AddRMSNormQuant layer. I would prefer to inherit from Qwen3DecoderLayer and add the AddRMSNormQuant logic on top. That would make the optimization point clear and reduce redundant code.
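A minimal sketch of the inheritance pattern the reviewer is suggesting. The class bodies below are hypothetical stand-ins, not vLLM's real RMSNorm or Qwen3DecoderLayer signatures; the point is only the structure: subclass the upstream layer and swap in the fused norm, rather than copy-pasting the whole decoder layer.

```python
class RMSNorm:
    """Stand-in for vLLM's RMSNorm (simplified, hypothetical signature)."""
    def __init__(self, hidden_size, eps=1e-6):
        self.hidden_size = hidden_size
        self.eps = eps

class AddRMSNormQuant(RMSNorm):
    """The PR's fused residual-add + RMSNorm + quantization variant."""

class Qwen3DecoderLayer:
    """Stand-in for the upstream vLLM Qwen3 decoder layer."""
    def __init__(self, hidden_size, eps=1e-6):
        self.input_layernorm = RMSNorm(hidden_size, eps)
        self.post_attention_layernorm = RMSNorm(hidden_size, eps)

class CustomQwen3DecoderLayer(Qwen3DecoderLayer):
    """Inherit everything from the upstream layer; replace only the
    norm layers when quantization is enabled, so the diff stays small."""
    def __init__(self, hidden_size, eps=1e-6, quantized=False):
        super().__init__(hidden_size, eps)
        if quantized:
            self.input_layernorm = AddRMSNormQuant(hidden_size, eps)
            self.post_attention_layernorm = AddRMSNormQuant(hidden_size, eps)
```

With this shape, the custom layer tracks upstream Qwen3DecoderLayer changes automatically, and the only vllm-ascend-specific code is the norm substitution.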

import torch_npu

if residual is not None:
    x, _, residual = torch_npu.npu_add_rms_norm_quant(x, residual, self.weight,
Review comment (Collaborator):

QQ: what does "add" mean here?
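For context, the "add" most likely refers to the residual addition being fused into the kernel: the op adds the residual to the hidden states, applies RMSNorm, and quantizes the result in a single pass. A pure-Python sketch of those presumed semantics follows; the function name, argument names, and the int8 rounding scheme are illustrative assumptions, not the actual torch_npu kernel interface.

```python
import math

def add_rms_norm_quant_ref(x, residual, weight, scale, offset=0, eps=1e-6):
    """Reference for the presumed semantics of a fused
    add + RMSNorm + quantize op (hypothetical, not the NPU kernel).

    1. "add": fuse the residual connection, added = x + residual
    2. RMSNorm: normed = added / rms(added) * weight
    3. quantize: q = clamp(round(normed / scale) + offset, int8 range)
    Returns (quantized, new_residual); new_residual is the post-add sum,
    which the next layer reuses as its residual input.
    """
    added = [a + b for a, b in zip(x, residual)]
    rms = math.sqrt(sum(v * v for v in added) / len(added) + eps)
    normed = [v / rms * w for v, w in zip(added, weight)]
    quant = [max(-128, min(127, round(v / scale) + offset)) for v in normed]
    return quant, added
```

Fusing all three steps avoids two extra round trips through memory compared with running add, norm, and quantize as separate kernels.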

@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@Yikun (Collaborator) commented Jul 6, 2025

Yes, we should avoid adding large blocks of pasted code to vllm-ascend. Please also paste the perf results here.

@Yikun mentioned this pull request on Jul 8, 2025
@rjg-lyh closed this on Jul 22, 2025
@rjg-lyh deleted the pr-perf-optim branch on July 22, 2025 at 12:14

4 participants