
Conversation

@zhoux77899 (Contributor) commented on Aug 5, 2025

What this PR does / why we need it?

Fixes the issue that `qwen3_moe` quantized weights could not be loaded, introduced by #1994.

Does this PR introduce any user-facing change?

None

How was this patch tested?

Adds a `qwen3_moe` W8A8 quantized model case to `tests/e2e/multicard/test_qwen3_moe.py`.

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
@github-actions bot commented on Aug 5, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will help speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

max_model_len=8192,
dtype=dtype,
tensor_parallel_size=2,
quantization="ascend",
Collaborator


see #2223; let's enable aclgraph for this test.

Contributor Author


Done
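For context, the change requested above amounts to running the quantized model without forcing eager mode, so that aclgraph (graph capture on Ascend) is exercised. Below is a minimal sketch of such a configuration; the model identifier and prompt are placeholders rather than the exact values in the test, and `enforce_eager=False` is assumed to be how graph mode is kept enabled:

```python
# Minimal sketch using the public vLLM offline-inference API; the model ID
# and prompt are placeholders, not the exact values in test_qwen3_moe.py.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen3-MoE-W8A8",   # hypothetical W8A8 quantized checkpoint
    max_model_len=8192,
    dtype="auto",
    tensor_parallel_size=2,
    quantization="ascend",    # Ascend quantization backend, as in the snippet above
    enforce_eager=False,      # keep graph capture enabled so aclgraph is exercised
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```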

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
@codecov bot commented on Aug 5, 2025

Codecov Report

❌ Patch coverage is 10.00000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.65%. Comparing base (126cdfc) to head (303b1df).
⚠️ Report is 614 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_ascend/models/qwen3_moe.py | 10.00% | 18 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2219      +/-   ##
==========================================
- Coverage   76.75%   76.65%   -0.11%     
==========================================
  Files         113      113              
  Lines       12743    12763      +20     
==========================================
+ Hits         9781     9783       +2     
- Misses       2962     2980      +18     
| Flag | Coverage Δ |
|---|---|
| unittests | 76.65% <10.00%> (-0.11%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
@wangxiyuan merged commit e31b31f into vllm-project:main on Aug 6, 2025
25 checks passed
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
…roject#2219)

### What this PR does / why we need it?

Fixes the issue that `qwen3_moe` quantized weights could not be loaded, introduced by vllm-project#1994.

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

Adds a `qwen3_moe` W8A8 quantized model case to
`tests/e2e/multicard/test_qwen3_moe.py`.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@c494f96

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
…roject#2219)

chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…roject#2219)

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…roject#2219)
