[BUGFIX] main-sd-bugfix && [UT] add mtp UT #593
Conversation
```yaml
      run: |
        if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
          pytest -sv tests/singlecard/spec_decode
          pytest -sv tests/singlecard/spec_decode/e2e/test_mtp_correctness.py  # it needs a clean process
```
Let's add more comments on why it needs a clean process.
I will add more comments.
I see. There are two reasons why it needs a clean process:
1. It also needs a clean process in the vLLM test_pipeline, so I added it here the same way; I suspect the vLLM UT architecture may have some issues.
2. In this case, it needs a clean process because this test uses bf16 while the other UTs use fp16; if they run in a common process, the run fails.
Yeah, I think 2 is the main reason in our case.
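For reference, a minimal Python sketch of the isolation idea: launch the bf16 MTP test in its own interpreter so it cannot share process state with the fp16 UTs. This is illustrative only; the actual CI simply issues a separate `pytest -sv ...` command, as in the quoted workflow step above, and the test path is taken from that step.

```python
# Sketch only: run one test file in a fresh Python process so bf16 state from
# the MTP test cannot interact with the fp16 UTs running elsewhere.
import subprocess
import sys


def run_isolated(test_path: str) -> int:
    """Run a single test file in its own interpreter and return pytest's exit code."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-sv", test_path],
        check=False,
    )
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_isolated(
        "tests/singlecard/spec_decode/e2e/test_mtp_correctness.py"))
```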
```diff
             expert_tensor_parallel_size)

     global _EP
-    assert _EP is None, ("expert parallel group is already initialized")
```
why do we delete this assertion here?
Because the spec decode worker calls init_worker twice, _EP is not None on the second init.
Makes sense, thanks!
```diff
     group_ranks = []
     global _ETP
-    assert _ETP is None, (
-        "expert tensor parallel group is already initialized")
```
ditto
Same reason: the spec decode worker calls init_worker twice, so the group is already initialized on the second init.
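To illustrate the point above, here is a minimal, self-contained sketch; it is not the actual vllm_ascend parallel_state code, and `GroupCoordinator` / `init_expert_parallel` are hypothetical stand-ins. The idea is that initialization tolerates a second init_worker call instead of asserting.

```python
# Sketch: make expert-parallel group init idempotent, since the spec decode
# worker calls init_worker twice and the second call must not trip an
# `assert _EP is None` style check.

_EP = None  # module-level expert parallel group handle


class GroupCoordinator:
    """Hypothetical stand-in for the real process-group wrapper."""

    def __init__(self, group_ranks, backend):
        self.group_ranks = group_ranks
        self.backend = backend


def init_expert_parallel(group_ranks, backend="hccl"):
    global _EP
    if _EP is not None:
        # Second init_worker call: reuse the group created the first time.
        return _EP
    _EP = GroupCoordinator(group_ranks, backend)
    return _EP


if __name__ == "__main__":
    first = init_expert_parallel([[0, 1, 2, 3]])
    second = init_expert_parallel([[0, 1, 2, 3]])  # no failure on re-init
    assert first is second
```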
```python
from vllm.model_executor.parameter import PerTensorScaleParameter
from vllm.model_executor.utils import set_weight_attrs

from ..ops.fused_moe import AscendUnquantizedFusedMoEMethod
```
I prefer to use absolute import paths here.
Suggested change:
```diff
- from ..ops.fused_moe import AscendUnquantizedFusedMoEMethod
+ from vllm_ascend.ops.fused_moe import AscendUnquantizedFusedMoEMethod
```
OK, I have modified it.
```diff
         if self.is_layer_skipped_ascend(prefix,
                                         self.packed_modules_mapping):
-            return UnquantizedFusedMoEMethod()
+            return AscendUnquantizedFusedMoEMethod()
```
I wonder why this didn't raise an error before. @ganyi1996ppo
If the weights only contain the main model, all FusedMoE weights are w8a8. But the current MTP weights are kept in float inside the DeepSeek w8a8 checkpoint, so with MTP enabled it raised an error, while it ran successfully with MTP disabled.
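To make the dispatch concrete, here is a simplified, self-contained sketch of the behaviour described above. Only `UnquantizedFusedMoEMethod` and `AscendUnquantizedFusedMoEMethod` come from the quoted diff; `AscendW8A8FusedMoEMethod`, `get_moe_method`, and the prefix values are hypothetical stand-ins, and the class bodies are stubs.

```python
# Sketch of the quant_config dispatch for FusedMoE layers in a DeepSeek w8a8
# checkpoint: MTP layers ship float weights, so they are "skipped" for
# quantization and must take the Ascend unquantized path (previously the
# generic vLLM one, which only started to be hit once MTP was enabled).

class UnquantizedFusedMoEMethod:  # generic vLLM float MoE path
    pass


class AscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):  # NPU float MoE path
    pass


class AscendW8A8FusedMoEMethod:  # quantized (w8a8) MoE path, hypothetical name
    pass


def get_moe_method(layer_prefix, skipped_prefixes):
    """Pick the FusedMoE method for one layer of a w8a8 checkpoint."""
    if any(layer_prefix.startswith(p) for p in skipped_prefixes):
        return AscendUnquantizedFusedMoEMethod()
    return AscendW8A8FusedMoEMethod()


if __name__ == "__main__":
    method = get_moe_method("model.mtp.layers.0.mlp", ["model.mtp"])
    assert isinstance(method, AscendUnquantizedFusedMoEMethod)
```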
### What this PR does / why we need it?

This PR fixes some bugs in spec decode / MTP and adds an MTP e2e UT, `test_mtp_correctness.py`.

**vllm_ascend/attention/attention.py**
1. Support `self.attn_mask_cache` having only 1 element, to cover the scene in which both spec decode and chunked prefill are enabled.

**vllm_ascend/distributed/parallel_state.py**
1. Remove 2 asserts, because the spec decode worker calls init_worker twice.

**vllm_ascend/models/deepseek_mtp.py**
1. Remove unused params.
2. Add w8a8 support in `CustomDeepSeekMTP`.

**vllm_ascend/quantization/quant_config.py**
1. Use `AscendUnquantizedFusedMoEMethod` instead of `UnquantizedFusedMoEMethod`.

**other**
1. Replace `from vllm.logger import init_logger` with `from vllm.logger import logger` across the vllm-ascend project.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Signed-off-by: mengwei805 <mengwei25@huawei.com>
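As a small illustration of the project-wide logging change listed above (the log message is just an example):

```python
# After this change: import the shared vLLM logger object directly,
# instead of building one via `logger = init_logger(__name__)`.
from vllm.logger import logger  # previously: from vllm.logger import init_logger

logger.info("vllm-ascend example log line")
```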