[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI #1109
Merged
Conversation
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
wangxiyuan approved these changes on Jun 7, 2025
mengwei805 approved these changes on Jun 7, 2025
@wangxiyuan @mengwei805 Thanks all, merge to main to recover CI.
Yuxiao-Xu pushed a commit to Yuxiao-Xu/vllm-ascend that referenced this pull request on Jun 7, 2025

[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)

### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid OOM, we need to run the test in a single process

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, especially the multicard CI
- For the spec decode test, the long-term CI passed

Closes: vllm-project#1105

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
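The "set default values" half of the fix can be made concrete with a minimal Python sketch. The flag names mirror the `MultiStepWorker` setters quoted later in this thread; the class itself is a hypothetical stand-in for illustration, not vllm-ascend's actual code:

```python
# Minimal sketch (not vllm-ascend's actual code) of the "set default values"
# pattern described above. The flag names mirror the MultiStepWorker setters
# quoted later in this thread; the class itself is hypothetical.
class DraftWorkerProxy:
    def __init__(self) -> None:
        # Default both flags instead of leaving them unset, so spec decode
        # still works on code paths where the engine never calls the setters.
        self.include_gpu_probs_tensor = False
        self.should_modify_greedy_probs_inplace = False

    def set_include_gpu_probs_tensor(self) -> None:
        self.include_gpu_probs_tensor = True

    def set_should_modify_greedy_probs_inplace(self) -> None:
        self.should_modify_greedy_probs_inplace = True
```

The design point is simply that attributes read by other components get a safe value at construction time rather than relying on every caller to set them first.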
Yuxiao-Xu pushed a commit to Yuxiao-Xu/vllm-ascend that referenced this pull request on Jun 7, 2025

[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)
This was referenced Jun 20, 2025
ganyi1996ppo pushed a commit that referenced this pull request on Jun 21, 2025

### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to fail. [PR1109](#1109) tried to fix that problem, but unfortunately the fix broke the ngram function; this PR fixes the ngram function. **PS**: Q: Why did the ngram problem not show up when PR1109 was merged? A: The newly introduced problem only appears when tp > 1, and the test cases on CI all use tp = 1.
2. In versions after 0.7.3, vllm-ascend deleted some spec decode UTs to keep CI times down, including the eagle speculative UTs, which left CI unable to cover the eagle function. This PR adds them back (`test_eagle_correctness.py`).
3. Because of the gap described in 2, the current version of eagle had a problem. I located and fixed it: vllm's `draft_model_runner.py` had changed and vllm-ascend was not synchronized in time.
4. The UTs of v0 and v1 were mixed in the spec_decode directory; this PR splits them into two directories, spec_decode_v0 and spec_decode_v1.
5. `vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor` and `vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace` have changed in vllm, so this PR removes their patches.
6. The v1 mtp UT failed (https://github.com/vllm-project/vllm-ascend/actions/runs/15782006176/job/44489813330?pr=1323), so it is commented out for now. @XWFAlone @JC-ut0

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
ngram and eagle were tested locally on an 800I A2 machine, using real weights instead of the random small weights used by the UTs, and in a scenario with tp > 1; the rest was tested by CI.

Signed-off-by: mengwei805 <mengwei25@huawei.com>
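Item 1 hinges on a coverage gap: the regression only surfaces with tensor parallelism greater than 1, which CI never exercised. A hedged sketch of the kind of regression test that would have caught it, using vLLM's offline API; the model name is a placeholder and the exact `speculative_config` keys are assumptions that can differ between vLLM versions:

```python
# Hypothetical regression test: ngram speculative decoding with tp > 1, the
# configuration the commit message says CI was missing. The model is a
# placeholder; the speculative_config keys are assumptions and may differ
# across vLLM versions.
from vllm import LLM, SamplingParams

def test_ngram_spec_decode_tp2():
    llm = LLM(
        model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder model
        tensor_parallel_size=2,              # tp > 1 triggers the fixed bug
        speculative_config={
            "method": "ngram",
            "num_speculative_tokens": 5,
            "prompt_lookup_max": 4,
        },
    )
    params = SamplingParams(temperature=0.0, max_tokens=32)
    outputs = llm.generate(["The capital of France is"], params)
    # The original failure was a crash, so producing any text is the check here.
    assert outputs[0].outputs[0].text
```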
wangxiyuan pushed a commit that referenced this pull request on Jun 23, 2025

### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to fail. [PR1109](#1109) tried to fix that problem, but unfortunately the fix broke the ngram function; this PR fixes the ngram function. **PS**: Q: Why did the ngram problem not show up when PR1109 was merged? A: The newly introduced problem only appears when tp > 1, and the test cases on CI all use tp = 1.
2. In versions after 0.7.3, vllm-ascend deleted some spec decode UTs to keep CI times down, including the eagle speculative UTs, which left CI unable to cover the eagle function. This PR adds them back (`test_eagle_correctness.py`).
3. Because of the gap described in 2, the current version of eagle had a problem. I located and fixed it: vllm's `draft_model_runner.py` had changed and vllm-ascend was not synchronized in time.
4. The UTs of v0 and v1 were mixed in the spec_decode directory; this PR splits them into two directories, spec_decode_v0 and spec_decode_v1.
5. `vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor` and `vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace` have changed in vllm, so this PR removes them.

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
Tested by CI.

Signed-off-by: mengwei805 <mengwei25@huawei.com>
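Item 5 describes retiring a monkey patch once upstream changed. A rough Python sketch of that lifecycle; the import path is the one quoted in the commit message, as it existed at the time of this thread, and the patched body is invented for illustration:

```python
# Sketch of the monkey-patch lifecycle behind item 5, not vllm-ascend's actual
# patch. The import path is the one quoted above, as it existed at the time of
# this thread; the patched body is illustrative.
from vllm.spec_decode.multi_step_worker import MultiStepWorker

_upstream = MultiStepWorker.set_include_gpu_probs_tensor

def _patched_set_include_gpu_probs_tensor(self) -> None:
    # Plugin-specific behavior layered on top of the upstream method.
    _upstream(self)

# What the plugin used to do at import time:
MultiStepWorker.set_include_gpu_probs_tensor = _patched_set_include_gpu_probs_tensor

# What this commit effectively does: once upstream changed, drop the stale
# override and fall back to the upstream implementation.
MultiStepWorker.set_include_gpu_probs_tensor = _upstream
```

The maintenance lesson is the commit's own point: out-of-tree overrides of upstream internals must be removed, not left in place, when the upstream methods change.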
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Oct 16, 2025

[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025

[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025
### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid OOM, we need to run the test in a single process (see the sketch below)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, especially the multicard CI
- For the spec decode test, the long-term CI passed

Closes: #1105
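One common way to realize "run the test in a single process" is to give each memory-hungry test its own forked process, so all device memory is released when the child exits. vLLM's test utilities include a comparable helper; the standalone decorator below is an assumption sketched for illustration, not the project's actual utility, and `fork` is POSIX-only:

```python
# Hedged sketch of the OOM-avoidance idea above: run each heavyweight test in
# its own forked process so device memory is fully released on exit. This
# standalone decorator is an assumption, not vllm-ascend's actual utility.
import functools
import multiprocessing

def fork_new_process_for_each_test(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        ctx = multiprocessing.get_context("fork")  # POSIX-only start method
        proc = ctx.Process(target=fn, args=args, kwargs=kwargs)
        proc.start()
        proc.join()
        # Surface child failures as ordinary test failures in the parent.
        assert proc.exitcode == 0, f"{fn.__name__} failed in the subprocess"
    return wrapper

@fork_new_process_for_each_test
def test_multicard_spec_decode():
    ...  # build the LLM, generate, assert; memory is freed when the child exits
```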