
Conversation


@Yikun Yikun commented Jun 7, 2025

What this PR does / why we need it?

  • Set default values to fix spec decode
  • To avoid OOM, we need to run the test in a single process (see the sketch below)
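
For reference, a minimal sketch of the single-process pattern (hypothetical, not the actual test code from this PR; `run_spec_decode_case` and the model name are placeholders): each case runs in a fresh spawned subprocess so device memory is fully released when the child exits.

```python
# Hypothetical sketch: run each spec decode case in its own subprocess so
# that device memory is released when the child exits, avoiding OOM when
# several cases would otherwise accumulate memory in one shared process.
import multiprocessing


def run_spec_decode_case(model: str) -> None:
    # Placeholder body: load `model`, run speculative decoding, and assert
    # on the output inside the child process.
    ...


def test_spec_decode_in_single_process() -> None:
    ctx = multiprocessing.get_context("spawn")  # fresh interpreter, clean device state
    proc = ctx.Process(target=run_spec_decode_case, args=("placeholder-model",))
    proc.start()
    proc.join()
    assert proc.exitcode == 0
```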

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • CI passed, especially the multicard CI
  • For the spec decode test, the long-term CI passed

Closes: #1105

yiz-liu and others added 3 commits June 7, 2025 07:49
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Yikun Yikun added the long-term-test (enable long term test for PR) and ready-for-test (start test by label for PR) labels Jun 7, 2025
@Yikun Yikun marked this pull request as ready for review June 7, 2025 03:19

Yikun commented Jun 7, 2025

cc @yiz-liu @mengwei805 @wangxiyuan


Yikun commented Jun 7, 2025

@wangxiyuan @mengwei805 Thanks all, merging to main to recover CI.

@Yikun Yikun changed the title Set default values to fix spec decode and fix multicard CI [SpecDecode][CI] Set default values to fix spec decode and fix multicard CI Jun 7, 2025
@Yikun Yikun merged commit 8d00775 into vllm-project:main Jun 7, 2025
34 checks passed
Yuxiao-Xu pushed a commit to Yuxiao-Xu/vllm-ascend that referenced this pull request Jun 7, 2025
[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)

### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid OOM, we need to run the test in a single process

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, especially the multicard CI
- For the spec decode test, the long-term CI passed

Closes: vllm-project#1105

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
ganyi1996ppo pushed a commit that referenced this pull request Jun 21, 2025
### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to
fail. [PR1109](#1109) tried to fix this problem, but unfortunately the fix
broke the ngram function. I fixed the ngram function in this PR. **PS**: Q: Why
was the ngram problem not caught when PR1109 was merged? A: The newly
introduced problem only appears when tp>1, and the test cases on CI all use
tp=1 (see the sketch after this list).
2. In versions after 0.7.3, vllm-ascend deleted some spec decode UTs, including
the eagle speculative UTs, to keep CI times down, which left CI unable to cover
the eagle function. I added it (`test_eagle_correctness.py`) back in this PR.
3. Because of the gap described in point 2, the current version of Eagle has a
problem. I located and fixed it: vllm's `draft_model_runner.py` was changed and
vllm-ascend was not synchronized in time.
4. Currently, the UTs of v0 and v1 are mixed in the spec_decode
directory. I split them into two directories: spec_decode_v0 and
spec_decode_v1.
5. I found that
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor`
and
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace`
have changed in vllm, so I removed their patches in this PR.
6. The v1 MTP UT
failed (https://github.com/vllm-project/vllm-ascend/actions/runs/15782006176/job/44489813330?pr=1323),
so I commented it out. @XWFAlone @JC-ut0
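
To make the tp>1 coverage concrete, here is a minimal sketch of the kind of parametrized test that would have caught the regression (hypothetical, not the actual test file; the model name is a placeholder and the exact speculative-config keys vary across vLLM versions):

```python
# Hypothetical sketch: parametrize an ngram spec decode correctness test
# over tensor-parallel sizes so a tp>1-only regression cannot slip past CI.
import pytest
from vllm import LLM, SamplingParams


@pytest.mark.parametrize("tp_size", [1, 2])  # tp=2 is the case CI was missing
def test_ngram_spec_decode(tp_size: int) -> None:
    llm = LLM(
        model="placeholder-small-model",
        tensor_parallel_size=tp_size,
        speculative_config={
            "method": "ngram",
            "num_speculative_tokens": 4,
            "prompt_lookup_max": 4,
        },
    )
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
    # A real correctness test would compare against a non-speculative baseline;
    # here we only check that generation produces output.
    assert outputs and outputs[0].outputs[0].text
```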

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
Ngram and eagle were tested locally on an 800I A2 machine, using real weights
instead of the random small weights used by the UTs, and with a scenario test
using tp>1. The other changes were tested by CI.

Signed-off-by: mengwei805 <mengwei25@huawei.com>
wangxiyuan pushed a commit that referenced this pull request Jun 23, 2025
### What this PR does / why we need it?
1. [PR913](#913) introduced an error that caused V0's spec decode function to
fail. [PR1109](#1109) tried to fix this problem, but unfortunately the fix
broke the ngram function. I fixed the ngram function in this PR. **PS**: Q: Why
was the ngram problem not caught when PR1109 was merged? A: The newly
introduced problem only appears when tp>1, and the test cases on CI all use
tp=1.
2. In versions after 0.7.3, vllm-ascend deleted some spec decode UTs, including
the eagle speculative UTs, to keep CI times down, which left CI unable to cover
the eagle function. I added it (`test_eagle_correctness.py`) back in this PR.
3. Because of the gap described in point 2, the current version of Eagle has a
problem. I located and fixed it: vllm's `draft_model_runner.py` was changed and
vllm-ascend was not synchronized in time.
4. Currently, the UTs of v0 and v1 are mixed in the spec_decode
directory. I split them into two directories: spec_decode_v0 and
spec_decode_v1.
5. I found that
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor`
and
`vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace`
have changed in vllm, so I removed their patches in this PR.

### Does this PR introduce _any_ user-facing change?
This PR fixes the ngram and eagle spec decode functions in the v0 engine.

### How was this patch tested?
Tested by CI.

Signed-off-by: mengwei805 <mengwei25@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (vllm-project#1109)
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025