[v0.9.1-dev][CI/UT][bugfix]fix v0 spec decode #1323
Merged
What this PR does / why we need it?
test_eagle_correctness.py) back in this PR.
draft_model_runner.py was changed, and vllm-ascend was not synchronized in time.
vllm.spec_decode.multi_step_worker.MultiStepWorker.set_include_gpu_probs_tensor and vllm.spec_decode.multi_step_worker.MultiStepWorker.set_should_modify_greedy_probs_inplace have changed in vLLM, so I removed their patches in this PR.
I commented it out. @XWFAlone @JC-ut0
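Why a stale patch has to go can be seen in a minimal sketch of monkey-patching. The class and function names below are hypothetical stand-ins, not vLLM's real MultiStepWorker or vllm-ascend's patch code. The point it illustrates: a patch replaces the upstream method wholesale, so any later upstream change to that method is silently masked until the patch is removed.

```python
# Hypothetical stand-in for an upstream class; NOT vLLM's real MultiStepWorker.
class UpstreamWorker:
    def __init__(self):
        self.include_gpu_probs = False

    def set_include_gpu_probs_tensor(self):
        # Hypothetical "newer" upstream behavior.
        self.include_gpu_probs = True
        return "upstream"


def stale_set_include_gpu_probs_tensor(self):
    # Hypothetical downstream patch written against an older upstream version;
    # it never sets the flag the newer upstream code relies on.
    return "stale patch"


def apply_patch(cls, name, fn):
    """Replace cls.name with fn and return the original so it can be restored."""
    original = getattr(cls, name)
    setattr(cls, name, fn)
    return original


def remove_patch(cls, name, original):
    """Restore the original upstream method."""
    setattr(cls, name, original)


worker = UpstreamWorker()
original = apply_patch(UpstreamWorker, "set_include_gpu_probs_tensor",
                       stale_set_include_gpu_probs_tensor)
patched_result = worker.set_include_gpu_probs_tensor()  # upstream logic is bypassed
remove_patch(UpstreamWorker, "set_include_gpu_probs_tensor", original)
restored_result = worker.set_include_gpu_probs_tensor()  # upstream logic runs again
```

Because the patch is applied to the class, it affects every existing instance at once, which is also why removing it restores upstream behavior everywhere.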
Does this PR introduce any user-facing change?
This PR fixes ngram and EAGLE speculative decoding in the v0 engine.
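For readers unfamiliar with ngram speculative decoding, here is a minimal sketch of the general prompt-lookup idea: take the last n tokens, find an earlier occurrence of that n-gram in the sequence, and propose the tokens that followed it as draft tokens for the target model to verify. This is an illustration of the technique under that assumption, not vLLM's or vllm-ascend's ngram proposer; `propose_ngram`, `n`, and `k` are hypothetical names.

```python
def propose_ngram(tokens, n, k):
    """Return up to k draft tokens by matching the last n tokens earlier in the sequence.

    Hypothetical helper illustrating prompt-lookup style ngram proposal.
    """
    if n <= 0 or len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so the most recent earlier occurrence wins; the start bound
    # guarantees at least one token follows the matched n-gram.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + k]
    return []  # no match: nothing to propose, fall back to normal decoding


drafts = propose_ngram([1, 2, 3, 4, 1, 2], n=2, k=2)
```

Here the suffix [1, 2] matches at position 0, so the proposed drafts are [3, 4]; the target model then accepts or rejects them in a single verification step.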
How was this patch tested?
ngram and EAGLE were tested locally on an 800I A2 machine with real weights (instead of the small random weights used by the UTs), including a scenario test with tp > 1.
The other changes were tested by CI.