[Spec Decode] Fix offline spec_decode.py #23461
Conversation
Code Review
This pull request introduces two fixes to address issues in offline speculative decoding. The first change in examples/offline_inference/spec_decode.py corrects the argument passed to llm.generate for batched prompts, ensuring it receives a list of TokensPrompt objects as expected. The second change in vllm/benchmarks/datasets.py adds a default value for request_id_prefix to prevent AttributeError when calling get_samples. Both changes appear correct and address the underlying bugs. My review did not find any high or critical severity issues with these fixes.
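The two fixes described above can be sketched without pulling in vLLM itself. In this sketch, a plain dict stands in for `vllm.inputs.TokensPrompt` (a mapping with a `prompt_token_ids` key), and `get_samples` is a hypothetical toy version of the real function in `vllm/benchmarks/datasets.py`; names and shapes here are illustrative assumptions, not the actual diff.

```python
from types import SimpleNamespace

def build_batched_prompts(token_id_batches):
    # Fix 1 (sketch): batched prompts are passed to llm.generate as a list of
    # TokensPrompt-style mappings, one per prompt, rather than raw token lists.
    return [{"prompt_token_ids": ids} for ids in token_id_batches]

def get_samples(args, num_requests=3):
    # Fix 2 (sketch): getattr with a default replaces a bare attribute access,
    # so callers that never set request_id_prefix no longer hit AttributeError.
    request_id_prefix = getattr(args, "request_id_prefix", "")
    return [f"{request_id_prefix}{i}" for i in range(num_requests)]

print(build_batched_prompts([[1, 2, 3], [4, 5]]))
# → [{'prompt_token_ids': [1, 2, 3]}, {'prompt_token_ids': [4, 5]}]
print(get_samples(SimpleNamespace()))                            # → ['0', '1', '2']
print(get_samples(SimpleNamespace(request_id_prefix="req-")))    # → ['req-0', 'req-1', 'req-2']
```

The `getattr` default is the minimal way to make an optional attribute safe; the alternative is adding the default where the args object is constructed.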
Should we run
There is a test file
This pull request has merge conflicts that must be resolved before it can be merged.
@ekagra-ranjan Hi Ekagra, thanks for the PR! I just came across this PR and verified that it fixes the bug in the example code. What's the current state of this PR? I think it is a helpful fix.
ywang96 left a comment
Thanks for the fix!
Commits (rebase history; Signed-off-by and Co-authored-by trailers omitted):
- …m-project#22675)
- …ot being called when it should when using quantized FP8 model (vllm-project#22281)
- …Attention Kernel (vllm-project#22703)
- …llm-project#23360)
- …llm-project#22527)
- …ist-to-list conversion (vllm-project#20000)" (vllm-project#23396)
- …oject#23425)
This pull request has merge conflicts that must be resolved before it can be merged.
Superseded by #24257
Purpose
The offline spec_decode.py example was broken by an earlier PR and a more recent one. This PR fixes it.
Test Plan
time VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --print-output
Test Result
Output