[Speculative Decoding] MLPSpeculator Tensor Parallel support (1/2) #6050
Thank you @sirejdua, very clean changes!
@sirejdua you need to merge in the latest main and resolve the conflicts.
This PR adds support for a draft worker with TP==1 and a target worker with TP>1. Support for a draft worker with TP>1 will come in a 2nd PR. This PR makes use of vllm-project#5414 to wrap the `MLPSpeculatorWorker` with a `SmallerTPProposerWorker`. It also adds a test case for `ibm-granite/granite-3b-code-instruct{-accelerator}` to `test_draft_model_tp_lt_target_model_tp2`.
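To illustrate the wrapping idea described above, here is a toy sketch of a draft worker that only runs on the ranks inside a smaller tensor-parallel group. It is not vLLM's implementation: `ToyDraftWorker`, `ToySmallerTPWrapper`, and the "rank >= draft TP means no-op" rule are hypothetical names and assumptions chosen for illustration only.

```python
from typing import List


class ToyDraftWorker:
    """Hypothetical stand-in for a TP==1 draft worker."""

    def propose(self, last_tokens: List[int], k: int) -> List[List[int]]:
        # Dummy rule: propose the next k integers after each sequence's last token.
        return [[t + i for i in range(1, k + 1)] for t in last_tokens]


class ToySmallerTPWrapper:
    """Hypothetical wrapper: runs the draft worker only on ranks in the draft TP group."""

    def __init__(self, worker: ToyDraftWorker, rank: int, draft_tp: int):
        self.worker = worker
        # Assumption: ranks outside the draft group do no drafting work.
        self.is_dummy = rank >= draft_tp

    def propose(self, last_tokens: List[int], k: int) -> List[List[int]]:
        if self.is_dummy:
            return []  # this rank does not participate in drafting
        return self.worker.propose(last_tokens, k)


# Target model runs with TP=2, draft model with TP=1.
target_tp, draft_tp = 2, 1
wrappers = [
    ToySmallerTPWrapper(ToyDraftWorker(), rank, draft_tp)
    for rank in range(target_tp)
]
proposals = [w.propose([10, 20], k=3) for w in wrappers]
```

In this sketch only rank 0 produces proposals, and rank 1 returns an empty list. How the real `SmallerTPProposerWorker` coordinates ranks is defined in vllm-project#5414, not here.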
awesome!
The test failure here looks like a config issue. After fixing it locally, I do see an issue regarding token generation. Any idea what could be causing this? I will continue debugging in the morning.
Head branch was pushed to by a user without write access
@sirejdua looks like the tests are all passing now. The prior failure looks precision-related; if it happens again, we can consider having the affected tests run with fp32.
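As a generic illustration of why a precision-related failure can change generated tokens (not a diagnosis of the actual failure in this PR), the sketch below rounds two nearly tied logits to fp16 and fp32 and compares the greedy choice. The logit values are made up for the example.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision.
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    # Round-trip through IEEE 754 single precision.
    return struct.unpack("f", struct.pack("f", x))[0]


def argmax(values: List[float]) -> int:
    # Returns the first index holding the maximum value.
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for two candidate tokens that are almost tied.
logits = [10.001, 10.002]

fp16_logits = [to_fp16(v) for v in logits]
fp32_logits = [to_fp32(v) for v in logits]

greedy_fp16 = argmax(fp16_logits)
greedy_fp32 = argmax(fp32_logits)
```

Near 10.0, adjacent fp16 values are about 0.0078 apart, so both logits collapse to the same number and the tie is broken by position, while fp32 keeps them distinct. Running the affected tests in fp32 reduces the chance of this kind of flip.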
…llm-project#6050) Co-authored-by: Sirej Dua <sirej.dua@databricks.com> Co-authored-by: Sirej Dua <Sirej Dua>
…llm-project#6050) Co-authored-by: Sirej Dua <sirej.dua@databricks.com> Co-authored-by: Sirej Dua <Sirej Dua> Signed-off-by: Alvant <alvasian@yandex.ru>
This PR adds support for a draft worker with `TP==1` and a target worker with `TP>1`. Support for draft worker `TP>1` will come in a 2nd PR.

This PR makes use of #5414 to wrap the `MLPSpeculatorWorker` with a `SmallerTPProposerWorker`.

Adds a test case for `ibm-granite/granite-3b-code-instruct{-accelerator}` to `test_draft_model_tp_lt_target_model_tp2`.

FIX #5809