Open
Labels
RFC (Request For Comments)
Description
Motivation.
Enabling draft models for speculative decoding (SD) in vLLM gave a good performance improvement, but this is not yet implemented in vllm-ascend. This type of SD uses a separate, smaller draft model and requires no specially trained heads (unlike EAGLE or Medusa). I want to implement this feature in vllm-ascend.
For detailed information, please refer to vllm-project/vllm#24322
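For readers unfamiliar with draft-model SD, here is a minimal sketch of the greedy draft-and-verify loop, assuming greedy decoding only. It is not the vLLM or vllm-ascend implementation: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, and the "models" are toy functions that map a token list to the next token.

```python
from typing import Callable, List

# Hypothetical type: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one draft-and-verify step and return the newly accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # 2. The target model checks each proposed position. A real engine would
    #    score all k positions in one batched forward pass; this Python loop
    #    is only for clarity.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop this step.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. All k proposals matched, so the target contributes one bonus token.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat draft-and-verify steps until max_new_tokens are produced."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(target, draft, tokens, k))
    # A step may overshoot, so trim to the requested length.
    return tokens[: len(prompt) + max_new_tokens]


# Toy models (assumptions for illustration only): the target counts modulo 10,
# and the draft agrees except right after token 4, where it guesses wrongly.
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [0], max_new_tokens=12, k=3))
```

Under greedy verification the output is the same as decoding with the target model alone; the speedup comes from accepting several draft tokens per target pass. No trained head is attached to the target model, which is the property that distinguishes this approach from EAGLE or Medusa.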
Proposed Change.
A separate script named draft_model.py will be added to the spec_decode directory. Draft-model-related logic will also be added to scheduler.py, model_runner_v1.py, the unit tests, etc.
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response