
[RFC]: spec decode with draft models #3585

@HF-001

Description

Motivation.

Speculative decoding (SD) with draft models has been enabled in vLLM and brought good performance improvements, but it is not yet implemented in vllm-ascend. Unlike EAGLE or Medusa, this type of SD requires no specially trained heads: an ordinary, smaller draft model proposes tokens and the target model verifies them. I want to implement this feature in vllm-ascend.

For detailed information, please refer to vllm-project/vllm#24322
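As a rough illustration of how draft-model SD works, here is a minimal Python sketch of the greedy propose-and-verify loop. It is not the vLLM or vllm-ascend implementation. The function names (propose, verify, speculative_generate) are hypothetical, the "models" are toy callables that return an argmax token, and only greedy verification is shown. Sampling-based SD uses a rejection-sampling acceptance rule instead.

```python
from typing import Callable, List

# Hypothetical toy stand-in for a model: given the token sequence so far,
# return the greedy (argmax) next token. Real draft/target models are networks.
NextToken = Callable[[List[int]], int]


def propose(draft: NextToken, prefix: List[int], k: int) -> List[int]:
    """Run the cheap draft model autoregressively for k steps."""
    seq = list(prefix)
    proposal: List[int] = []
    for _ in range(k):
        tok = draft(seq)
        proposal.append(tok)
        seq.append(tok)
    return proposal


def verify(target: NextToken, prefix: List[int], proposal: List[int]) -> List[int]:
    """Greedy verification: keep draft tokens while they match the target's
    argmax; at the first mismatch emit the target's token and stop. If every
    draft token matches, the target contributes one extra "bonus" token."""
    seq = list(prefix)
    accepted: List[int] = []
    for tok in proposal:
        # A real system scores all k+1 positions in ONE target forward pass;
        # calling the target per position here only keeps the sketch simple.
        expected = target(seq)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        seq.append(tok)
    accepted.append(target(seq))  # bonus token after a fully accepted proposal
    return accepted


def speculative_generate(
    target: NextToken, draft: NextToken, prompt: List[int], max_new_tokens: int, k: int = 4
) -> List[int]:
    """Alternate draft proposals and target verification until enough tokens exist."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(verify(target, out, propose(draft, out, k)))
    return out[: len(prompt) + max_new_tokens]
```

With greedy verification the output is identical to running the target model alone; the speedup comes from the target accepting several draft tokens per verification step instead of producing one token per forward pass.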

Proposed Change.

A separate module named draft_model.py will be added to the spec_decode directory, and draft-model-related logic will be added to scheduler.py, model_runner_v1.py, the unit tests, etc.

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

Metadata

Assignees

No one assigned

    Labels

    RFCRequest For Comments

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests