[RFC]: spec decode with draft models

### Motivation.

Speculative decoding (SD) with draft models has been enabled in vLLM and gives a good performance improvement there, but it is not yet implemented in vllm-ascend. This type of SD requires no specially trained heads (such as EAGLE or Medusa). I would like to implement this feature in vllm-ascend.

For detailed information, please refer to https://github.com/vllm-project/vllm/pull/24322

### Proposed Change.

A separate file named draft_model.py will be added to the spec_decode directory. Draft-model-related logic will also be added to scheduler.py, model_runner_v1.py, the unit tests, etc.
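
To make the idea concrete, here is a minimal, framework-free sketch of the draft-then-verify loop that draft-model SD relies on. It is only an illustration under stated assumptions: the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical and are not part of vLLM or vllm-ascend, only greedy acceptance is shown (sampled decoding would need rejection sampling), and the target model is called once per position for clarity, whereas a real engine would verify all drafted positions in a single batched forward pass.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token-id sequence
# to the next token id under greedy decoding.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1) The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # 2) The target model checks each proposed token in order.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3) Every proposal matched, so the target contributes one bonus token.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 3) -> List[int]:
    """Greedy generation using repeated speculative steps."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(target, draft, tokens, k))
    # A step may overshoot the budget, so truncate to the requested length.
    return tokens[: len(prompt) + max_new_tokens]


def toy_target(seq: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Toy draft model: agrees with the target except right after token 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10
```

With greedy acceptance, `generate(toy_target, toy_draft, [0], 8)` produces the same tokens as decoding with `toy_target` alone; the draft model only changes how many target steps are needed, not the output.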

### Feedback Period.

_No response_

### CC List.

_No response_

### Any Other Things.

_No response_

[RFC]: spec decode with draft models #3585

Motivation.

Proposed Change.

Feedback Period.

CC List.

Any Other Things.

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RFC]: spec decode with draft models #3585

Description

Motivation.

Proposed Change.

Feedback Period.

CC List.

Any Other Things.

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions