[Guide]: Usage on Speculative Decoding and MTP #734

### How to use Speculative Decoding and MTP on vLLM Ascend

For usage, please refer to the [official vLLM documentation on Speculative Decoding](https://docs.vllm.ai/en/latest/features/spec_decode.html).
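
As a rough orientation before reading the upstream guide, the sketch below shows the two kinds of configuration this guide covers: a separate draft model, and MTP, where the target model's own multi-token-prediction layers act as the drafter. It assumes a vLLM version whose `LLM(...)` constructor accepts a `speculative_config` dict (older releases used separate arguments such as `speculative_model`). The model names are placeholders, and `draft_model_config`, `mtp_config` and `run_demo` are hypothetical helper names used only for illustration.

```python
"""Sketch: speculative_config dicts for vLLM's offline LLM entrypoint.

Assumes a vLLM version whose LLM(...) accepts a `speculative_config` dict.
Model names are placeholders; the helper names are hypothetical.
"""


def draft_model_config(draft_model: str, num_tokens: int = 5) -> dict:
    # Two-model speculative decoding: a small draft model proposes
    # `num_tokens` tokens per step and the target model verifies them.
    return {"model": draft_model, "num_speculative_tokens": num_tokens}


def mtp_config(num_tokens: int = 1) -> dict:
    # MTP: reuse the target model's own multi-token-prediction layers
    # (e.g. DeepSeek-V3 style models) instead of a separate draft model.
    return {"method": "deepseek_mtp", "num_speculative_tokens": num_tokens}


def run_demo() -> None:
    # Imported lazily so the config helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="facebook/opt-6.7b",  # placeholder target model
        speculative_config=draft_model_config("facebook/opt-125m"),
    )
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    for out in llm.generate(["The future of AI is"], params):
        print(out.outputs[0].text)
```

On an Ascend host with vLLM and vLLM Ascend installed, calling `run_demo()` should generate text with draft-model speculation enabled. For an MTP-capable model, pass `mtp_config()` instead. When serving, the same dict is typically passed as JSON via `--speculative-config`; check the upstream guide for the options your version supports.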

> [!NOTE]  
> When using Speculative Decoding and MTP, vLLM Ascend currently has the following limitations compared with vLLM:
> 
> 1. When request preemption is triggered, Speculative Decoding has precision (accuracy) issues. MTP is not affected.
> 2. Multi-step preparation on the NPU is not supported for Speculative Decoding. Instead, multi-step preparation is simulated with a `for` loop on the CPU.
> 3. Only `BatchExpansionTop1Scorer` is currently supported. `MQAScorer` is not supported.
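
Items 2 and 3 above can be pictured with a toy sketch. It is a simplified illustration, not vLLM Ascend code. The draft proposals come from a plain host-side `for` loop with one single-step draft call per token. Verification is loosely modeled on batch-expansion top-1 scoring: every prefix of the proposal is scored as its own sequence, and only the target's top-1 token is kept. `draft_next` and `target_next` are hypothetical stand-ins for one greedy forward pass of each model. Acceptance is greedy only, which simplifies the rejection sampling used for non-greedy decoding.

```python
"""Toy sketch (not vLLM Ascend code): host-side multi-step drafting plus
greedy, batch-expansion-style top-1 verification.

`draft_next` / `target_next` are hypothetical stand-ins for one greedy
forward pass: they map a token list to the next token id.
"""
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def propose(draft_next: NextToken, context: List[int], k: int) -> List[int]:
    # Multi-step preparation simulated by a plain Python loop:
    # one single-step draft call per proposed token.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft_next(context + proposal))
    return proposal


def verify_top1(target_next: NextToken, context: List[int],
                proposal: List[int]) -> List[int]:
    # Batch expansion: score each prefix context + proposal[:i], i = 0..k,
    # as a separate sequence and keep the target's top-1 token for each.
    expanded = [context + proposal[:i] for i in range(len(proposal) + 1)]
    target_tokens = [target_next(seq) for seq in expanded]

    accepted: List[int] = []
    for drafted, target in zip(proposal, target_tokens):
        if drafted != target:
            # First mismatch: emit the target's token instead and stop.
            accepted.append(target)
            return accepted
        accepted.append(drafted)
    # Every draft token matched: the last prefix yields one bonus token.
    accepted.append(target_tokens[-1])
    return accepted
```

For example, with a target that counts upward, `target_next = lambda s: (s[-1] + 1) % 10`, and a draft that agrees for the first few positions, every matching draft token is accepted. The first mismatch is replaced by the target's token, so each verification step emits between 1 and k + 1 tokens.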
