[Guide]: Usage on Speculative Decoding and MTP #734

@MengqingCao

How to use Speculative Decoding and MTP on vLLM Ascend

Please refer to the official vLLM documentation on Speculative Decoding for general usage.

Note

When using Speculative Decoding and MTP, vllm-ascend has the following limitations compared with vllm:

  1. When request preemption is triggered, Speculative Decoding has a precision issue. MTP is not affected.
  2. Multi-step preparation on the NPU is not supported for Speculative Decoding. It is only simulated with a for loop on the CPU.
  3. Only BatchExpansionTop1Scorer is currently supported. MQAScorer is not supported.
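
As a rough illustration of the limits above, the sketch below assembles engine arguments for a draft-model setup. It is not taken from this issue: the helper `build_speculative_kwargs` is hypothetical, the model names are placeholders, and the `speculative_config` keys (`model`, `num_speculative_tokens`, `disable_mqa_scorer`) are assumptions based on recent vLLM releases. Older releases pass `speculative_model` and `num_speculative_tokens` as separate arguments instead, so check the vLLM doc for the version you run.

```python
from typing import Any, Dict


def build_speculative_kwargs(
    target_model: str,
    draft_model: str,
    num_speculative_tokens: int = 5,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for vllm.LLM."""
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be >= 1")
    return {
        "model": target_model,
        "speculative_config": {
            "model": draft_model,
            "num_speculative_tokens": num_speculative_tokens,
            # Assumed key name: asks vLLM not to use MQAScorer, so that
            # scoring falls back to BatchExpansionTop1Scorer (limit 3).
            "disable_mqa_scorer": True,
        },
    }


if __name__ == "__main__":
    # Placeholder model names, not a recommendation from this issue.
    kwargs = build_speculative_kwargs("facebook/opt-6.7b", "facebook/opt-125m")
    print(kwargs)
    # With vllm and vllm-ascend installed, the dict could be passed on:
    #   from vllm import LLM, SamplingParams
    #   llm = LLM(**kwargs)
    #   llm.generate(["The future of AI is"], SamplingParams(temperature=0.8))
```

The helper only builds a dictionary, so it can be inspected without an NPU. Whether `disable_mqa_scorer` is needed, or whether vllm-ascend already selects BatchExpansionTop1Scorer on its own, is not stated in this issue.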
