When using Speculative Decoding and MTP, vllm-ascend has the following limitations compared with vllm:
When request preemption is triggered, Speculative Decoding has a precision issue. MTP is not affected.
Speculative Decoding with multi-step preparation on the NPU is not supported. Multi-step preparation is only simulated by a loop on the CPU.
Only BatchExpansionTop1Scorer is currently supported. MQAScorer is not supported.
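
To illustrate the multi-step preparation limitation, here is a minimal toy sketch of what simulating multi-step drafting with a host-side loop can look like. It is not vllm-ascend code: `draft_with_cpu_loop`, `toy_draft_step`, and `num_speculative_tokens` are hypothetical names chosen for illustration, and the "draft model" is a stand-in function. The point is only that the input for each draft step is rebuilt on the CPU inside a loop instead of being prepared on the device for all steps at once.

```python
from typing import Callable, List


def draft_with_cpu_loop(
    prompt_ids: List[int],
    draft_step: Callable[[List[int]], int],
    num_speculative_tokens: int,
) -> List[int]:
    """Propose tokens one step at a time, preparing each input on the host.

    Hypothetical illustration only; not the vllm-ascend implementation.
    """
    tokens = list(prompt_ids)
    proposed: List[int] = []
    for _ in range(num_speculative_tokens):
        # Host-side preparation: rebuild the model input for this step.
        model_input = list(tokens)
        # Single-step draft forward pass (stand-in for the real draft model).
        next_id = draft_step(model_input)
        proposed.append(next_id)
        # The proposed token becomes part of the next step's input.
        tokens.append(next_id)
    return proposed


def toy_draft_step(ids: List[int]) -> int:
    """Stand-in draft model: predicts the last token id plus one, modulo 100."""
    return (ids[-1] + 1) % 100


if __name__ == "__main__":
    print(draft_with_cpu_loop([1, 2, 3], toy_draft_step, 3))  # prints [4, 5, 6]
```

Because every step returns to the host to prepare the next input, a loop like this trades throughput for simplicity compared with preparing all steps on the device.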