[Feature]: Implement Eagle3 Acceleration on vllm-ascend #1004

@umeiko

🚀 The feature, motivation and pitch

Description

Eagle3 acceleration for GPU has been implemented and merged in this PR (vllm-project/vllm#16937). However, the NPU implementation is still missing. Eagle3 is a state-of-the-art (SOTA) speculative decoding technique, and supporting it on NPU would significantly improve the performance and efficiency of models running on NPU devices.

Alternatives

Proposed Solution:

  • Implement the draft model and its forward pass on NPU.
  • Verify that the draft model implementation is functional and meets the basic requirements.
  • Ensure paged attention for the draft model is optimized for NPU and performs efficiently.
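
For context on what the draft model forward pass has to feed into, here is a minimal sketch of a greedy draft-and-verify loop, the general pattern behind speculative decoding. It is a toy illustration only: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are plain Python functions rather than vLLM or vllm-ascend components, and the sketch does not reflect Eagle3's actual draft architecture or any NPU kernel.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Propose k tokens with the draft, keep the longest run the target
    agrees with, then append one token chosen by the target."""
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    accepted: List[int] = []
    # A real engine scores all proposed tokens in one target forward pass;
    # this sketch calls the target once per position for readability.
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # all accepted: one bonus token
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 3) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new_tokens]


def toy_target(tokens: List[int]) -> int:
    # Toy target: count upward modulo 10.
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    # Toy draft: agrees with the target except right after token 4.
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10
```

With greedy acceptance as above, the output matches what the target alone would produce; the draft only changes how many target steps are needed. A functional NPU draft model could be checked against this kind of equivalence before tuning paged attention.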

Additional context

  • GPU Implementation: Completed and merged in vllm-project/vllm#16937.
  • NPU Implementation: Not yet implemented.
