[Feature]: Implement Eagle1 Acceleration on vllm-ascend #1088

@yuancaoyaoHW

🚀 The feature, motivation and pitch

Description
Eagle1 acceleration for GPU has been implemented and merged, but the NPU implementation is still missing. Eagle is currently one of the most popular inference acceleration techniques, and implementing it on NPU would significantly improve the performance and efficiency of models running on NPU devices.

Alternatives
Proposed Solution:

Implement the draft model and its forward pass on NPU.
Ensure the draft model implementation is functional and meets the basic requirements.
Ensure paged attention for the draft model is optimized for NPU and performs efficiently.
Additional context
NPU Implementation: Not yet implemented.
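
For readers unfamiliar with the role of the draft model, here is a minimal, device-agnostic sketch of the generic draft-and-verify loop used in greedy speculative decoding. It is not the vllm-ascend or vLLM implementation, and it does not model Eagle's feature-level drafting or paged attention. All names (`speculative_greedy_decode`, `toy_target`, `toy_draft`, `greedy_decode`) are hypothetical, and the "models" are toy deterministic functions chosen only so the example runs.

```python
from typing import Callable, List

# A "model" here is just a function: token context -> next token (greedy).
NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical stand-in for the large target model."""
    return (31 * sum(ctx) + len(ctx)) % 50


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical stand-in for the small draft model.

    Agrees with the target except when the context length is a multiple of 3.
    """
    tok = toy_target(ctx)
    return (tok + 1) % 50 if len(ctx) % 3 == 0 else tok


def greedy_decode(model: NextToken, prompt: List[int], num_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_greedy_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    num_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Draft k tokens, keep the prefix the target agrees with, then correct."""
    tokens = list(prompt)
    goal = len(prompt) + num_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed position. A real system would
        #    score all k positions in a single batched forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec = speculative_greedy_decode(toy_target, toy_draft, prompt, 10)
    base = greedy_decode(toy_target, prompt, 10)
    print(spec == base)
```

In this greedy setting the speculative output matches plain target-only decoding token for token; the potential speedup comes from verifying several draft tokens per target pass, which is why an efficient draft-model forward pass on NPU matters.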

