[Feature]: Implement Eagle3 Acceleration on vllm-ascend

### 🚀 The feature, motivation and pitch

## Description
Eagle3 acceleration for GPU has been implemented and merged in [this PR](https://github.com/vllm-project/vllm/pull/16937). However, the NPU implementation is still missing. Eagle3 is currently a state-of-the-art (SOTA) speculative decoding technique, and supporting it on NPU should improve the inference performance and efficiency of models running on NPU devices.


### Alternatives

Proposed solution:
- Implement the draft model and its forward pass on NPU.
- Verify that the draft model implementation is functional and meets the basic requirements.
- Ensure that paged attention for the draft model is optimized for NPU and performs efficiently.
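
For context on what the draft model's forward pass has to support, here is a minimal, device-agnostic sketch of the generic draft-then-verify loop used in greedy speculative decoding. It is not the vLLM or vllm-ascend implementation, and it does not model Eagle3's specific draft architecture. All names (`speculative_step`, `generate`, `toy_target`, `toy_draft`) are hypothetical, and the "models" are toy functions over integer tokens.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token sequence to the
# next token under greedy decoding. Real models return logits instead.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. The target model checks each proposed position. A real engine
    #    scores all k positions in one batched forward pass; this loop is
    #    sequential only for readability.
    accepted: List[int] = []
    ctx = list(tokens)
    for t in proposal:
        expected = target(ctx)
        if t != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(t)
        ctx.append(t)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 3) -> List[int]:
    """Repeat speculative steps until max_new_tokens tokens are produced."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(target, draft, tokens, k))
    return tokens[:len(prompt) + max_new_tokens]


def toy_target(ctx: List[int]) -> int:
    # Toy target: count upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft: agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Under greedy verification, the output matches what the target model alone would produce; the draft model only affects how many tokens are accepted per target step. This is why the draft forward pass and its attention path are the performance-critical pieces on NPU.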

### Additional context

- GPU Implementation: Completed and merged in [PR #16937](https://github.com/vllm-project/vllm/pull/16937).
- NPU Implementation: Not yet implemented.

[Feature]: Implement Eagle3 Acceleration on vllm-ascend #1004
