🚀 The feature, motivation and pitch
Description
Eagle1 acceleration for GPU has been implemented and merged, but the NPU implementation is still missing. Eagle is currently one of the most popular acceleration techniques, and supporting it on NPU would significantly improve the performance and efficiency of our models running on NPU devices.
Proposed Solution:
Implement the draft model and its forward pass on NPU.
Ensure the draft model implementation is functional and meets the basic requirements.
Ensure paged attention for the draft model is optimized for NPU and performs efficiently.
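For context, here is a minimal sketch of the greedy draft-and-verify step that a draft model forward pass would feed into. It is not this project's code: `speculative_step`, `toy_draft`, and `toy_target` are hypothetical names, and the "models" are plain Python functions standing in for real forward passes. Eagle additionally conditions the draft on the target model's hidden states, which this sketch omits.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft: NextToken, target: NextToken, k: int
) -> List[int]:
    """Run one greedy draft-and-verify step and return the accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target model checks each proposed position. A real backend would
    #    do this in a single batched forward pass; a loop keeps the sketch simple.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target contributes one bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


def toy_target(tokens: List[int]) -> int:
    # Toy target: always counts up modulo 10.
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    # Toy draft: agrees with the target except when the correct token is 5.
    nxt = (tokens[-1] + 1) % 10
    return 0 if nxt == 5 else nxt
```

Each step yields at least one token and at most `k + 1`, so the output matches what the target alone would generate greedily; the speedup depends on how often the draft agrees with the target.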
Additional context
NPU Implementation: Not yet implemented.
Alternatives
No response
Additional context
No response