Milestones
IGEMM
FlashDecoding
EvoFormer
Multi-Buffering
Benchmarking
De-Prioritized
Week 1
Performance comparison?
Proposed plan for how you would implement this in Wave
IGEMM
- Contiguous Loads PR
- Vectorized reads/writes
- Land Evoformer PR on main
- Address comments on PR
- Performance evaluation of Evoformer
- Flash Decoding kernel with and without paged attention
How does PGR2 fit into the big picture?
Week 2
Establishing a target reference kernel for Wave
==========================
Ivan
- Attention dynamic index broadcast
Harsh
Stan
Week 3
Ivan
Stan
Harsh
Week 4
Finished implementation of multi-buffering & performance evaluations
5-10% Performance Gain