[KVCACHE] Improved schedule for prefill attention #17432

Improvements:
- Added a transpose of K for better vectorization during the matmul.
- Improved the load schedule.

Overall, the speedup is a bit more than 2x in most cases.

Llama-2 7B observation:

| Kernel | Baseline | Optimized |
|---|---|---|
| batch_prefill_ragged_kv | 15 ms | 7.1 ms (about 2.1x faster) |
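The effect of transposing K on the score matmul can be illustrated with a minimal pure-Python sketch. It is not the TVM TensorIR schedule from this PR. It assumes K is normally stored as `[kv_len][head_dim]` and that the axis a kernel would vectorize is the KV-sequence axis `j` of the scores `S = Q @ K^T`. All function names here are hypothetical. With K pre-transposed to `[head_dim][kv_len]`, each reduction step reads one row `k_t[d]` across all output columns `j`. In a row-major buffer that row is unit-stride, which is the access pattern a vectorizer prefers.

```python
from typing import List

# Row-major matrix modeled as a list of rows (illustrative only).
Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def scores_row_major_k(q: Matrix, k: Matrix) -> Matrix:
    """S = Q @ K^T with K laid out as [kv_len][head_dim] (hypothetical helper).

    For a fixed reduction index d, neighbouring output columns j read
    k[j][d]; in a row-major buffer those elements are head_dim apart.
    """
    head_dim = len(q[0])
    return [
        [sum(q_row[d] * k_row[d] for d in range(head_dim)) for k_row in k]
        for q_row in q
    ]


def scores_transposed_k(q: Matrix, k_t: Matrix) -> Matrix:
    """S = Q @ K^T with K pre-transposed to [head_dim][kv_len] (hypothetical helper).

    For each reduction index d, the update s_row[j] += q[d] * k_t[d][j]
    walks the row k_t[d] with unit stride over j, so the j loop is the
    natural candidate for vectorization.
    """
    kv_len = len(k_t[0])
    out = []
    for q_row in q:
        s_row = [0.0] * kv_len
        for d, q_val in enumerate(q_row):
            k_row_d = k_t[d]  # one row over the KV-sequence axis j
            for j in range(kv_len):
                s_row[j] += q_val * k_row_d[j]
        out.append(s_row)
    return out


if __name__ == "__main__":
    q = [[1.0, 2.0], [3.0, 4.0]]              # [q_len][head_dim]
    k = [[0.5, 1.0], [2.0, 0.0], [1.0, 1.0]]  # [kv_len][head_dim]
    print(scores_transposed_k(q, transpose(k)))  # [[2.5, 2.0, 3.0], [5.5, 6.0, 7.0]]
```

Both layouts compute the same scores. Only the memory access order of the inner loop changes, which is what makes the transposed form friendlier to vectorized loads.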

Commits on Oct 3, 2024

Update kv_cache.py

krishnaraj36 authored Oct 3, 2024

4515dcb

Commits on Oct 1, 2024
