[KVCACHE] Improved schedule for prefill attention #17432

krishnaraj36 · 2024-10-01T09:42:38Z

Improvements

Added a transpose of K for better vectorization during the matmul.
Improved the load schedule.
Speedup is slightly more than 2x in most cases.

Llama-2 7B observation

kernel                      baseline    optimized
batch_prefill_ragged_kv     15 ms       7.1 ms
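
To illustrate why transposing K can help vectorization, here is a minimal plain-Python sketch of the Q·K^T step of attention. It is not the TVM kernel from kv_cache.py: the function names, the tiny example matrices, and the list-of-lists layout are all assumptions made for illustration. It only shows that storing K as [head_dim][kv_len] gives the same scores while making the innermost loop walk adjacent elements, and it works out the speedup implied by the timings reported above.

```python
# Illustrative only: a plain-Python stand-in for the Q*K^T step of attention.
# None of these names come from kv_cache.py; they are hypothetical.

def transpose(mat):
    """Return the transpose of a row-major list-of-lists matrix."""
    return [list(col) for col in zip(*mat)]

def scores_row_major_k(q, k):
    """S[i][j] = sum_d q[i][d] * k[j][d], with K stored as [kv_len][head_dim]."""
    return [
        [sum(q_row[d] * k_row[d] for d in range(len(q_row))) for k_row in k]
        for q_row in q
    ]

def scores_transposed_k(q, k_t):
    """Same scores with K stored as [head_dim][kv_len].

    For a fixed d, k_t[d] holds the values for all kv positions side by side,
    so the innermost loop reads adjacent elements. That is the kind of access
    pattern that lends itself to vector loads.
    """
    kv_len = len(k_t[0])
    out = []
    for q_row in q:
        acc = [0.0] * kv_len
        for d, q_val in enumerate(q_row):
            k_line = k_t[d]
            for j in range(kv_len):
                acc[j] += q_val * k_line[j]
        out.append(acc)
    return out

# Made-up example data: 2 query rows, 3 kv positions, head_dim = 2.
q = [[1.0, 2.0], [0.5, -1.0]]
k = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

s_ref = scores_row_major_k(q, k)
s_opt = scores_transposed_k(q, transpose(k))

# Speedup implied by the reported batch_prefill_ragged_kv timings (15 ms vs 7.1 ms).
speedup = 15 / 7.1
```

Both layouts produce identical scores; only the memory access order changes. The reported timings correspond to a speedup of roughly 2.1x, consistent with the "slightly more than 2x" claim.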

tqchen approved these changes Oct 1, 2024

Update kv_cache.py

4515dcb

tqchen merged commit 79abc03 into apache:main Oct 3, 2024
14 checks passed

MasterJH5574 added a commit that referenced this pull request Oct 14, 2024

Revert "[KVCACHE] Improved schedule for prefill attention (#17432)"

16cdb7c

This reverts commit 79abc03.

MasterJH5574 mentioned this pull request Oct 14, 2024

Revert "[KVCACHE] Improved schedule for prefill attention" #17466

Merged

tqchen pushed a commit that referenced this pull request Oct 15, 2024

Revert "[KVCACHE] Improved schedule for prefill attention" (#17466)

0c67cd8

Revert "[KVCACHE] Improved schedule for prefill attention (#17432)" This reverts commit 79abc03.

ysh329 mentioned this pull request Oct 16, 2024

[Release] v0.18.0 Release Candidate Notes #17468

Closed
