[KVCACHE] Improved schedule for prefill attention #17432

Merged: 2 commits, Oct 3, 2024

Commits on Oct 1, 2024

  1. [KVCACHE] Improved schedule for prefill attention

    Improvements
    
    Added a transpose of K for better vectorization during matmul.
    Improved the load schedule.
    Performance improves by slightly more than 2x in most cases.

    Llama-2 7B observation:

    kernel                       baseline    optimized
    batch_prefill_ragged_kv      15 ms       7.1 ms
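
    The effect of transposing K can be illustrated with a minimal pure-Python sketch. This is not the TIR schedule from this PR; the function names and the tiny test shapes are hypothetical and chosen only for illustration. It shows that computing the attention scores from a pre-transposed K gives the same result, while the innermost loop walks one row of K^T, which is the access pattern a vectorized load would want.

```python
# Illustrative sketch only: hypothetical helper names, not the TVM kernel from this PR.

def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def scores_k_row_major(q, k):
    """scores[i][j] = sum_d q[i][d] * k[j][d], with K stored as (kv_len, head_dim)."""
    return [[sum(qi[d] * kj[d] for d in range(len(qi))) for kj in k] for qi in q]

def scores_k_transposed(q, k_t):
    """Same scores, with K stored as (head_dim, kv_len).

    The innermost loop runs over kv positions j and reads k_t[d][j]
    sequentially along one row, so adjacent j values are adjacent in storage.
    """
    head_dim = len(k_t)
    kv_len = len(k_t[0])
    out = []
    for qi in q:
        acc = [0.0] * kv_len
        for d in range(head_dim):
            row = k_t[d]   # one row of K^T: all kv positions for feature d
            qd = qi[d]     # scalar from Q, reused across the whole row
            for j in range(kv_len):
                acc[j] += qd * row[j]
        out.append(acc)
    return out

q = [[1, 2], [3, 4]]            # (q_len=2, head_dim=2)
k = [[1, 0], [0, 1], [1, 1]]    # (kv_len=3, head_dim=2)
k_t = transpose(k)              # (head_dim=2, kv_len=3)

reference = scores_k_row_major(q, k)
result = scores_k_transposed(q, k_t)
```

    Both functions compute the same Q·K^T product; only the storage order of K and the loop nest differ. Whether this layout pays off on a given GPU depends on the actual schedule and hardware.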
    krishnaraj36 committed Oct 1, 2024
    2f4c7fb

Commits on Oct 3, 2024

  1. Update kv_cache.py

    krishnaraj36 authored Oct 3, 2024
    4515dcb