Support blocked KV cache for flash decoding #678

Closed
wants to merge 3 commits into from

Conversation

beginlner
Contributor

beginlner commented Nov 20, 2023

Restriction: block_size must equal kBlockN, the kernel's tile size along the key dimension.
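The following is a minimal Python sketch of why the restriction simplifies addressing. It is not the flash-attention kernel or its API. It assumes a hypothetical per-sequence `block_table` that maps each logical block index to a physical block id in the KV cache pool. When block_size equals kBlockN, each kernel tile of keys lies entirely inside one physical block, so one table lookup per tile gives the tile's base address.

```python
from typing import List, Tuple


def physical_location(block_table: List[List[int]], batch: int,
                      token_pos: int, block_size: int) -> Tuple[int, int]:
    # Map a logical token position to (physical block id, offset within that block).
    return block_table[batch][token_pos // block_size], token_pos % block_size


def tile_base_block(block_table: List[List[int]], batch: int,
                    tile_idx: int, k_block_n: int, block_size: int) -> int:
    # With block_size == k_block_n, tile `tile_idx` covers exactly one
    # physical block, so a single lookup locates the whole tile.
    assert block_size == k_block_n, "this sketch requires block_size == kBlockN"
    return block_table[batch][tile_idx]


# Hypothetical block table: sequence 0 occupies physical blocks 7, 2, 5 in order.
table = [[7, 2, 5]]
k_block_n = block_size = 256
print(physical_location(table, 0, 300, block_size))          # (2, 44)
print(tile_base_block(table, 0, 2, k_block_n, block_size))   # 5
```

Token 300 falls in logical block 1 (300 // 256) at offset 44, and logical block 1 is stored in physical block 2. Every key in tile 2 (positions 512 to 767) resolves to physical block 5, so the tile can be loaded as one contiguous chunk.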

@tridao
Contributor

tridao commented Jan 23, 2024

Hi @beginlner, I have a commit implementing paged KV cache based on some of the ideas in this PR. Can I add you as a co-author of the commit?

@beginlner
Contributor Author


Hi @tridao, sure, I'd be happy to be added as a co-author on your commit.

@masahi

masahi commented Feb 1, 2024

@tridao Is it possible to support block sizes smaller than 256?

@tridao
Contributor

tridao commented Feb 1, 2024


Sure, we just need someone interested in working on that.
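Supporting smaller block sizes means one kernel tile of kBlockN keys would span several physical blocks. The minimal Python sketch below illustrates this, using a hypothetical `block_table` layout and helper names that are not the flash-attention API. It also assumes that kBlockN is a multiple of block_size. Under those assumptions, each tile touches kBlockN // block_size physical blocks, and the key rows it needs are no longer contiguous in memory, so the kernel would have to gather them.

```python
from typing import List


def tile_physical_blocks(block_table: List[List[int]], batch: int,
                         tile_idx: int, k_block_n: int, block_size: int) -> List[int]:
    # Assumption: k_block_n is a multiple of block_size, so a tile covers
    # k_block_n // block_size whole physical blocks, in logical order.
    assert k_block_n % block_size == 0
    per_tile = k_block_n // block_size
    first = tile_idx * per_tile
    return block_table[batch][first:first + per_tile]


def tile_row_indices(block_table: List[List[int]], batch: int,
                     tile_idx: int, k_block_n: int, block_size: int) -> List[int]:
    # Flat row index (physical_block * block_size + offset) of every key in the
    # tile. Rows jump whenever the tile crosses a block boundary.
    start = tile_idx * k_block_n
    return [block_table[batch][p // block_size] * block_size + p % block_size
            for p in range(start, start + k_block_n)]


# Hypothetical table: one sequence stored in 8 blocks of 16 tokens each.
table = [[3, 0, 6, 1, 4, 2, 7, 5]]
print(tile_physical_blocks(table, 0, 1, 64, 16))   # [4, 2, 7, 5]
rows = tile_row_indices(table, 0, 0, 64, 16)
print(rows[15], rows[16])                           # 63 0
```

Within one 64-key tile, row 15 lives at flat index 63 (block 3) and row 16 jumps to flat index 0 (block 0). This discontinuity is the extra indexing work that a smaller block size adds compared with the block_size == kBlockN case.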

@skrider

skrider commented Feb 13, 2024

@tridao I took a stab at it here: #824. I'd greatly appreciate a review if you have time.
