There is no working example for kv_cache usage. #1054
Comments
The KV cache data right now is not quite usable, since it contains all the bytes in the buffer, including the ggml structs and pointers of the running program. It would be better to provide access to the K and V parts of the data as properly shaped tensors instead. I did some hacking on this, and it was interesting to see the values when exported using matplotlib in Python. But I don't know if the code is acceptable; I have to clean it up a bit.
Ah sorry about that - this is a mistake. I've overlooked this and we should fix it.
I had the idea of extracting KV info per context item. That would mean that after every generated token, it would be possible to get its KV data, and it wouldn't matter whether the context size had changed or not. So you could eval a sequence using n_ctx 128, cache the KV, and then, when you run again, restore the cache even with a different n_ctx, for example 1024. The shape of the K data is … Or just whatever is there in the raw form, and let the API user deal with it.
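To make the per-context-item idea concrete, here is a minimal sketch in plain Python. It is not llama.cpp code: the flat layout `[n_layer][n_ctx][n_embd]` and the helper names `extract_per_token` and `restore_per_token` are assumptions made up for illustration, and the real K/V tensors may be laid out differently. It only shows why storing one row per evaluated token makes the saved data independent of `n_ctx`.

```python
# Hypothetical layout (an assumption, not the llama.cpp format):
# a flat list of floats indexed as [layer][position][embedding component].

def extract_per_token(cache, n_layer, n_ctx, n_embd, n_tokens):
    """Return rows[layer][token]: the n_embd values stored for each evaluated token."""
    assert len(cache) == n_layer * n_ctx * n_embd
    rows = []
    for layer in range(n_layer):
        base = layer * n_ctx * n_embd  # start of this layer's block
        rows.append([
            cache[base + t * n_embd: base + (t + 1) * n_embd]
            for t in range(n_tokens)
        ])
    return rows


def restore_per_token(rows, n_layer, n_ctx, n_embd):
    """Build a fresh zero-filled cache for a (possibly different) n_ctx and copy the rows in."""
    n_tokens = len(rows[0])
    if n_tokens > n_ctx:
        raise ValueError("saved sequence does not fit in the new context size")
    cache = [0.0] * (n_layer * n_ctx * n_embd)
    for layer in range(n_layer):
        base = layer * n_ctx * n_embd
        for t, row in enumerate(rows[layer]):
            cache[base + t * n_embd: base + (t + 1) * n_embd] = row
    return cache


# Toy dimensions: 2 layers, 4 values per token, 3 tokens evaluated so far.
N_LAYER, N_EMBD, N_TOKENS = 2, 4, 3
SMALL_CTX, BIG_CTX = 8, 16

# Fill a small cache with recognizable values for the evaluated positions.
small = [0.0] * (N_LAYER * SMALL_CTX * N_EMBD)
for layer in range(N_LAYER):
    for t in range(N_TOKENS):
        for i in range(N_EMBD):
            small[layer * SMALL_CTX * N_EMBD + t * N_EMBD + i] = float(layer * 100 + t * 10 + i)

rows = extract_per_token(small, N_LAYER, SMALL_CTX, N_EMBD, N_TOKENS)
big = restore_per_token(rows, N_LAYER, BIG_CTX, N_EMBD)
```

The saved `rows` carry no trace of the context size they came from, so the same data can be loaded into a cache built for any `n_ctx` that is at least as large as the number of saved tokens.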
@SlyEcho First, sorry for my lack of knowledge of C/C++. @ggerganov Thank you for providing a great project.
It doesn't seem necessary to keep a duplicate issue open. Closing.
Hello,
For several hours, I tried @abetlen's code in #730 and even followed @chrfalch's guide, but nothing worked.
Is there no working example for #685?