There is no working example for kv_cache usage #1054


Closed
edp1096 opened this issue Apr 19, 2023 · 5 comments
Labels: bug, help wanted, high priority

Comments

@edp1096 (Contributor) commented Apr 19, 2023

Hello,

For several hours I tried @abetlen's code in #730 and even followed @chrfalch's guide, but nothing worked.

Is there really no working example for #685?

@SlyEcho (Collaborator) commented Apr 19, 2023

The KV cache data right now is not quite usable, since it contains all the bytes in the buffer, including the ggml structs and pointers of the running program.

It would be better to provide access to the K and V parts of the data using properly shaped tensors, instead.

I did some hacking on this and it was interesting to see the values when exported and plotted with matplotlib in Python. I don't know yet if the code is acceptable; I still have to clean it up a bit.
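For reference, about all the current API from #685 lets you do with that buffer is copy it back into another context in the same process, with the same model and n_ctx. A minimal sketch, assuming the llama_get_kv_cache / llama_set_kv_cache functions from that PR (names and signatures as they appeared there, not verified against current llama.h):

```cpp
// Minimal sketch, not a tested example: snapshot the raw KV cache of one
// context and restore it into another. Assumes the llama_get_kv_cache /
// llama_set_kv_cache API from #685, the same model, the same n_ctx, and the
// same process: the buffer also holds ggml structs and pointers, so it is
// not portable across processes or context sizes.
#include <cstdint>
#include <vector>
#include "llama.h"

void copy_kv_cache(llama_context * ctx_src, llama_context * ctx_dst) {
    const uint8_t * kv   = llama_get_kv_cache(ctx_src);             // raw buffer
    const size_t    size = llama_get_kv_cache_size(ctx_src);        // size in bytes
    const int       n    = llama_get_kv_cache_token_count(ctx_src); // tokens cached so far

    std::vector<uint8_t> snapshot(kv, kv + size);

    // push the snapshot into the destination context
    llama_set_kv_cache(ctx_dst, snapshot.data(), snapshot.size(), n);
}
```

That works as a same-process snapshot, but it is exactly the "all the bytes in the buffer" problem described above: nothing in it can safely be written to disk or handed to another program.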

@ggerganov (Member) commented:

> The KV cache data right now is not quite usable, since it contains all the bytes in the buffer, including the ggml structs and pointers of the running program.
>
> It would be better to provide access to the K and V parts of the data using properly shaped tensors, instead.

Ah, sorry about that - this is a mistake. I overlooked it and we should fix it.
I was wondering why people couldn't make it work yet.

ggerganov added the bug, help wanted, and high priority labels on Apr 19, 2023
@SlyEcho (Collaborator) commented Apr 19, 2023

I had the idea of extracting KV info per context item. That would mean that after every generated token, it would be possible to get its KV data, and it wouldn't matter whether the context size had changed or not. So you could eval a sequence using n_ctx 128, cache the KV, and then, when you run again, restore the cache even with a different n_ctx, for example 1024.

The shape of the K data is [n_layer, n_ctx, n_embd], while V is [n_layer, n_embd, n_ctx] (NumPy notation).
My idea was to get the data for the N'th item as [n_layer, n_embd] (for both K and V),
or perhaps for a range: [n_layer, 0:n_past, n_embd], or maybe just the same layout: K: [n_layer, 0:n_past, n_embd], V: [n_layer, n_embd, 0:n_past].

Or just whatever is there in the raw form and let the API user deal with it.
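To make the per-item idea concrete, the index arithmetic for those layouts would look roughly like this, assuming each cache is a single contiguous buffer of n_layer * n_ctx * n_embd elements (the helper names are illustrative, not an existing llama.cpp API):

```cpp
// Sketch only: element offsets for the K and V layouts described above.
#include <cstddef>

// K is [n_layer, n_ctx, n_embd]: the embedding index varies fastest
size_t k_index(int layer, int token, int embd, int n_ctx, int n_embd) {
    return ((size_t) layer * n_ctx + token) * (size_t) n_embd + embd;
}

// V is [n_layer, n_embd, n_ctx]: the token index varies fastest
size_t v_index(int layer, int embd, int token, int n_ctx, int n_embd) {
    return ((size_t) layer * n_embd + embd) * (size_t) n_ctx + token;
}
```

So the N'th item in K is a contiguous run of n_embd values per layer, while the same item in V is a strided slice (stride n_ctx), which is why it would have to be copied out into a [n_layer, n_embd] view.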

@edp1096 (Contributor, Author) commented Apr 19, 2023

@SlyEcho First, sorry for my lack of knowledge of C/C++.
I tried transferring as many of the context's variables as possible (everything except a method), along with the KV data, to a second, destination context, but got the same result.
A sample or pseudo-code example would help a lot.
Thanks.

@ggerganov Thank you for providing a great project.
I noticed you've reopened #730.
It seems both cover the same issue, so it is OK to close this one.

@edp1096 (Contributor, Author) commented Apr 21, 2023

It doesn't seem necessary to keep a duplicate issue open. Closing.
