Currently, the functions that set the kv_cache overwrite the data pointers of the k and v tensors: each pointer address is stored inside the memory block (kv_self.buf) itself, so the memcpy that restores the block overwrites it as well.
Restoring the cache therefore only works correctly within the same runtime session, where the data pointers have not changed.
I saw people test the kv_cache get and set functions by freeing the kv_cache ggml context, creating a new context, and restoring into it. The second context was probably given the same memory block, which is why it did not segfault.
When the cache is stored to a file, the program is restarted, and the cache is loaded again, the pointers will be wrong and llama_eval will segfault.
To fix the problem, I save the data pointers before memcpy overwrites kv_self.buf and restore them afterwards.
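The save-and-restore idea can be illustrated with a minimal, self-contained sketch. It does not use the real llama.cpp or ggml types: `toy_tensor`, `toy_kv_cache`, `toy_kv_init`, `toy_kv_set`, and `toy_demo` are hypothetical stand-ins, and the buffer layout (two headers followed by their data) is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tensor header that lives inside the buffer. */
typedef struct {
    size_t n_bytes;
    void  *data; /* absolute address, only valid in the session that wrote it */
} toy_tensor;

/* Hypothetical stand-in for kv_self: one buffer holding headers and data. */
typedef struct {
    unsigned char *buf;
    size_t         buf_size;
    toy_tensor    *k;
    toy_tensor    *v;
} toy_kv_cache;

enum { TOY_DATA_BYTES = 16 };

/* Lay out two headers followed by two data regions in a single allocation. */
static int toy_kv_init(toy_kv_cache *kv) {
    kv->buf_size = 2 * sizeof(toy_tensor) + 2 * TOY_DATA_BYTES;
    kv->buf = (unsigned char *) calloc(1, kv->buf_size);
    if (kv->buf == NULL) return 0;
    kv->k = (toy_tensor *) kv->buf;
    kv->v = kv->k + 1;
    kv->k->n_bytes = TOY_DATA_BYTES;
    kv->v->n_bytes = TOY_DATA_BYTES;
    kv->k->data = kv->buf + 2 * sizeof(toy_tensor);
    kv->v->data = kv->buf + 2 * sizeof(toy_tensor) + TOY_DATA_BYTES;
    return 1;
}

/* Restore a saved buffer without keeping the stale pointers it contains. */
static void toy_kv_set(toy_kv_cache *kv, const unsigned char *saved, size_t n) {
    assert(n == kv->buf_size);
    void *k_data = kv->k->data;  /* remember the pointers valid in this session */
    void *v_data = kv->v->data;
    memcpy(kv->buf, saved, n);   /* overwrites the headers, pointers included */
    kv->k->data = k_data;        /* put the valid pointers back */
    kv->v->data = v_data;
}

/* Returns 1 if a snapshot taken from cache a restores correctly into cache b. */
static int toy_demo(void) {
    toy_kv_cache a, b;
    if (!toy_kv_init(&a)) return 0;
    if (!toy_kv_init(&b)) { free(a.buf); return 0; }

    memset(a.k->data, 0xAB, TOY_DATA_BYTES);
    memset(a.v->data, 0xCD, TOY_DATA_BYTES);

    /* The snapshot contains a's pointers, which mean nothing to b. */
    unsigned char *saved = (unsigned char *) malloc(a.buf_size);
    if (saved == NULL) { free(a.buf); free(b.buf); return 0; }
    memcpy(saved, a.buf, a.buf_size);

    toy_kv_set(&b, saved, a.buf_size);

    int ok = (unsigned char *) b.k->data == b.buf + 2 * sizeof(toy_tensor)
          && ((unsigned char *) b.k->data)[0] == 0xAB
          && ((unsigned char *) b.v->data)[TOY_DATA_BYTES - 1] == 0xCD;

    free(saved);
    free(a.buf);
    free(b.buf);
    return ok;
}
```

Because both caches are alive at the same time, `b.buf` is guaranteed to sit at a different address than `a.buf`, which mimics loading a cache in a new process. Without the two restore assignments in `toy_kv_set`, `b.k->data` would still point into `a`'s buffer after the memcpy.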