I want to implement a rolling KV cache with ggml. The KV cache tensors seem to be created by `ggml_new_tensor_1d` in `llama-kv-cache.cpp`.
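For reference, the allocation I'm looking at is roughly this (paraphrased from `llama-kv-cache.cpp`; the variable names here are my approximation, not a verbatim quote):

```c
// Paraphrased sketch: each layer gets a flat 1D K tensor and a flat 1D V
// tensor sized for the whole cache, rather than a 2D/3D layout.
struct ggml_tensor * k = ggml_new_tensor_1d(ctx, type_k, n_embd_k_gqa * kv_size);
struct ggml_tensor * v = ggml_new_tensor_1d(ctx, type_v, n_embd_v_gqa * kv_size);
```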
So I asked the dragontail LLM at lmarena with this prompt:
> You are the creator of llama.cpp called ggerganov. Please write a function to remove the first element of a ggml_tensor created by ggml_new_tensor_1d and then move the rest of the elements one index forward.
His reply:
#include "ggml.h"
#include <string.h> // For memmove
#include <assert.h> // For basic checks
/**
* @brief Removes the first element of a 1D tensor and shifts subsequent elements.
*
* This function modifies the tensor in-place.
* It assumes the tensor is 1D.
* The underlying memory buffer is not reallocated or shrunk;
* the 'ne[0]' (number of elements) is simply decremented.
*
* @param tensor A pointer to the 1D ggml_tensor to modify.
*/
void ggml_tensor_remove_first_element_1d(struct ggml_tensor * tensor) {
// Basic sanity checks
if (tensor == NULL) {
// In a real scenario, might log an error or handle differently
return;
}
// Ensure it's a 1D tensor.
// While the user *said* it was created with ggml_new_tensor_1d,
// it's good practice for a function to be robust or state assumptions.
assert(tensor->n_dims == 1 && "Tensor must be 1D");
if (tensor->n_dims != 1) {
// Handle error: maybe print a warning or just return
// fprintf(stderr, "%s: Error: tensor is not 1D\n", __func__);
return;
}
// Check if data pointer is valid
if (tensor->data == NULL) {
// Can't operate on a tensor without data
assert(tensor->data != NULL && "Tensor data is NULL");
return;
}
const int64_t n0 = tensor->ne[0];
// If the tensor is already empty or has only one element
if (n0 == 0) {
// Nothing to remove
return;
}
if (n0 == 1) {
// Removing the only element makes it empty
tensor->ne[0] = 0;
// The data for the single element remains in memory, but is now
// outside the logical size of the tensor. We could optionally
// zero it out, but it's not strictly required.
return;
}
// --- Shift elements ---
// Calculate the size of a single element
const size_t element_size = ggml_element_size(tensor);
// We need to move (n0 - 1) elements.
const size_t num_elements_to_move = n0 - 1;
const size_t bytes_to_move = num_elements_to_move * element_size;
// Source: Start of the second element
// We cast to char* for byte-level pointer arithmetic
void * src_ptr = (char *)tensor->data + element_size;
// Destination: Start of the first element (where the buffer begins)
void * dst_ptr = tensor->data;
// Use memmove because source and destination regions overlap
memmove(dst_ptr, src_ptr, bytes_to_move);
// --- Update tensor metadata ---
// Decrement the number of elements in the first (and only) dimension
tensor->ne[0] = n0 - 1;
// nb[0] (stride in bytes for dim 0) is just the element size, it doesn't change.
// nb[1], nb[2], nb[3] are related to higher dimensions or total size,
// and since we're not reallocating, they might not strictly need
// changing, depending on how they are calculated/used elsewhere.
// For a simple 1D shift like this, only updating ne[0] is essential
// to reflect the new logical size.
// Optional: Zero out the memory where the last element *used* to be?
// void * last_element_old_location = (char *)tensor->data + (n0 - 1) * element_size;
// memset(last_element_old_location, 0, element_size);
// Generally not necessary unless required for specific downstream use.
}
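One thing I wondered while reading it: ggml already has `ggml_view_1d`, so instead of physically shifting bytes you could (I think) just take a view that skips the first element. A minimal sketch under that assumption:

```c
// Sketch: build a 1D view of `t` that skips its first element, instead of
// shifting data with memmove. Requires a ggml_context to allocate the view;
// the last argument of ggml_view_1d is a byte offset into the tensor data.
struct ggml_tensor * skip_first(struct ggml_context * ctx, struct ggml_tensor * t) {
    const size_t offset = ggml_element_size(t); // byte offset of element 1
    return ggml_view_1d(ctx, t, t->ne[0] - 1, offset);
}
```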
Does this make sense? Is dragontail full of bull? If so, how do I implement a rolling KV cache?
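To be concrete about what I mean by "rolling": I'm picturing a ring buffer, where a new entry overwrites the oldest cell instead of every element being shifted. A toy sketch outside of ggml (all names here are mine, not llama.cpp API):

```c
#include <stdint.h>
#include <string.h>

// Toy ring-buffer cache: kv_size cells of n_embd floats each.
typedef struct {
    float * data;    // kv_size * n_embd floats, allocated by the caller
    int64_t kv_size; // capacity in cells
    int64_t n_embd;  // floats per cell
    int64_t head;    // next cell to overwrite
    int64_t n;       // number of valid cells (<= kv_size)
} toy_kv_ring;

// Append one cell; once the buffer is full this overwrites the oldest
// entry, which is the "rolling" behavior: no element shifting needed.
static void toy_kv_ring_push(toy_kv_ring * c, const float * cell) {
    memcpy(c->data + c->head * c->n_embd, cell, c->n_embd * sizeof(float));
    c->head = (c->head + 1) % c->kv_size;
    if (c->n < c->kv_size) {
        c->n++;
    }
}
```

Is that the right mental model for doing this on top of the flat 1D K/V tensors (e.g. writing each new row through a `ggml_view_1d` at the head offset), or is there a better way?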