@nazanin-beheshti
Contributor

Details:

Currently, the number of tokens preallocated for the KV cache is hard-coded to 128. This PR adds a configurable parameter, KV_CACHE_PREALLOCATION_SIZE, which can be set to any value and passed to the app in a JSON file.
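
For illustration, here is a minimal sketch of how such a JSON file could be generated. Only the name KV_CACHE_PREALLOCATION_SIZE and the default of 128 come from this description. The device-keyed layout, the string-encoded value, and the helper names `build_kv_cache_config` and `write_config` are assumptions made for the example, not the format confirmed by this PR.

```python
import json
from pathlib import Path

# Default taken from the PR description: 128 tokens are preallocated today.
DEFAULT_PREALLOCATION_TOKENS = 128


def build_kv_cache_config(tokens: int = DEFAULT_PREALLOCATION_TOKENS,
                          device: str = "GPU") -> dict:
    """Build a config dict carrying KV_CACHE_PREALLOCATION_SIZE.

    Assumption: the app reads a JSON object keyed by device name, with
    property values encoded as strings. Adjust if the real format differs.
    """
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(tokens, bool) or not isinstance(tokens, int) or tokens <= 0:
        raise ValueError("tokens must be a positive integer")
    return {device: {"KV_CACHE_PREALLOCATION_SIZE": str(tokens)}}


def write_config(path, tokens: int, device: str = "GPU") -> Path:
    """Serialize the config to a JSON file and return its path (hypothetical helper)."""
    path = Path(path)
    path.write_text(json.dumps(build_kv_cache_config(tokens, device), indent=2))
    return path
```

Calling `write_config("config.json", 256)` would produce a file that requests 256 preallocated tokens for the GPU device, assuming the layout described above.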

Tickets:

  • ticket-id

@github-actions (bot) added the labels category: inference (OpenVINO Runtime library - Inference), category: GPU (OpenVINO GPU plugin), category: Python API (OpenVINO Python bindings), and category: CPP API (OpenVINO CPP API bindings) on Oct 30, 2025
@sys-openvino-ci added the ExternalPR (External contributor) label on Oct 30, 2025
