[RFC]: KV-Cache Interoperability API Standardization

### Motivation

This RFC proposes a KV-Cache Interoperability API, covering standardized notification events (via KVEvents) and reproducible prefix-block hashing. These standards aim to support cross-system cache awareness, observability, and future tooling for indexing, routing, and diagnostics.

vLLM already ships with internal [KVEvents](https://github.com/vllm-project/vllm/issues/16669), contributed by the NVIDIA Dynamo team, which provide a strong foundation.
However, as external systems move toward cache-aware inference, these internal mechanisms need to be treated as public contracts to support broader adoption and interoperability.

### Goals

1. **KVEvents Internal API as a Public Contract**  
   The KVEvents schema is already well-defined in vLLM and used internally by the `KVCacheManager` for GPU cache events. It’s also being extended to CPU offloading via the `KVConnector` (see [#19854](https://github.com/vllm-project/vllm/issues/19854)).  
   This RFC proposes formalizing KVEvents as the public contract for any component emitting or consuming KV-Cache lifecycle events, including external indexers, routers, and engines.

2. **Ensure Reproducible Block Hashing Across Languages**  
   Prefix cache block keys must be computed the same way across runtimes (e.g., Python, Go). This requires:
    - Canonical serialization (e.g., CBOR)
    - Agreed-upon hash algorithms (e.g., SHA256, xxHash)
    - A defined structure for the hashed input objects (e.g., token arrays, `extra_keys`)
    - Explicit rules for special cases, such as the `NONE_HASH` root
    - Alignment on security features, such as per-request hash salting
   
    **Disclaimer**: in the current KVEvents schema, the token-ids are sent along their block-hashes, which makes external indexing possible through mapping tokens -> different-hashes -> vLLM-hashes. While this avoids introducing reproducible hashing and configuration syncs, it requires complex indexing and lookups, along with the networking overhead of passing the 32bit token-ids in every event.

3. **Enable Language-Agnostic Interop**  
   Develop shared guidance and reference libraries in Python, Go, and other widely used languages. These utilities do not need to reside within vLLM, but should remain consistent with its specifications.
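To illustrate what the token-mapping approach described in the Disclaimer asks of an external indexer, here is a minimal Python sketch of a consumer that replays KVEvents-style events. The event classes are simplified, hypothetical stand-ins modeled on vLLM's `BlockStored`, `BlockRemoved`, and `AllBlocksCleared` events (not the actual definitions), and `local_hash` is an arbitrary indexer-side hash chosen only for illustration. Because the indexer cannot recompute vLLM's hashes, it has to keep two mapping tables and needs the token IDs from every event.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Union

# Simplified, hypothetical stand-ins for vLLM's KVEvents types; the real
# definitions carry more fields and use their own wire encoding.
@dataclass
class BlockStored:
    block_hashes: list[int]           # engine (vLLM) hashes, opaque to the indexer
    parent_block_hash: Optional[int]  # None for the first block of a sequence
    token_ids: list[int]              # tokens of all stored blocks, concatenated
    block_size: int

@dataclass
class BlockRemoved:
    block_hashes: list[int]

@dataclass
class AllBlocksCleared:
    pass

Event = Union[BlockStored, BlockRemoved, AllBlocksCleared]

ROOT = bytes(32)  # indexer-side root hash (arbitrary, hypothetical choice)

def local_hash(parent: bytes, tokens: list[int]) -> bytes:
    """Indexer's own chained block hash; intentionally unrelated to vLLM's."""
    h = hashlib.sha256(parent)
    for t in tokens:
        h.update(t.to_bytes(4, "little"))  # assumes non-negative 32-bit token IDs
    return h.digest()

class TokenMappingIndexer:
    """Maps indexer-side hashes <-> engine hashes using token IDs from events."""

    def __init__(self) -> None:
        self.local_to_engine: dict[bytes, int] = {}
        self.engine_to_local: dict[int, bytes] = {}

    def apply(self, event: Event) -> None:
        if isinstance(event, BlockStored):
            if event.parent_block_hash is None:
                parent = ROOT
            elif event.parent_block_hash in self.engine_to_local:
                parent = self.engine_to_local[event.parent_block_hash]
            else:
                return  # unknown parent (e.g. a missed event): cannot chain, skip
            for i, engine_hash in enumerate(event.block_hashes):
                chunk = event.token_ids[i * event.block_size:(i + 1) * event.block_size]
                parent = local_hash(parent, chunk)
                self.local_to_engine[parent] = engine_hash
                self.engine_to_local[engine_hash] = parent
        elif isinstance(event, BlockRemoved):
            for engine_hash in event.block_hashes:
                local = self.engine_to_local.pop(engine_hash, None)
                if local is not None:
                    self.local_to_engine.pop(local, None)
        elif isinstance(event, AllBlocksCleared):
            self.local_to_engine.clear()
            self.engine_to_local.clear()

    def longest_prefix(self, tokens: list[int], block_size: int) -> list[int]:
        """Return engine hashes of the cached full blocks that prefix `tokens`."""
        matched: list[int] = []
        parent = ROOT
        for start in range(0, len(tokens) - block_size + 1, block_size):
            parent = local_hash(parent, tokens[start:start + block_size])
            engine_hash = self.local_to_engine.get(parent)
            if engine_hash is None:
                break  # prefix match ends at the first uncached block
            matched.append(engine_hash)
        return matched
```

With reproducible block hashing, such an indexer could compute the engine's hashes directly from a prompt, dropping both mapping tables and the need to ship token IDs in every event.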


### Proposed Change

This RFC proposes standardizing two core aspects of KV-Cache awareness:

#### 1. KVEvents Schema

- Reuse the existing KVEvents format used internally in vLLM as a versioned public interface for any KV-Cache publisher or consumer
- Consider light refactoring:
    - Use `bytes` for hashes instead of Python-native `int`, since Python's arbitrary-precision integers have no direct equivalent in most other languages
    - Reduce required fields where appropriate
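A minimal sketch of what the refactored schema could look like, assuming hashes travel as fixed-width `bytes` and each batch carries an explicit schema version. All names here (`SCHEMA_VERSION`, `uint_hash_to_bytes`, the `version` field) are hypothetical, and making `token_ids` optional is only one possible reading of "reduce required fields".

```python
from dataclasses import dataclass
from typing import Optional, Union

SCHEMA_VERSION = 1  # hypothetical version tag for the public contract

@dataclass(frozen=True)
class BlockStored:
    block_hashes: list[bytes]           # raw digests, e.g. 32 bytes for SHA-256
    parent_block_hash: Optional[bytes]  # None for the first block of a sequence
    block_size: int
    token_ids: Optional[list[int]] = None  # hypothetical: optional once hashing is reproducible

@dataclass(frozen=True)
class BlockRemoved:
    block_hashes: list[bytes]

@dataclass(frozen=True)
class AllBlocksCleared:
    pass

@dataclass(frozen=True)
class KVEventBatch:
    ts: float
    events: list[Union[BlockStored, BlockRemoved, AllBlocksCleared]]
    version: int = SCHEMA_VERSION  # lets consumers reject versions they do not understand

def uint_hash_to_bytes(value: int, width: int = 32) -> bytes:
    """Convert a non-negative integer hash to fixed-width, big-endian bytes.

    Fixed width preserves leading zero bytes, so a digest round-trips exactly.
    Raises ValueError for negative values (which would need an agreed rule)
    and OverflowError if the value does not fit in `width` bytes.
    """
    if value < 0:
        raise ValueError("negative hash values need an agreed encoding rule")
    return value.to_bytes(width, byteorder="big")
```

Fixed-width bytes map directly onto a byte slice or array in Go, whereas a 256-bit Python `int` has no native counterpart there.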

#### 2. Prefix Block Hashing

- Use CBOR (canonical mode) for serializing token arrays and metadata
  - Other canonical serialization formats are welcome; today, serialization is coupled to Python.
- Support multiple standard hash functions (`SHA256`, `xxHash`)
- Gradually migrate the defaults to these language-neutral options

PR covering the first two points: #20511
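The sketch below shows one way the hashing pieces could fit together: a minimal encoder for the subset of CBOR needed here, following RFC 8949's core deterministic encoding rules (shortest-form integers and lengths, definite-length arrays), and a chained block hash over `[parent_hash, token_ids, extra_keys]`. The input layout, the all-zero `NONE_HASH` root, and folding the salt into the first block's `extra_keys` are illustrative assumptions, not vLLM's current scheme. Only `hashlib` algorithms are shown; xxHash would require a third-party package and is left out.

```python
import hashlib
from typing import Optional, Sequence

def _head(major: int, arg: int) -> bytes:
    """CBOR initial byte plus argument, always in the shortest form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * width)):
            return bytes([(major << 5) | info]) + arg.to_bytes(width, "big")
    raise ValueError("integer too large for a CBOR head")

def cbor_encode(obj: object) -> bytes:
    """Deterministic CBOR for None, bool, int, bytes, str, and lists only."""
    if obj is None:
        return b"\xf6"
    if obj is True:  # checked before int, since bool is a subclass of int
        return b"\xf5"
    if obj is False:
        return b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    raise TypeError(f"unsupported type: {type(obj).__name__}")

NONE_HASH = bytes(32)  # hypothetical fixed root; must be agreed across runtimes

def block_hash(parent: bytes, tokens: list[int],
               extra_keys: Optional[list] = None, algo: str = "sha256") -> bytes:
    """Hash one block as CBOR([parent, tokens, extra_keys]) with a hashlib algorithm."""
    payload = cbor_encode([parent, tokens, extra_keys])
    return hashlib.new(algo, payload).digest()

def prefix_block_hashes(tokens: Sequence[int], block_size: int,
                        salt: Optional[str] = None, algo: str = "sha256") -> list[bytes]:
    """Chained hashes for every full block; a partial trailing block is skipped."""
    hashes: list[bytes] = []
    parent = NONE_HASH
    for i in range(len(tokens) // block_size):
        chunk = list(tokens[i * block_size:(i + 1) * block_size])
        # Salt only the first block; every later hash chains through it,
        # so the whole sequence is isolated per salt value.
        extra = [salt] if salt is not None and i == 0 else None
        parent = block_hash(parent, chunk, extra, algo)
        hashes.append(parent)
    return hashes
```

Because every input is turned into bytes by explicit rules, a Go or Rust implementation following the same layout should produce identical digests; a few shared test vectors across reference libraries would be a cheap way to check conformance.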

These changes will support consistent block identity and event interpretation across runtimes, enabling robust interop between cache indexers and routing layers.

### CC List

@robertgshaw2-redhat @njhill @YaoJiayi @dannyharnik @orozery 

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[RFC]: KV-Cache Interoperability API Standardization #20492