
Conversation

@vMaroon (Contributor) commented Jul 5, 2025

Purpose

As part of [RFC]: KV-Cache Management Standardization for Interop #20492, and to support the development of llm-d's vLLM-native global KV-Cache indexer, prefix-cache block hashing must be reproducible: given the same prefix-cache key input and configuration, the block hash should be identical, with no constraints on technology-stack choices.

This PR introduces:

  • A new block hashing function, sha256_cbor, which serializes input objects using canonical CBOR (via cbor2) and hashes them with SHA-256 (see the sketch below)
    • The result is truncated to 64 bits to match the current KVEvents schema, which does not yet support full 256-bit hash keys
      • Even so, 64 bits provide extremely low collision odds for practical KV-cache sizes (e.g., a 1M-token cache with 16-token chunking -> ~62k blocks -> ~1 in 10 billion collision probability by the birthday bound), while keeping KVEvents traffic compact
  • A change to the global NONE_HASH initialization logic to use the configured hash function

These changes make the prefix hashing logic reproducible, non-language-specific, and aligned with future cross-system KV-Cache interoperability goals outlined in the RFC.
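
For concreteness, here is a minimal sketch of what such a hashing function might look like (a sketch only; the exact name, signature, and truncation details in the PR may differ):

    import hashlib
    import struct
    from typing import Any

    import cbor2

    def sha256_cbor_64bit(value: Any) -> int:
        # Canonical CBOR yields a single deterministic byte encoding per
        # logical value, independent of the producing language or runtime.
        encoded = cbor2.dumps(value, canonical=True)
        digest = hashlib.sha256(encoded).digest()
        # Keep only the first 8 bytes (64 bits) to fit the KVEvents schema.
        return struct.unpack(">Q", digest[:8])[0]

On the collision estimate above: with n ≈ 62,500 blocks in a 64-bit space, the birthday bound gives p ≈ n²/2^65 ≈ 3.9·10^9 / 3.7·10^19 ≈ 1.1·10^-10, i.e. roughly 1 in 10 billion.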

Test Plan

The relevant test files were updated; no new ones are needed:

  • tests/v1/core/test_kv_cache_utils.py
  • tests/v1/core/test_prefix_caching.py

Profiling

The total difference when hashing a 50k-token request (block size 16) is negligible.

=== System Information ===
Platform: macOS-15.5-arm64-arm-64bit-Mach-O
Processor: arm
Python version: 3.13.5
CPU count: 8
RAM: 32.0 GB
=========================

=== Hash Function Profiling Summary ===
AI workload equivalent per run: 50,000 tokens processed
Profiling config: 1000 runs, 3125 blocks/run, block_size=16
---------------------------------------
hash: mean=0.0012s, std=0.0020s
    Mean time per token: 0.00000002s
sha256: mean=0.0054s, std=0.0003s
    Mean time per token: 0.00000011s
sha256_cbor_64bit: mean=0.0171s, std=0.0044s
    Mean time per token: 0.00000034s
---------------------------------------
Comparison (relative slowdown, higher is slower):
    hash: 1.00x (baseline) mean diff: (0.0000s per 50,000 tokens, 0.00000000s per token)
    sha256: 4.62x  mean diff: (+0.0042s per 50,000 tokens, +0.00000008s per token)
    sha256_cbor_64bit: 14.73x  mean diff: (+0.0159s per 50,000 tokens, +0.00000032s per token)
=======================================

code: https://pastebin.com/7thahB9Y

Test Results

All updated tests pass.


@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @vMaroon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reproducibility and cross-system compatibility of prefix-cache block hashing. It achieves this by introducing a new sha256_cbor hashing algorithm that leverages canonical CBOR serialization, and by making the initial hash seed (NONE_HASH) configurable to use the selected hashing function. These changes are crucial for supporting future global KV-Cache interoperability goals.

Highlights

  • New Hashing Algorithm for Prefix Caching: Introduced a new sha256_cbor hashing function for prefix-cache block hashing. This function serializes input objects using canonical CBOR (via the cbor2 library) and then hashes them with SHA-256. The resulting 256-bit hash is truncated to 64 bits to align with the current KVEvents schema, ensuring reproducibility across different systems and programming languages.
  • Configurable NONE_HASH Initialization: The global NONE_HASH initialization logic has been refactored into a new init_none_hash function. This allows the initial hash seed for prefix block sequences to be generated using the currently configured hashing algorithm (e.g., sha256_cbor, sha256, or Python's hash), making the system more flexible and consistent.
  • Dependency and Configuration Updates: The cbor2 library has been added as a new dependency in requirements/common.txt. Additionally, the PrefixCachingHashAlgo literal type in vllm/config.py has been updated to include the new sha256_cbor option, along with an expanded docstring explaining its behavior and benefits.
  • Expanded Test Coverage: Existing unit tests in tests/v1/core/test_kv_cache_utils.py and tests/v1/core/test_prefix_caching.py have been updated. They now include sha256_cbor in their parametrization, ensuring that the new hashing algorithm is thoroughly tested across various scenarios, including NONE_HASH initialization, block hashing, and request token hashing.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new reproducible hashing mechanism using CBOR and SHA-256 for prefix caching. My review focuses on the potential risks of hash truncation, an inconsistency in NONE_HASH initialization, and missing test setup calls that could lead to flaky tests.

@vMaroon (Author) commented Jul 5, 2025

The force push adds a sign-off per DCO and addresses Gemini's suggestions.

@yinghai (Contributor) left a comment

Do you see any performance difference with the change of hash function? What if there is multimodal content?

A contributor left a comment:

Have you tested that it generates the same hash in another language, e.g. Rust?

@vMaroon (Author) commented Jul 6, 2025

This was tested in Golang; see this llm-d-kv-cache-manager PR, which reproduces these hashes independently.

Any implementation of the canonical specs should be fine, while also paying attention to endianness.
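
As a rough illustration (reusing the hypothetical sha256_cbor_64bit sketch from the PR description above), reproducibility simply means the same logical input always maps to the same 64-bit key, whatever the stack:

    # Same logical input -> same key, across runs, processes, and any
    # language that implements canonical CBOR + SHA-256 with 64-bit truncation.
    key_a = sha256_cbor_64bit((None, (101, 102, 103), None))
    key_b = sha256_cbor_64bit((None, (101, 102, 103), None))
    assert key_a == key_b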

@vMaroon (Author) commented Jul 6, 2025

@yinghai I haven't profiled the hashing/serialization (this will follow later in a different work-stream), but note that:

  1. SHA256 is already supported as a hashing algorithm in vLLM
  2. CBOR for serialization is widely used and is comparable to MessagePack in speed according to some blogs

I think vLLM should have a few more canonical options for both serialization and hashing, but this combination is a good kickoff. Once I have profiling data and benchmarks I will share them, though I don't think this is a blocker, considering that:

  • The only effect of this PR on vLLM configurations that do not use the new algorithm is in the calculation of NONE_HASH, which no longer uses sha256 when the builtin algorithm is chosen
    • The reasoning is that I believe it is cleaner this way; besides, the person who added sha256 as an option and made NONE_HASH always use sha256 seems mostly interested in that configuration
  • Such a feature is inevitable if considering the mentioned RFC ([RFC]: KV-Cache Interoperability API Standardization #20492)

@orozery (Contributor) commented Jul 8, 2025

Before this PR, and still after this PR, the hashing specification seems very cumbersome to me.

Why is the input for hashing a list[Any]? (cc @comaniac)
I understand that it's easiest to simply not care and throw this to some general serializer (pickle before this PR, cbor2 after).
But on the other hand:

  1. This still makes it complex to match external implementations (like llm-d) and to keep implementations in sync.
  2. It may be more efficient to serialize if you know what your input looks like.

I would prefer to align the input to list[int], where we know the int size (I think it should be determined by vllm_config.model_config.get_vocab_size()), and then simply use struct.pack.
I also think we should keep the block hashes as bytes (instead of int) to save serialization time when chaining a block's hash into the next block's input.
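
A minimal sketch of this proposal (hypothetical names; it assumes token IDs fit in 4 bytes, i.e. a vocab size below 2^32):

    import hashlib
    import struct

    def hash_block_bytes(parent_digest: bytes, token_ids: list[int]) -> bytes:
        # Fixed-width big-endian packing: no general-purpose serializer is
        # needed, and the byte layout is trivial to reproduce in any language.
        packed = struct.pack(f">{len(token_ids)}I", *token_ids)
        # Keeping digests as bytes avoids int round-trips when chaining
        # a block's hash into its successor's input.
        return hashlib.sha256(parent_digest + packed).digest()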

Another thing I don't understand is how this works given that we have a non-deterministic initial NONE_HASH. Why do we need non-deterministic initialization in the first place?

@vMaroon (Author) commented Jul 8, 2025

Thanks Or @orozery, I agree with all these points. I wanted to limit the scope of this PR to adding a reproducible hashing algorithm and follow-up with minor refactoring of the hashing functions and KVEvents schema:

  1. On input typing to the hash function:

    def hash_block_tokens(
            hash_function: Callable,
            parent_block_hash: Optional[int],
            curr_block_token_ids: Sequence[int],
            extra_keys: Optional[tuple[Any, ...]] = None) -> BlockHash:
        ...
        hash_function((parent_block_hash, curr_block_token_ids_tuple, extra_keys))
    

    extra_keys should be explicitly typed and the whole input explicitly defined for robustness and clarity.
    This is part of the mentioned RFC in goals:

    1. Ensure Reproducible Block Hashing Across Languages:
      • Defined structure for input objects (e.g., token arrays, extra_keys)
  2. NONE_HASH is non-deterministic if you do not explicitly set PYTHONHASHSEED. This is common practice for seeding the builtin hash function, but I think an env var with a better name could be introduced

  3. The KVEvents block_hashes type must also change to bytes to allow >64-bit hashes through (msgpack limits ints to 64 bits)
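
For context, a rough sketch (hypothetical; the PR's actual init_none_hash may differ) of the configurable NONE_HASH initialization described in point 2:

    import os
    from typing import Callable

    NONE_HASH: int

    def init_none_hash(hash_fn: Callable) -> None:
        # Deterministic only when PYTHONHASHSEED is set; otherwise a random
        # seed is drawn, which is the non-determinism discussed above.
        global NONE_HASH
        seed = os.getenv("PYTHONHASHSEED")
        if seed is None:
            NONE_HASH = int.from_bytes(os.urandom(8), "big")
        else:
            NONE_HASH = hash_fn(seed)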

@njhill requested a review from heheda12345 July 8, 2025 13:29
@njhill (Member) commented Jul 8, 2025

I think the primary consideration is performance. This is the reason that sha256 wasn't made the default - see discussion in #15297.

cc @comaniac @dr75 @heheda12345

@dr75 (Contributor) commented Jul 8, 2025

Yes, it wasn't made the default because of performance (in my opinion the impact is small enough to accept sha256; see the measurements made with a 50k-token context in #15297).

The reason for the non-determinism is to prevent exploitation of hash collisions when using hash() (which predates sha256) to extract the context of other users in a multi-tenant environment. See #12621. @russellb

With sha256 that should not be an issue. With sha256 truncated to only 64 bits it might be a problem, I guess.

@vMaroon (Author) commented Jul 8, 2025

I think we should have better configurability of serializers, hashers, and truncation than this PR introduces, but I expect such work to require a larger time window for acceptance.

The truncation here was done because the result remains a valid algorithm while avoiding a bug in KVEvents (my immediate interest in this work) without changing the schema (also for scoping and end-to-end time-investment considerations).

Migrating to canonical and reproducible algorithms as defaults is part of the RFC but is not urgent.

@vMaroon (Author) commented Jul 12, 2025

The force push rebases on main.

@tlrmchlsmth (Member) left a comment

LGTM

@heheda12345 (Collaborator) commented:

Can you import cbor2 inside the sha256_cbor_64bit function to fix the doc build failure? https://app.readthedocs.org/projects/vllm/builds/28834408/#278229768--52

@vMaroon (Author) commented Jul 13, 2025

@heheda12345 (Collaborator) commented:

Retrying.

@heheda12345 enabled auto-merge (squash) July 14, 2025 02:22
@heheda12345 merged commit 66f6fbd into vllm-project:main Jul 14, 2025
99 checks passed