-
-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) #20511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @vMaroon, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the reproducibility and cross-system compatibility of prefix-cache block hashing. It achieves this by introducing a new sha256_cbor
hashing algorithm that leverages canonical CBOR serialization, and by making the initial hash seed (NONE_HASH
) configurable to use the selected hashing function. These changes are crucial for supporting future global KV-Cache interoperability goals.
Highlights
- New Hashing Algorithm for Prefix Caching: Introduced a new
sha256_cbor
hashing function for prefix-cache block hashing. This function serializes input objects using canonical CBOR (via thecbor2
library) and then hashes them with SHA-256. The resulting 256-bit hash is truncated to 64 bits to align with the current KVEvents schema, ensuring reproducibility across different systems and programming languages. - Configurable
NONE_HASH
Initialization: The globalNONE_HASH
initialization logic has been refactored into a newinit_none_hash
function. This allows the initial hash seed for prefix block sequences to be generated using the currently configured hashing algorithm (e.g.,sha256_cbor
,sha256
, or Python'shash
), making the system more flexible and consistent. - Dependency and Configuration Updates: The
cbor2
library has been added as a new dependency inrequirements/common.txt
. Additionally, thePrefixCachingHashAlgo
literal type invllm/config.py
has been updated to include the newsha256_cbor
option, along with an expanded docstring explaining its behavior and benefits. - Expanded Test Coverage: Existing unit tests in
tests/v1/core/test_kv_cache_utils.py
andtests/v1/core/test_prefix_caching.py
have been updated. They now includesha256_cbor
in their parametrization, ensuring that the new hashing algorithm is thoroughly tested across various scenarios, includingNONE_HASH
initialization, block hashing, and request token hashing.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new reproducible hashing mechanism using CBOR and SHA-256 for prefix caching. My review focuses on the potential risks of hash truncation, an inconsistency in NONE_HASH
initialization, and missing test setup calls that could lead to flaky tests.
The force push adds sign-off per DCO and addresses gemini's suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you see any performance difference with change of hash function? What if there is mm content?
vllm/utils/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested that it's generates the same hash on other language e.g. rust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was tested in Golang, see this llm-d-kv-cache-manager
PR that reproduces these hashes independently.
Any implementation of the canonical specs should be fine, while also paying attention to endianness.
@yinghai I haven't profiled the hashing/serialization (this will follow later on across a different work-stream), but note that:
I think vLLM should have a few more canonical options for both serialization and hashing, but this combination is a good kickoff. Once I have profiling data and benchmarks I will share, though I don't think it is a blocker considering that:
|
Before this PR, and still after this PR, the hashing specification seems very cumbersome to me. Why is the input for hashing a
I would prefer to align the input to Another thing I don't understand is how does this work given that we have a non-deterministic initial |
Thanks Or @orozery, I agree with all these points. I wanted to limit the scope of this PR to adding a reproducible hashing algorithm and follow-up with minor refactoring of the hashing functions and KVEvents schema:
|
I think the primary consideration is performance. This is the reason that |
Yes, it wasn't made the default because of performance (in my opinion the impact is small enough to accept sha256; see measurements made with 50k token context in #15297). The reason for non-determinism is to prevent exploitation of hash collisions when using With sha256 that should not be an issue. With sha256 and using only 64 bit it might be a problem I guess. |
I think we should have better configurability of serializers, hashers and trimming if relevant than this PR introduces but I expect such work to require a larger time-window for acceptance. The trimming here was done because it remains a valid algorithm while avoiding a bug in KVEvents - my immediate interest in this work - without changing the schema (also for the scoping/E2E time-investment considerations). Migrating to canonical and reproducible algorithms as defaults is part of the RFC but is not of urgency. |
The force push rebases on main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Can you import cbor2 in sha256_cbor_64bit function to fix the doc build failure? https://app.readthedocs.org/projects/vllm/builds/28834408/#278229768--52 |
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
@heheda12345 one test is failing due to what seems to be unrelated https://buildkite.com/vllm/ci/builds/23841/steps/canvas?jid=01980492-5651-41e1-8744-608d2d208015#01980492-5651-41e1-8744-608d2d208015/212-4568 - is that the case? |
Retrying. |
…256 + CBOR (64bit)vllm-project#20511 Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: x22x22 <wadeking@qq.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
…256 + CBOR (64bit) (vllm-project#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Purpose
As part of [RFC]: KV-Cache Management Standardization for Interop #20492 and to support the development of
llm-d
's vLLM-native global KV-Cache indexer, prefix-cache block hashing must be reproducible. Given the same prefix-cache key input and configuration, the block hash should be reproducible - with no constraints over technical stack choices.This PR introduces:
sha256_cbor
, which serializes input objects using canonical CBOR (viacbor2
) and hashes them with SHA-256NONE_HASH
initialization logic to use the configured hash functionThese changes make the prefix hashing logic reproducible, non-language-specific, and aligned with future cross-system KV-Cache interoperability goals outlined in the RFC.
Test Plan
The relevant test files were updated, there is no need for new ones:
tests/v1/core/test_kv_cache_utils.py
tests/v1/core/test_prefix_caching.py
Profiling
The total difference hashing a 50k tokens request (block size 16) is negligible.
code: https://pastebin.com/7thahB9Y
Test Results
All updated tests pass.