[TPU]Fix KV cache sharing tests #19371
Conversation
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Hello @lsy323, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, Gemini here to provide a summary of this pull request. This PR focuses on fixing and enabling several tests within the TPU model runner specifically related to KV cache sharing. The main goal is to ensure the tests accurately validate the logic for initializing and estimating KV cache size and maximum context length when KV cache sharing is configured, addressing issues that caused these tests to be skipped previously.
Highlights
- Test Unskipping: Five tests that were previously skipped on TPU have been unskipped: `test_init_kv_cache_with_kv_sharing_invalid_target_layer_order`, `test_init_kv_cache_with_kv_sharing_target_layer_not_exist`, `test_init_kv_cache_with_kv_sharing_target_same_as_current`, `test_init_kv_cache_without_kv_sharing`, and `test_init_kv_cache_with_kv_sharing_valid`.
- Test Setup Refactoring: The test setup in `test_tpu_model_runner.py` has been refactored. Module-level patching and cleanup fixtures for `torch_xla` and `PallasAttentionBackend` have been removed, and helper functions `get_vllm_config` and `get_model_runner` were introduced to standardize the creation of vLLM config and model runner instances within tests.
- KV Cache Calculation Corrections: The calculations and assertions within the KV cache initialization tests have been updated. This includes correcting the assumed page size for KV cache tensors (from 32KB to 512KB) and adjusting the expected number of blocks and maximum context length accordingly for both shared and non-shared KV cache scenarios (a back-of-the-envelope sketch follows this list).
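To make the corrected assertions concrete, here is a hedged back-of-the-envelope sketch of how block count and max context length follow from the page size (the memory budget and layer count below are illustrative assumptions, not values taken from the test):

```python
# Assumptions for illustration only; the real test derives these from
# the model and cache configs.
page_size_bytes = 512 * 1024               # corrected per-layer page size
available_kv_memory = 1024 * 1024 * 1024   # assumed 1 GiB KV cache budget
num_kv_layers = 2                          # assumed layers allocating KV
block_size = 128                           # tokens per block

# Each block costs one page per allocating layer.
num_blocks = available_kv_memory // (page_size_bytes * num_kv_layers)
max_context_len = num_blocks * block_size
print(num_blocks, max_context_len)         # 1024 blocks -> 131072 tokens
```

With KV sharing, fewer layers allocate their own cache, so the same budget yields proportionally more blocks and a longer estimated max context length.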
Changelog
- `tests/v1/tpu/worker/test_tpu_model_runner.py`
  - Removed unused import `unittest.mock`.
  - Removed module-level patching of `torch_xla` and `PallasAttentionBackend`.
  - Removed the `cleanup_patches` fixture.
  - Introduced `get_vllm_config` and `get_model_runner` helper functions for test setup.
  - Modified the `model_runner` pytest fixture to use the new helper functions.
  - Unskipped five tests related to KV cache sharing logic by removing `@pytest.mark.skip`.
  - Updated the unskipped tests to correctly use the `model_runner` fixture or explicitly create config/runner instances.
  - Added the `set_current_vllm_config` context manager to relevant tests (see the sketch after this list).
  - Corrected expected page size, block counts, and max context length calculations in KV cache tests.
  - Updated access to KV cache tensors in assertions from dictionary lookup to list indexing.
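The `set_current_vllm_config` pattern referenced above looks roughly like the following. This is a minimal sketch: `set_current_vllm_config` is vLLM's context manager for setting the active config, while `get_vllm_config` and `get_model_runner` are the PR's test helpers, whose exact signatures are assumed here.

```python
from vllm.config import set_current_vllm_config

vllm_config = get_vllm_config()          # PR helper (signature assumed)
with set_current_vllm_config(vllm_config):
    # Attention layers instantiated inside this context pick up the
    # active vLLM config, which the KV-sharing tests depend on.
    model_runner = get_model_runner(vllm_config)
```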
Code Review
This pull request focuses on fixing KV cache sharing tests for TPUs. The main changes involve removing global mocks, refactoring test fixtures, and updating test assertions to reflect new KV cache sizing calculations. The use of `set_current_vllm_config` appears to be a key part of the fix, ensuring that Attention layers are instantiated with the correct configuration context within the tests.
Overall, the changes look good and seem to make the tests runnable and more accurate. The updated comments explaining cache calculations are particularly helpful. There are a couple of minor points for clarification and cleanup.
Summary of Findings
- Mocking Strategy: Global mocks for `torch_xla` and `PallasAttentionBackend` were removed. This is good if the test environment guarantees their availability, but clarification of that assumption would be helpful.
- Test Logic and Clarity: Unskipping the tests and using `set_current_vllm_config` are positive changes. The updated KV cache size calculations and assertions, along with improved comments, enhance test accuracy and readability.
- Minor Cleanups: An outdated comment in the `model_runner` fixture and a couple of unused variables (`vllm_ctx`) were noted.
Merge Readiness
The pull request significantly improves the TPU KV cache sharing tests by making them runnable and updating their assertions. The refactoring of fixtures and removal of potentially problematic global mocks are good steps.
With minor clarifications on the mocking strategy and a few small cleanups (outdated comment, unused variables), this PR should be in good shape for merging. I am unable to approve the pull request, so please have other reviewers take a look and approve before merging.
```python
# page size for each layer KV can be calculated as
# 2 (non-MLA) * 8 (num_heads) * 128 (head_dim)
# * 2 (bfloat16, kv_cache dtype) * 128 (block_size) = 512KB
```
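The comment's arithmetic can be checked directly with a standalone sketch (variable names are illustrative, not vLLM identifiers):

```python
num_kv = 2         # key and value tensors (non-MLA attention)
num_heads = 8
head_dim = 128
dtype_bytes = 2    # bfloat16
block_size = 128   # tokens per block

page_size_bytes = num_kv * num_heads * head_dim * dtype_bytes * block_size
assert page_size_bytes == 512 * 1024  # 512KB per layer per block
```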
```python
for kv_cache_tensor in kv_cache_config.kv_cache_tensors:
    kv_cache_tensor.size = (
        kv_cache_spec[kv_cache_tensor.shared_by[0]].page_size_bytes)
```
The change from iterating `kv_cache_config.tensors` (which was a dict) to `kv_cache_config.kv_cache_tensors` (which is a list) and using `kv_cache_tensor.shared_by[0]` to get the layer name for `kv_cache_spec` looks correct given the structure of `KVCacheConfig` and `KVCacheTensor`. This aligns well with how KV cache tensors can be defined and potentially shared.
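For readers unfamiliar with the new structure, here is a simplified sketch of the data shapes involved. The field names follow vLLM v1's `KVCacheConfig`/`KVCacheTensor`, but the dataclasses below are stand-ins for illustration, not the real definitions:

```python
from dataclasses import dataclass, field

@dataclass
class KVCacheSpec:          # simplified stand-in
    page_size_bytes: int

@dataclass
class KVCacheTensor:        # simplified stand-in
    size: int                                        # tensor size in bytes
    shared_by: list[str] = field(default_factory=list)  # layer names sharing it

@dataclass
class KVCacheConfig:        # simplified stand-in
    kv_cache_tensors: list[KVCacheTensor] = field(default_factory=list)

# Two layers sharing one tensor (KV sharing), plus one unshared layer:
kv_cache_spec = {name: KVCacheSpec(page_size_bytes=512 * 1024)
                 for name in ("layer.0", "layer.1", "layer.2")}
cfg = KVCacheConfig(kv_cache_tensors=[
    KVCacheTensor(size=0, shared_by=["layer.0", "layer.1"]),
    KVCacheTensor(size=0, shared_by=["layer.2"]),
])

# All layers listed in `shared_by` have identical specs, so reading the
# first entry suffices when sizing each tensor:
for t in cfg.kv_cache_tensors:
    t.size = kv_cache_spec[t.shared_by[0]].page_size_bytes
```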
LGTM! thanks
The tpu_runner tests are passing in CI: https://buildkite.com/vllm/ci/builds/21712#019755c0-7bb1-43d1-b2f0-d836658ecc01
Hi @mgoin, could you please take a look at this fix? This should fix the TPU model runner tests. Thanks!
Great improvements here :)