Conversation


@heheda12345 heheda12345 commented Oct 9, 2025

Purpose

Follow-up of #24486

The kernel_block_size selection can be implemented with two steps:

# Case 1: if the block_size of the kv cache manager is supported by all
# backends, return it directly.
if block_size_is_supported(backends, kv_manager_block_size):
    return kv_manager_block_size

# Case 2: otherwise, the block_size must be an `int`-format supported size of
# at least one backend. Iterate over all `int`-format supported sizes in
# descending order and return the first one that is supported by all backends.
all_int_supported_sizes = set(
    supported_size for backend in backends
    for supported_size in backend.get_supported_kernel_block_size()
    if isinstance(supported_size, int)
)

for supported_size in sorted(all_int_supported_sizes, reverse=True):
    if block_size_is_supported(backends, supported_size):
        return supported_size
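The two steps above can be sketched as a self-contained function. The backend class and the `block_size_is_supported` helper below are hypothetical stand-ins for the real vLLM interfaces, used only to make the selection logic runnable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeBackend:
    # Hypothetical stand-in for an attention backend; a real backend may
    # also report non-int entries from get_supported_kernel_block_size().
    name: str
    supported: tuple

    def get_supported_kernel_block_size(self):
        return self.supported


def block_size_is_supported(backends, block_size):
    # A size is usable only if every backend supports it.
    return all(
        block_size in backend.get_supported_kernel_block_size()
        for backend in backends
    )


def select_kernel_block_size(backends, kv_manager_block_size):
    # Case 1: the KV cache manager's block size works everywhere.
    if block_size_is_supported(backends, kv_manager_block_size):
        return kv_manager_block_size
    # Case 2: fall back to the largest int-format size common to all backends.
    all_int_supported_sizes = {
        size
        for backend in backends
        for size in backend.get_supported_kernel_block_size()
        if isinstance(size, int)
    }
    for size in sorted(all_int_supported_sizes, reverse=True):
        if block_size_is_supported(backends, size):
            return size
    raise ValueError(f"No common block size for {kv_manager_block_size}.")


backends = [FakeBackend("a", (16, 32, 64)), FakeBackend("b", (32, 64))]
print(select_kernel_block_size(backends, 128))  # falls back to 64
```

Iterating in descending order means the fallback prefers the largest common kernel block size, which minimizes the number of kernel blocks per manager block.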

Also did some cleanup based on the review comments of that PR.

Test Plan

I don't know how to test it. @zhiyuan1i, can you help me?

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request simplifies the algorithm for selecting the kernel_block_size. The new logic is more straightforward and appears correct. It first attempts to use the kv_manager_block_size if supported by all backends. If not, it falls back to finding the largest common integer-based supported size. The refactoring also involves removing the now-unused _find_compatible_block_sizes method and passing the determined kernel_block_sizes to relevant functions. My main feedback is to improve the error message when no common block size can be found, to aid in debugging.

for supported_size in sorted(all_int_supported_sizes, reverse=True):
    if block_size_is_supported(backends, supported_size):
        return supported_size
raise ValueError(f"No common block size for {kv_manager_block_size}. ")

Severity: high

The new implementation raises a less informative error message compared to the previous version when no common block size is found. The old error message listed the supported block sizes for each backend, which is very helpful for debugging. It would be great to restore that level of detail in the error message.

Suggested change
raise ValueError(f"No common block size for {kv_manager_block_size}. ")
error_msg = f"No common block size for {kv_manager_block_size}. "
for backend in backends:
    supported_sizes = backend.get_supported_kernel_block_size()
    error_msg += f"Backend {backend.__name__} supports: {supported_sizes}. "
raise ValueError(error_msg)

@heheda12345 heheda12345 changed the title [Mamba] A simpler algorithm to find kernel_block_size [Hybrid] A simpler algorithm to find kernel_block_size Oct 9, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines 4271 to 4275
for group in self._kv_cache_spec_attn_group_iterator():
    kv_cache_spec = group.kv_cache_spec
    attn_backend = group.backend
    kernel_block_size = kernel_block_sizes[group.kv_cache_group_id]
    for layer_name in group.layer_names:


P1: Skip encoder-only groups before indexing kernel_block_sizes

Encoder-only cache groups are appended to kv_cache_config.kv_cache_groups but _prepare_kernel_block_sizes intentionally omits them when building kernel_block_sizes. _kv_cache_spec_attn_group_iterator() still yields AttentionGroups for those encoder-only layers, so the loop dereferences kernel_block_sizes[group.kv_cache_group_id] before the subsequent runner_only_attn_layers check. When an encoder-only group exists (e.g., encoder–decoder models), this index is past the end of the list and initialize_kv_cache will crash. Skip encoder-only specs before indexing or include placeholders in kernel_block_sizes so that the list length matches the group ids.
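The guard being suggested can be sketched as follows. The `Group` dataclass and `bind_kernel_block_sizes` helper are simplified, hypothetical stand-ins for the runner's real AttentionGroup iteration, showing only why the skip must happen before the index:

```python
from dataclasses import dataclass


@dataclass
class Group:
    # Simplified stand-in for an attention group in the runner.
    kv_cache_group_id: int
    is_encoder_only: bool


def bind_kernel_block_sizes(groups, kernel_block_sizes):
    # kernel_block_sizes omits encoder-only groups, so an encoder-only
    # group's kv_cache_group_id can be past the end of the list. Skip such
    # groups *before* indexing to avoid an IndexError.
    bound = {}
    for group in groups:
        if group.is_encoder_only:
            continue
        bound[group.kv_cache_group_id] = kernel_block_sizes[group.kv_cache_group_id]
    return bound
```

The alternative fix mentioned in the review, appending placeholders for encoder-only groups so the list length matches the group ids, would make the index always valid instead.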


Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@zhiyuan1i
Contributor

Thanks for pointing this out. I'll take the time to test it as soon as possible.
