Conversation


@heheda12345 heheda12345 commented Oct 9, 2025

Purpose

Follow-up of #24486

The kernel_block_size selection can be implemented with two steps:

# Case 1: if the block_size of the kv cache manager is supported by all
# backends, return it directly.
if block_size_is_supported(backends, kv_manager_block_size):
    return kv_manager_block_size

# Case 2: otherwise, the block_size must be an `int`-format supported size of
# at least one backend. Iterate over all `int`-format supported sizes in
# descending order and return the first one that is supported by all backends.
all_int_supported_sizes = set(
    supported_size for backend in backends
    for supported_size in backend.get_supported_kernel_block_size()
    if isinstance(supported_size, int)
)

for supported_size in sorted(all_int_supported_sizes, reverse=True):
    if block_size_is_supported(backends, supported_size):
        return supported_size
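The two steps above can be sketched as a self-contained function. The backend class and the `block_size_is_supported` helper below are hypothetical stand-ins for the real vLLM interfaces, used only to make the selection logic runnable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeBackend:
    # Hypothetical stand-in for an attention backend; a real backend may
    # also report non-int entries from get_supported_kernel_block_size().
    name: str
    supported: tuple

    def get_supported_kernel_block_size(self):
        return self.supported


def block_size_is_supported(backends, block_size):
    # A size is usable only if every backend supports it.
    return all(
        block_size in backend.get_supported_kernel_block_size()
        for backend in backends
    )


def select_kernel_block_size(backends, kv_manager_block_size):
    # Case 1: the KV cache manager's block size works everywhere.
    if block_size_is_supported(backends, kv_manager_block_size):
        return kv_manager_block_size
    # Case 2: fall back to the largest int-format size common to all backends.
    all_int_supported_sizes = {
        size
        for backend in backends
        for size in backend.get_supported_kernel_block_size()
        if isinstance(size, int)
    }
    for size in sorted(all_int_supported_sizes, reverse=True):
        if block_size_is_supported(backends, size):
            return size
    raise ValueError(f"No common block size for {kv_manager_block_size}.")


backends = [FakeBackend("a", (16, 32, 64)), FakeBackend("b", (32, 64))]
print(select_kernel_block_size(backends, 128))  # falls back to 64
```

Iterating in descending order means the fallback prefers the largest common kernel block size, which minimizes the number of kernel blocks per manager block.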

Also did some cleanup based on the review comments of that PR.

Test Plan

I don't know how to test it. @zhiyuan1i, can you help me?

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request simplifies the algorithm for selecting the kernel_block_size. The new logic is more straightforward and appears correct. It first attempts to use the kv_manager_block_size if supported by all backends. If not, it falls back to finding the largest common integer-based supported size. The refactoring also involves removing the now-unused _find_compatible_block_sizes method and passing the determined kernel_block_sizes to relevant functions. My main feedback is to improve the error message when no common block size can be found, to aid in debugging.

for supported_size in sorted(all_int_supported_sizes, reverse=True):
    if block_size_is_supported(backends, supported_size):
        return supported_size
raise ValueError(f"No common block size for {kv_manager_block_size}. ")

Severity: high

The new implementation raises a less informative error message compared to the previous version when no common block size is found. The old error message listed the supported block sizes for each backend, which is very helpful for debugging. It would be great to restore that level of detail in the error message.

Suggested change
raise ValueError(f"No common block size for {kv_manager_block_size}. ")
error_msg = f"No common block size for {kv_manager_block_size}. "
for backend in backends:
    supported_sizes = backend.get_supported_kernel_block_size()
    error_msg += f"Backend {backend.__name__} supports: {supported_sizes}. "
raise ValueError(error_msg)

@heheda12345 heheda12345 changed the title [Mamba] A simpler algorithm to find kernel_block_size [Hybrid] A simpler algorithm to find kernel_block_size Oct 9, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines 4271 to 4275
for group in self._kv_cache_spec_attn_group_iterator():
    kv_cache_spec = group.kv_cache_spec
    attn_backend = group.backend
    kernel_block_size = kernel_block_sizes[group.kv_cache_group_id]
    for layer_name in group.layer_names:


P1: Skip encoder-only groups before indexing kernel_block_sizes

Encoder-only cache groups are appended to kv_cache_config.kv_cache_groups but _prepare_kernel_block_sizes intentionally omits them when building kernel_block_sizes. _kv_cache_spec_attn_group_iterator() still yields AttentionGroups for those encoder-only layers, so the loop dereferences kernel_block_sizes[group.kv_cache_group_id] before the subsequent runner_only_attn_layers check. When an encoder-only group exists (e.g., encoder–decoder models), this index is past the end of the list and initialize_kv_cache will crash. Skip encoder-only specs before indexing or include placeholders in kernel_block_sizes so that the list length matches the group ids.
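The guard being suggested can be sketched as follows. The `Group` dataclass and `bind_kernel_block_sizes` helper are simplified, hypothetical stand-ins for the runner's real AttentionGroup iteration, showing only why the skip must happen before the index:

```python
from dataclasses import dataclass


@dataclass
class Group:
    # Simplified stand-in for an attention group in the runner.
    kv_cache_group_id: int
    is_encoder_only: bool


def bind_kernel_block_sizes(groups, kernel_block_sizes):
    # kernel_block_sizes omits encoder-only groups, so an encoder-only
    # group's kv_cache_group_id can be past the end of the list. Skip such
    # groups *before* indexing to avoid an IndexError.
    bound = {}
    for group in groups:
        if group.is_encoder_only:
            continue
        bound[group.kv_cache_group_id] = kernel_block_sizes[group.kv_cache_group_id]
    return bound
```

The alternative fix mentioned in the review, appending placeholders for encoder-only groups so the list length matches the group ids, would make the index always valid instead.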


Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@zhiyuan1i
Contributor

Thanks for pointing this out. I'll take the time to test it as soon as possible.
