-
-
Couldn't load subscription status.
- Fork 10.8k
[Hybrid] A simpler algorithm to find kernel_block_size #26476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request simplifies the algorithm for selecting the kernel_block_size. The new logic is more straightforward and appears correct. It first attempts to use the kv_manager_block_size if supported by all backends. If not, it falls back to finding the largest common integer-based supported size. The refactoring also involves removing the now-unused _find_compatible_block_sizes method and passing the determined kernel_block_sizes to relevant functions. My main feedback is to improve the error message when no common block size can be found, to aid in debugging.
| for supported_size in sorted(all_int_supported_sizes, reverse=True): | ||
| if block_size_is_supported(backends, supported_size): | ||
| return supported_size | ||
| raise ValueError(f"No common block size for {kv_manager_block_size}. ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new implementation raises a less informative error message compared to the previous version when no common block size is found. The old error message listed the supported block sizes for each backend, which is very helpful for debugging. It would be great to restore that level of detail in the error message.
| raise ValueError(f"No common block size for {kv_manager_block_size}. ") | |
| error_msg = f"No common block size for {kv_manager_block_size}. " | |
| for backend in backends: | |
| supported_sizes = backend.get_supported_kernel_block_size() | |
| error_msg += f"Backend {backend.__name__} supports: {supported_sizes}. " | |
| raise ValueError(error_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
| for group in self._kv_cache_spec_attn_group_iterator(): | ||
| kv_cache_spec = group.kv_cache_spec | ||
| attn_backend = group.backend | ||
| kernel_block_size = kernel_block_sizes[group.kv_cache_group_id] | ||
| for layer_name in group.layer_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip encoder-only groups before indexing kernel_block_sizes
Encoder-only cache groups are appended to kv_cache_config.kv_cache_groups but _prepare_kernel_block_sizes intentionally omits them when building kernel_block_sizes. _kv_cache_spec_attn_group_iterator() still yields AttentionGroups for those encoder-only layers, so the loop dereferences kernel_block_sizes[group.kv_cache_group_id] before the subsequent runner_only_attn_layers check. When an encoder-only group exists (e.g., encoder–decoder models), this index is past the end of the list and initialize_kv_cache will crash. Skip encoder-only specs before indexing or include placeholders in kernel_block_sizes so that the list length matches the group ids.
Useful? React with 👍 / 👎.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
Thanks for pointing this out, I'll take the time to test it as soon as possible |
Purpose
Follow-up of #24486
The kernel_block_size selection can be implemented with two steps:
Also did some cleanup based on the review comments of that PR.
Test Plan
I don't know how to test it. @zhiyuan1i can you help me?
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.