Conversation

@youzhedian (Contributor) commented on Aug 28, 2025

Pre-PR for #1367

Suggestion from @youkaichao: to accelerate review and merging (especially CI testing), maybe we can split the kernel-side changes into a separate PR and get it merged first.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces new CUDA kernels, cp_fused_concat_and_cache_mla and cp_gather_cache, to support an upcoming context parallel feature. The changes include the kernel implementations, their PyTorch bindings, and corresponding Python wrappers. Overall, the kernel implementations appear correct and follow existing patterns in the codebase. My main feedback is on the testing coverage. The new cp_fused_concat_and_cache_mla kernel is missing a unit test, and the test for cp_gather_cache is incomplete as it doesn't cover the key functionality it introduces. Adding comprehensive tests is crucial for ensuring the correctness and maintainability of these new kernels.
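
For orientation, the sketch below shows roughly what the new cp_gather_cache Python wrapper could look like; the exact signature and the torch.ops._C_cache_ops registration are assumptions for illustration, not copied from this PR.

```python
from typing import Optional

import torch


def cp_gather_cache(
    src_cache: torch.Tensor,        # [NUM_BLOCKS, BLOCK_SIZE, ENTRY_SIZE]
    dst: torch.Tensor,              # [TOTAL_TOKENS, ENTRY_SIZE]
    block_table: torch.Tensor,      # [BATCH, MAX_BLOCKS_PER_SEQ], int32
    cu_seq_lens: torch.Tensor,      # [BATCH + 1], int32 prefix sums of seq lens
    batch_size: int,
    seq_starts: Optional[torch.Tensor] = None,  # [BATCH], per-request token offset
) -> None:
    """Gather paged KV-cache entries into a contiguous buffer; with seq_starts,
    each request's gather begins at an arbitrary offset into its block table.
    (Signature and op registration are assumptions, not the PR's actual code.)"""
    torch.ops._C_cache_ops.cp_gather_cache(src_cache, dst, block_table,
                                           cu_seq_lens, batch_size, seq_starts)
```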

Comment on lines +804 to +806
def test_cp_gather_cache_mla(kv_lora_rank, qk_rope_head_dim, block_size,
num_blocks, max_seq_len, batch_size, dtype,
kv_cache_dtype, device):

Severity: high

The test for cp_gather_cache is incomplete. The main purpose of this new kernel is to support arbitrary seq_starts, but the test only covers the case where seq_starts is None. Please add test cases that use non-zero seq_starts to validate this new functionality.

Additionally, the test only uses batch_size=8. The kernel has different logic for num_splits based on batch_size. It would be beneficial to test with a wider range of batch sizes to cover all branches, for example [8, 70, 130].
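
For illustration, a minimal sketch of such a test follows. It is not code from this PR: the seq_starts keyword, the tensor shapes, and the naive reference gather all encode assumptions about the kernel's semantics and would need to be checked against the actual binding.

```python
import pytest
import torch

from vllm import _custom_ops as ops


@pytest.mark.parametrize("batch_size", [8, 70, 130])
@torch.inference_mode()
def test_cp_gather_cache_with_seq_starts(batch_size):
    # 576 = kv_lora_rank (512) + qk_rope_head_dim (64) for MLA cache entries.
    block_size, entry_size, num_blocks, device = 16, 576, 1024, "cuda"
    torch.manual_seed(0)

    src_cache = torch.randn(num_blocks, block_size, entry_size, device=device)
    seq_lens = torch.randint(1, 512, (batch_size,),
                             dtype=torch.int32, device=device)
    cu_seq_lens = torch.zeros(batch_size + 1, dtype=torch.int32, device=device)
    cu_seq_lens[1:] = torch.cumsum(seq_lens, dim=0)
    total_tokens = int(cu_seq_lens[-1])

    # Non-zero, per-request token offsets into each request's block table.
    seq_starts = torch.randint(0, 4 * block_size, (batch_size,),
                               dtype=torch.int32, device=device)

    blocks_needed = (seq_starts + seq_lens + block_size - 1) // block_size
    block_table = torch.randint(0, num_blocks,
                                (batch_size, int(blocks_needed.max())),
                                dtype=torch.int32, device=device)

    dst = torch.zeros(total_tokens, entry_size,
                      dtype=src_cache.dtype, device=device)

    # Naive reference: token i of request b lives at logical position
    # seq_starts[b] + i within that request's block table.
    expected = torch.zeros_like(dst)
    for b in range(batch_size):
        for i in range(int(seq_lens[b])):
            pos = int(seq_starts[b]) + i
            blk = int(block_table[b, pos // block_size])
            expected[int(cu_seq_lens[b]) + i] = src_cache[blk, pos % block_size]

    ops.cp_gather_cache(src_cache, dst, block_table, cu_seq_lens, batch_size,
                        seq_starts=seq_starts)
    torch.testing.assert_close(dst, expected)
```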

Comment anchored at the closing lines of the existing cp_gather_cache test:

    ops.cp_gather_cache(src_cache, dst, block_table, cu_seq_lens, batch_size)
    torch.testing.assert_close(dst, expected)

Severity: high

The new CUDA kernel cp_fused_concat_and_cache_mla is missing a unit test. Adding a test is crucial to ensure its correctness and prevent regressions. Please add a test case for this new functionality, similar to test_concat_and_cache_mla.
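
A skeleton of such a test, modeled loosely on test_concat_and_cache_mla, might look like the sketch below. The cp_fused_concat_and_cache_mla call signature shown is a placeholder guess (the real op presumably takes additional context-parallel arguments), so treat this as a starting point rather than PR code.

```python
import torch

from vllm import _custom_ops as ops


@torch.inference_mode()
def test_cp_fused_concat_and_cache_mla():
    # Shapes mirror the existing test_concat_and_cache_mla; values are illustrative.
    num_tokens, kv_lora_rank, qk_rope_head_dim = 42, 512, 64
    num_blocks, block_size, device = 64, 16, "cuda"
    entry_size = kv_lora_rank + qk_rope_head_dim

    kv_c = torch.randn(num_tokens, kv_lora_rank, device=device)
    k_pe = torch.randn(num_tokens, qk_rope_head_dim, device=device)
    slot_mapping = torch.randperm(num_blocks * block_size,
                                  device=device)[:num_tokens]
    kv_cache = torch.zeros(num_blocks, block_size, entry_size, device=device)
    scale = torch.tensor(1.0, dtype=torch.float32, device=device)

    # Reference: concat(kv_c[i], k_pe[i]) lands at token i's slot in the paged cache.
    expected = kv_cache.clone()
    for i in range(num_tokens):
        slot = int(slot_mapping[i])
        expected[slot // block_size, slot % block_size, :kv_lora_rank] = kv_c[i]
        expected[slot // block_size, slot % block_size, kv_lora_rank:] = k_pe[i]

    # Placeholder call: the actual op likely takes extra CP-specific arguments
    # (e.g. a fused destination buffer); adapt this to the real binding.
    ops.cp_fused_concat_and_cache_mla(kv_c, k_pe, kv_cache, slot_mapping,
                                      "auto", scale)

    torch.testing.assert_close(kv_cache, expected)
```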

@youzhedian changed the title from "[Kernel] cuda kernels for upcoming context parallel feature" to "[Kernel] cuda kernels for upcoming decode context parallel feature" on Aug 28, 2025
@youkaichao (Member) left a comment


LGTM since this only adds two new kernels. cc @WoosukKwon @LucasWilkinson if you have more comments.

@youkaichao (Member) commented

kernel tests passed, failed tests are unrelated. merging.

@youkaichao youkaichao merged commit 186aced into vllm-project:main Aug 28, 2025
15 of 18 checks passed
@gshtras (Collaborator) commented on Aug 28, 2025

> kernel tests passed, failed tests are unrelated. merging.

Not really. The AMD build failure is not unrelated.

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
@draftbk (Contributor) commented on Aug 28, 2025

I am encountering an issue that is very likely caused by this PR when building on AMD MI300X.
Shell Script

# Run
python setup.py clean && python setup.py build
# Error
FAILED: [code=1] CMakeFiles/_C.dir/csrc/cache_kernels.hip.o

/data/users/lifans/gitrepos/vllm/build/temp.linux-x86_64-cpython-312/csrc/cache_kernels.hip:918:14: error: unused variable 'is_last_split' [-Werror,-Wunused-variable]
  918 |   const bool is_last_split = (split_end == tot_slots);
      |              ^~~~~~~~~~~~~
2 warnings and 1 error generated when compiling for gfx942.

If I switch back to commit c07a733 (2 commits ahead), the error disappears.

vllm-bot pushed a commit that referenced this pull request Aug 29, 2025
Signed-off-by: charlifu <charlifu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025