
Conversation

@pcmoritz (Collaborator) commented on Jan 24, 2026

This PR writes the new decoded token into the KV cache in such a way that the whole sequence stays left aligned. This is needed so that the CUDNN attention (#879) truly works without an attention mask.
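
A minimal sketch of the idea, not the PR's actual code: each sequence tracks its own cache position (its current unpadded length), and the newly decoded token's keys/values are written at that per-sequence slot, so shorter sequences are not pushed to a shared global position and the cache stays left aligned. The names `kv_cache`, `new_kv`, and `cache_positions` and the shapes are illustrative assumptions.

```python
import jax.numpy as jnp


def write_left_aligned(kv_cache: jnp.ndarray,
                       new_kv: jnp.ndarray,
                       cache_positions: jnp.ndarray) -> jnp.ndarray:
    """Write one decoded token's K (or V) per sequence at its own slot.

    kv_cache:        [batch, max_length, num_heads, head_dim]
    new_kv:          [batch, num_heads, head_dim] for the new token
    cache_positions: [batch] index of the next free slot per sequence
                     (i.e. the current unpadded length of each sequence)
    """
    batch = jnp.arange(kv_cache.shape[0])
    # Each sequence writes at its own length, so the filled prefix of every
    # row stays contiguous and left aligned.
    return kv_cache.at[batch, cache_positions].set(new_kv)


if __name__ == "__main__":
    cache = jnp.zeros((2, 8, 4, 16))
    new = jnp.ones((2, 4, 16))
    positions = jnp.array([3, 5])  # sequences of different lengths
    cache = write_left_aligned(cache, new, positions)
```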

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the Key-Value cache handling to support per-sequence cache positions, which is a crucial change for enabling efficient left-aligned batch decoding. The changes are consistently applied across the Llama3 and Qwen3 model implementations, as well as the generator utilities. The core logic of using per-sequence positions for updating the KV cache and attention mask seems correct. I've found one minor issue with a duplicated line of code that should be removed. Otherwise, the changes look solid.
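
To make the per-sequence position idea concrete, here is a hedged sketch of how the attention mask can be derived from the same positions: a slot is valid exactly when it has already been written for that sequence. `cache_positions` and `max_length` are assumed names for illustration, not identifiers from this repository.

```python
import jax.numpy as jnp


def left_aligned_mask(cache_positions: jnp.ndarray, max_length: int) -> jnp.ndarray:
    """Return a [batch, max_length] boolean mask of filled KV-cache slots.

    Assumes cache_positions[i] indexes the most recently written token of
    sequence i, so slots 0..cache_positions[i] are valid.
    """
    return jnp.arange(max_length)[None, :] <= cache_positions[:, None]
```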

Comment on lines 161 to 162
# Pad KV cache and attention mask to max_length
kv_cache = kv_cache.pad_to_length(max_length)
Severity: medium

This comment and the following line are redundant. The kv_cache is already padded to max_length on line 159. These lines can be removed.

@pcmoritz changed the title from "[tx] WIP" to "[tx] Left align generated tokens in decoding" on Jan 24, 2026
@pcmoritz merged commit 5177d69 into NovaSky-AI:main on Jan 24, 2026
4 checks passed
tanmaysachan pushed a commit to tanmaysachan/SkyRL that referenced this pull request Jan 25, 2026
