Skip to content

Conversation

@pcmoritz
Copy link
Collaborator

In #927, we left aligned the prompts for sampling prefill in preparation for #879. Here we shift them after the prefill so they will again be right aligned for the decoding. This will allow us to use the cudnn attention #927 for both prefill and decoding.

@pcmoritz pcmoritz added the tx label Jan 24, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to right-align prompts for the decoding phase after a left-aligned prefill. This is a good change that should allow for using the same efficient attention mechanism for both prefill and decoding. The approach of removing the compute_positions function and introducing a batch_roll function to shift the KV cache and attention mask is sound. However, I've identified a critical issue in the implementation of the new batch_roll function that would lead to incorrect behavior for multi-dimensional tensors like the KV cache. My review includes a suggested fix for this issue.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@pcmoritz
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a change to right-align prompts for the decoding phase, following a prefill phase with left-aligned prompts. This is a clever approach to enable the use of optimized cuDNN attention kernels for both stages. The changes are well-contained and logical. The new batch_roll utility is correctly implemented for shifting the KV cache and attention mask. The modifications in the model files to simplify position ID generation for the prefill stage are also appropriate given the new alignment strategy. Overall, the implementation appears correct and robust.

@pcmoritz pcmoritz merged commit b9a6307 into NovaSky-AI:main Jan 24, 2026
4 of 5 checks passed
tanmaysachan pushed a commit to tanmaysachan/SkyRL that referenced this pull request Jan 25, 2026
In NovaSky-AI#927, we left aligned the
prompts for sampling prefill in preparation for
NovaSky-AI#879. Here we shift them after
the prefill so they will again be right aligned for the decoding. This
will allow us to use the cudnn attention
NovaSky-AI#927 for both prefill and
decoding.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant