[tx] Right align prompts for decoding #932
Code Review
This pull request aims to right-align prompts for the decoding phase after a left-aligned prefill. This is a good change that should allow for using the same efficient attention mechanism for both prefill and decoding. The approach of removing the compute_positions function and introducing a batch_roll function to shift the KV cache and attention mask is sound. However, I've identified a critical issue in the implementation of the new batch_roll function that would lead to incorrect behavior for multi-dimensional tensors like the KV cache. My review includes a suggested fix for this issue.
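The multi-dimensional concern raised in this review can be illustrated with a small sketch. It is written in NumPy rather than the project's own framework and is not the PR's implementation: the signature of `batch_roll` below, the `[batch, seq, heads, head_dim]` cache layout, and the requirement that the batch axis be axis 0 are all assumptions made for illustration. The key point is that the per-row index array has to be reshaped so that it broadcasts over the trailing dimensions.

```python
import numpy as np


def batch_roll(x: np.ndarray, shifts: np.ndarray, axis: int = 1) -> np.ndarray:
    """Roll each batch element of `x` along `axis` by its own shift.

    Hypothetical helper: assumes axis 0 is the batch axis and `axis` != 0.
    Equivalent to applying np.roll(x[b], shifts[b], axis=axis - 1) per row.
    """
    n = x.shape[axis]
    # Source index for each destination slot: out[b, j] = x[b, (j - shift_b) % n]
    idx = (np.arange(n)[None, :] - shifts[:, None]) % n  # shape [batch, n]
    # Give idx the same rank as x, with size-1 dims everywhere except the
    # batch axis and the rolled axis, so it broadcasts over heads, head_dim, etc.
    shape = [1] * x.ndim
    shape[0] = x.shape[0]
    shape[axis] = n
    return np.take_along_axis(x, idx.reshape(shape), axis=axis)


# A toy "KV cache" with assumed layout [batch, seq, heads, head_dim].
kv = np.arange(2 * 4 * 3 * 2).reshape(2, 4, 3, 2)
shifts = np.array([1, 3])
rolled = batch_roll(kv, shifts)

# The same helper works unchanged on a 2-D attention mask [batch, seq].
mask = np.array([[1, 1, 1, 0], [1, 0, 0, 0]])
rolled_mask = batch_roll(mask, shifts)
```

Without the reshape, a `[batch, seq]` index array does not line up with a four-dimensional cache, which is the kind of mistake that only shows up once the helper is applied to something other than the mask.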
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces a change to right-align prompts for the decoding phase, following a prefill phase with left-aligned prompts. This is a clever approach to enable the use of optimized cuDNN attention kernels for both stages. The changes are well-contained and logical. The new batch_roll utility is correctly implemented for shifting the KV cache and attention mask. The modifications in the model files to simplify position ID generation for the prefill stage are also appropriate given the new alignment strategy. Overall, the implementation appears correct and robust.
…o tx-right-align-decode
In NovaSky-AI#927, we left-aligned the prompts for the sampling prefill in preparation for NovaSky-AI#879. Here we shift them after the prefill so that they are right-aligned again for decoding. This will allow us to use the cuDNN attention (NovaSky-AI#927) for both prefill and decoding.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
In #927, we left-aligned the prompts for the sampling prefill in preparation for #879. Here we shift them after the prefill so that they are right-aligned again for decoding. This will allow us to use the cuDNN attention (#927) for both prefill and decoding.
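To make the shift concrete, here is a minimal NumPy sketch of moving a left-aligned batch to a right-aligned one. The pad id, the toy token values, and the variable names are assumptions for illustration only; the PR applies the same kind of per-row shift to the KV cache and attention mask rather than to token ids.

```python
import numpy as np

PAD = 0  # hypothetical pad token id

# Left-aligned prompts, as used for prefill: real tokens first, padding last.
tokens = np.array([
    [11, 12, 13, PAD, PAD],
    [21, 22, 23, 24, 25],
    [31, PAD, PAD, PAD, PAD],
])
mask = (tokens != PAD).astype(np.int32)

# Each row is rolled right by the number of padding slots it contains.
seq_len = tokens.shape[1]
lengths = mask.sum(axis=1)
shifts = seq_len - lengths

# Per-row roll: destination column j reads from column (j - shift) mod seq_len.
cols = (np.arange(seq_len)[None, :] - shifts[:, None]) % seq_len
right_tokens = np.take_along_axis(tokens, cols, axis=1)
right_mask = np.take_along_axis(mask, cols, axis=1)

print(right_tokens)
print(right_mask)
```

After the shift, the last real token of every row sits in the final column, so during decoding each new token can be appended at the same index for the whole batch.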