
Conversation

@pcmoritz (Collaborator) commented on Jan 24, 2026

This PR writes the new decoded token into the KV cache in such a way that the whole sequence stays left aligned. This is needed so that the CUDNN attention (#879) truly works without an attention mask.
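
A minimal sketch of the idea, not the PR's actual code: each sequence tracks its own cache position (its current unpadded length), and the newly decoded token's keys/values are written at that per-sequence slot, so shorter sequences are not pushed to a shared global position and the cache stays left aligned. The names `kv_cache`, `new_kv`, and `cache_positions` and the shapes are illustrative assumptions.

```python
import jax.numpy as jnp


def write_left_aligned(kv_cache: jnp.ndarray,
                       new_kv: jnp.ndarray,
                       cache_positions: jnp.ndarray) -> jnp.ndarray:
    """Write one decoded token's K (or V) per sequence at its own slot.

    kv_cache:        [batch, max_length, num_heads, head_dim]
    new_kv:          [batch, num_heads, head_dim] for the new token
    cache_positions: [batch] index of the next free slot per sequence
                     (i.e. the current unpadded length of each sequence)
    """
    batch = jnp.arange(kv_cache.shape[0])
    # Each sequence writes at its own length, so the filled prefix of every
    # row stays contiguous and left aligned.
    return kv_cache.at[batch, cache_positions].set(new_kv)


if __name__ == "__main__":
    cache = jnp.zeros((2, 8, 4, 16))
    new = jnp.ones((2, 4, 16))
    positions = jnp.array([3, 5])  # sequences of different lengths
    cache = write_left_aligned(cache, new, positions)
```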

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the Key-Value cache handling to support per-sequence cache positions, which is a crucial change for enabling efficient left-aligned batch decoding. The changes are consistently applied across the Llama3 and Qwen3 model implementations, as well as the generator utilities. The core logic of using per-sequence positions for updating the KV cache and attention mask seems correct. I've found one minor issue with a duplicated line of code that should be removed. Otherwise, the changes look solid.
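
To make the per-sequence position idea concrete, here is a hedged sketch of how the attention mask can be derived from the same positions: a slot is valid exactly when it has already been written for that sequence. `cache_positions` and `max_length` are assumed names for illustration, not identifiers from this repository.

```python
import jax.numpy as jnp


def left_aligned_mask(cache_positions: jnp.ndarray, max_length: int) -> jnp.ndarray:
    """Return a [batch, max_length] boolean mask of filled KV-cache slots.

    Assumes cache_positions[i] indexes the most recently written token of
    sequence i, so slots 0..cache_positions[i] are valid.
    """
    return jnp.arange(max_length)[None, :] <= cache_positions[:, None]
```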

Comment on lines 161 to 162
# Pad KV cache and attention mask to max_length
kv_cache = kv_cache.pad_to_length(max_length)
Severity: medium

This comment and the following line are redundant. The kv_cache is already padded to max_length on line 159. These lines can be removed.

@pcmoritz changed the title from "[tx] WIP" to "[tx] Left align generated tokens in decoding" on Jan 24, 2026
@pcmoritz merged commit 5177d69 into NovaSky-AI:main on Jan 24, 2026
4 checks passed
tanmaysachan pushed a commit to tanmaysachan/SkyRL that referenced this pull request Jan 25, 2026
