
[Fix] Fix chat templating in Mini-SWE-Agent and Terminal-Bench examples #404

Merged: SumanthRH merged 8 commits into NovaSky-AI:main from SumanthRH:fix-templating on Oct 6, 2025
Conversation

@SumanthRH (Member) commented Oct 6, 2025

What does this PR do?

Fixes chat templating in the Mini-SWE-Agent and Terminal-Bench examples.

Previously, we were naively calling `.apply_chat_template` to encode response messages turn by turn, but this can append a system message for each turn depending on the model. (h/t to @CharlieFRuan)

For example, with Qwen3 8B the templating works fine, but for Qwen2.5 1.5B Instruct the code adds a default system prompt message while tokenizing every turn.
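
For illustration, a minimal repro sketch of the per-turn system prompt issue (the model name is taken from the description above; the exact rendering depends on the tokenizer's chat template):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
turn = [{"role": "assistant", "content": "ls -la"}]
# Templating a single turn in isolation: Qwen2.5-Instruct's template injects
# its default system prompt when none is present, so encoding every turn this
# way repeats that system block once per turn.
print(tok.apply_chat_template(turn, tokenize=False))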

We use the fixed-base approach, similar to what we do in the `SkyRLGymGenerator`.
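
A minimal sketch of the fixed-base idea (the helper name, base conversation, and model below are illustrative; the actual `encode_messages_subset` utility added in this PR may differ in its details):

from transformers import AutoTokenizer

def encode_with_fixed_base(message, tokenizer):
    # Tokenize a fixed base conversation, then the base plus the new message,
    # and keep only the suffix tokens. Any default system prompt the template
    # injects lands in the base and is sliced away, assuming the template
    # renders the base as a prefix of the longer conversation.
    base = [{"role": "user", "content": ""}]
    base_ids = tokenizer.apply_chat_template(base, tokenize=True)
    full_ids = tokenizer.apply_chat_template(base + [message], tokenize=True)
    return full_ids[len(base_ids):]

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
ids = encode_with_fixed_base({"role": "assistant", "content": "ls -la"}, tok)
print(tok.decode(ids))  # roughly '<|im_start|>assistant\nls -la<|im_end|>\n', with no duplicated system prompt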

SumanthRH marked this pull request as ready for review on October 6, 2025 18:12
@SumanthRH (Member Author)
/gemini review

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request effectively addresses a chat templating issue in the Mini-SWE-Agent and Terminal-Bench examples by introducing a new utility function, encode_messages_subset. This function correctly handles multi-turn conversation tokenization using the "fixed base approach", which is crucial for models with complex chat templates. The changes are well-implemented, and the inclusion of comprehensive unit tests for the new utility is commendable. I have one suggestion to improve the robustness of the new function. Overall, this is a solid contribution.

@CharlieFRuan (Collaborator) left a comment


Thank you for the fix!

msg_encoding = encode_messages_subset([message], self.tokenizer)

# Extend response_ids with the tokens
response_ids.extend(msg_encoding)
@CharlieFRuan (Collaborator)

Can we add an assertion that

initial_input_ids + response_ids == self.tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)

And the same for Terminal-Bench?

At least a warning, perhaps?
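
For reference, a sketch of what such a check could look like inside the generator, as a warning rather than a hard assert (the names follow the snippet above; `logger` is assumed to be whatever logger the example already uses):

full_ids = self.tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)
if initial_input_ids + response_ids != full_ids:
    logger.warning("Per-turn chat templating diverges from templating the full conversation")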

@SumanthRH (Member Author)

Hmm, such an assertion or warning can be misleading or incorrect, because applying the chat template message by message can produce quite different tokens from templating the full conversation.

For models like Qwen3, the thinking tokens for previous messages in the history are discarded by default. If we call `encode_messages_subset` on each message, we end up preserving the think tokens for each message (this holds even with the base conversation present).

But on the right-hand side of the proposed assertion, the think tokens for previous messages are removed.

I don't think either is exactly the behaviour we want for on-policy training, but in any case we shouldn't add this assertion.
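
To make the divergence concrete, a hedged illustration (it assumes a Qwen3-style tokenizer whose template drops <think> blocks from earlier assistant turns, and that `encode_messages_subset` returns a list of token ids):

messages = [
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "<think>1+1=2</think>2"},
    {"role": "user", "content": "And 2+2?"},
    {"role": "assistant", "content": "<think>2+2=4</think>4"},
]
# Left-hand side of the proposed check: encode each message independently,
# which preserves the <think> tokens of every assistant turn.
per_turn_ids = []
for m in messages:
    per_turn_ids.extend(encode_messages_subset([m], tokenizer))
# Right-hand side: template the full conversation, where the chat template may
# strip <think> blocks from earlier assistant turns.
full_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)
print(per_turn_ids == full_ids)  # often False for reasoning models, so the check is unreliable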

@SumanthRH (Member Author)

As such, for Qwen3 8B I re-ran the Mini-SWE-Agent example and it is okay. Actually, the previous expression was also correct, because for Qwen3 8B no default system prompt is added:

print(self.tokenizer.apply_chat_template([{"role": "assistant", "content": "What is 1+1?"}], tokenize=False))
# '<|im_start|>assistant\nWhat is 1+1?<|im_end|>\n'

@SumanthRH (Member Author)

I believe the tests should be sufficient.

@CharlieFRuan (Collaborator) commented Oct 6, 2025


> For models like Qwen3, the thinking tokens for previous messages in the history are discarded by default. If we call encode_messages_subset on each message, we end up preserving the think tokens for each message (this holds even with the base conversation present).

Good point... so the current behavior becomes: during inference we discard thinking tokens, while for training we keep all thinking tokens.

Made an issue for this: #410

@CharlieFRuan (Collaborator) left a comment


LGTM! Only one nit left, after which feel free to merge.

SumanthRH merged commit 5c98a6a into NovaSky-AI:main on Oct 6, 2025
3 checks passed
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
CharlieFRuan pushed a commit to mlfoundations/SkyRL that referenced this pull request Nov 26, 2025
CharlieFRuan added a commit to mlfoundations/SkyRL that referenced this pull request Nov 26, 2025
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
