
[Fix] Fix chat templating in Mini-SWE-Agent and Terminal-Bench examples #404

Merged: SumanthRH merged 8 commits into NovaSky-AI:main from SumanthRH:fix-templating on Oct 6, 2025
Conversation

@SumanthRH (Member) commented Oct 6, 2025

What does this PR do?

Fixes chat templating in the Mini-SWE-Agent and Terminal-Bench examples.

Previously, we were naively calling `.apply_chat_template` to encode response messages turn by turn, but this can append a system message for each turn depending on the model. (h/t to @CharlieFRuan)

For example, with Qwen3 8B the templating works fine, but for Qwen2.5 1.5B Instruct the code adds a default system prompt message while tokenizing every turn.
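
For illustration, a minimal repro sketch of the per-turn system prompt issue (the model name is taken from the description above; the exact rendering depends on the tokenizer's chat template):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
turn = [{"role": "assistant", "content": "ls -la"}]
# Templating a single turn in isolation: Qwen2.5-Instruct's template injects
# its default system prompt when none is present, so encoding every turn this
# way repeats that system block once per turn.
print(tok.apply_chat_template(turn, tokenize=False))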

We use the fixed-base approach, similar to what we do in the `SkyRLGymGenerator`.
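
A minimal sketch of the fixed-base idea (the helper name, base conversation, and model below are illustrative; the actual `encode_messages_subset` utility added in this PR may differ in its details):

from transformers import AutoTokenizer

def encode_with_fixed_base(message, tokenizer):
    # Tokenize a fixed base conversation, then the base plus the new message,
    # and keep only the suffix tokens. Any default system prompt the template
    # injects lands in the base and is sliced away, assuming the template
    # renders the base as a prefix of the longer conversation.
    base = [{"role": "user", "content": ""}]
    base_ids = tokenizer.apply_chat_template(base, tokenize=True)
    full_ids = tokenizer.apply_chat_template(base + [message], tokenize=True)
    return full_ids[len(base_ids):]

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
ids = encode_with_fixed_base({"role": "assistant", "content": "ls -la"}, tok)
print(tok.decode(ids))  # roughly '<|im_start|>assistant\nls -la<|im_end|>\n', with no duplicated system prompt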

SumanthRH marked this pull request as ready for review on October 6, 2025 18:12
@SumanthRH (Member Author)
/gemini review

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request effectively addresses a chat templating issue in the Mini-SWE-Agent and Terminal-Bench examples by introducing a new utility function, encode_messages_subset. This function correctly handles multi-turn conversation tokenization using the "fixed base approach", which is crucial for models with complex chat templates. The changes are well-implemented, and the inclusion of comprehensive unit tests for the new utility is commendable. I have one suggestion to improve the robustness of the new function. Overall, this is a solid contribution.

@CharlieFRuan (Collaborator) left a comment


Thank you for the fix!

msg_encoding = encode_messages_subset([message], self.tokenizer)

# Extend response_ids with the tokens
response_ids.extend(msg_encoding)
@CharlieFRuan (Collaborator)

Can we add an assertion that

initial_input_ids + response_ids == self.tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)

And the same for Terminal-Bench?

At least a warning, perhaps?
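
For reference, a sketch of what such a check could look like inside the generator, as a warning rather than a hard assert (the names follow the snippet above; `logger` is assumed to be whatever logger the example already uses):

full_ids = self.tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)
if initial_input_ids + response_ids != full_ids:
    logger.warning("Per-turn chat templating diverges from templating the full conversation")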

@SumanthRH (Member Author)

Hmm, such an assertion or warning can be misleading or incorrect, because applying the chat template message by message can produce quite different tokens from templating the full conversation.

For models like Qwen3, the thinking tokens for previous messages in the history are discarded by default. If we call `encode_messages_subset` on each message, we end up preserving the think tokens for each message (this holds even with the base conversation present).

But on the right-hand side of the proposed assertion, the think tokens for previous messages are removed.

I don't think either is exactly the behaviour we want for on-policy training, but in any case we shouldn't add this assertion.
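
To make the divergence concrete, a hedged illustration (it assumes a Qwen3-style tokenizer whose template drops <think> blocks from earlier assistant turns, and that `encode_messages_subset` returns a list of token ids):

messages = [
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "<think>1+1=2</think>2"},
    {"role": "user", "content": "And 2+2?"},
    {"role": "assistant", "content": "<think>2+2=4</think>4"},
]
# Left-hand side of the proposed check: encode each message independently,
# which preserves the <think> tokens of every assistant turn.
per_turn_ids = []
for m in messages:
    per_turn_ids.extend(encode_messages_subset([m], tokenizer))
# Right-hand side: template the full conversation, where the chat template may
# strip <think> blocks from earlier assistant turns.
full_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)
print(per_turn_ids == full_ids)  # often False for reasoning models, so the check is unreliable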

@SumanthRH (Member Author)

As such, for Qwen3 8B I re-ran the Mini-SWE-Agent example and it is okay. Actually, the previous expression was also correct, because for Qwen3 8B no default system prompt is added:

print(self.tokenizer.apply_chat_template([{"role": "assistant", "content": "What is 1+1?"}], tokenize=False))
# '<|im_start|>assistant\nWhat is 1+1?<|im_end|>\n'

@SumanthRH (Member Author)

I believe the tests should be sufficient.

@CharlieFRuan (Collaborator) commented Oct 6, 2025


> For models like Qwen3, the thinking tokens for previous messages in the history are discarded by default. If we call encode_messages_subset on each message, we end up preserving the think tokens for each message (this holds even with the base conversation present).

Good point... so the current behavior becomes: during inference we discard thinking tokens, while for training we keep all thinking tokens.

Made an issue for this: #410

@CharlieFRuan (Collaborator) left a comment


LGTM! Only one nit left, after which feel free to merge.

SumanthRH merged commit 5c98a6a into NovaSky-AI:main on Oct 6, 2025
3 checks passed
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
CharlieFRuan pushed a commit to mlfoundations/SkyRL that referenced this pull request Nov 26, 2025
CharlieFRuan added a commit to mlfoundations/SkyRL that referenced this pull request Nov 26, 2025
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
