[Fix] Fix chat templating in Mini-SWE-Agent and Terminal-Bench examples #404
SumanthRH merged 8 commits into NovaSky-AI:main from
Conversation
/gemini review
Code Review
This pull request effectively addresses a chat templating issue in the Mini-SWE-Agent and Terminal-Bench examples by introducing a new utility function, encode_messages_subset. This function correctly handles multi-turn conversation tokenization using the "fixed base approach", which is crucial for models with complex chat templates. The changes are well-implemented, and the inclusion of comprehensive unit tests for the new utility is commendable. I have one suggestion to improve the robustness of the new function. Overall, this is a solid contribution.
CharlieFRuan
left a comment
Thank you for the fix!
msg_encoding = encode_messages_subset([message], self.tokenizer)
# Extend response_ids with the tokens
response_ids.extend(msg_encoding)
Can we add an assertion that
initial_input_ids + response_ids == self.tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=True)
and the same for Terminal-Bench? At least a warning, perhaps.
Hmm, such an assertion or warning can be misleading or incorrect, because applying the chat template message by message can differ quite a bit from templating the full conversation.
For models like Qwen 3, the thinking tokens for previous messages in the history are discarded by default. If we call encode_messages_subset on each message, we end up preserving the think tokens for each message (even with the base conversation present, this is okay).
But on the RHS, the think tokens for previous messages are removed.
Now, I don't think either is the correct behaviour we want for on-policy training, but in any case we shouldn't have this assertion.
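The divergence described above can be reproduced with a toy template (a hypothetical stand-in, not the real Qwen 3 template) that strips `<think>...</think>` spans from every assistant message except the last, the way Qwen 3's default template does. Per-message templating keeps the think tokens; full-conversation templating drops them, so the proposed assertion would spuriously fail:

```python
import re

def toy_template_strip_think(messages):
    # Toy stand-in for a chat template that, like Qwen3's default,
    # drops <think>...</think> spans from all but the last assistant turn.
    parts = []
    last = len(messages) - 1
    for i, m in enumerate(messages):
        content = m["content"]
        if m["role"] == "assistant" and i < last:
            content = re.sub(r"<think>.*?</think>", "", content, flags=re.S)
        parts.append(f"<|{m['role']}|>{content}<|end|>")
    return "".join(parts)

msgs = [
    {"role": "assistant", "content": "<think>plan</think>ok"},
    {"role": "user", "content": "next"},
]
# Per-message templating keeps the think span; full templating strips it.
incremental = "".join(toy_template_strip_think([m]) for m in msgs)
full = toy_template_strip_think(msgs)
```

Here `incremental` contains `<think>plan</think>` while `full` does not, so `incremental != full` even though both encode the same conversation.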
As such, for Qwen 3 8B, I re-ran the Mini-SWE-Agent example and it is okay. Actually, the previous expression was also correct, because for Qwen 3 8B there is no default system prompt added:
print(self.tokenizer.apply_chat_template([{"role": "assistant", "content": "What is 1+1?"}], tokenize=False))
# '<|im_start|>assistant\nWhat is 1+1?<|im_end|>\n'
I believe the tests should be sufficient.
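The reason the old per-turn code happened to be correct here can be sketched with a toy template (hypothetical, mimicking the `<|im_start|>...<|im_end|>` output shown above) that injects no default system prompt: with nothing inserted per call, templating turn by turn and concatenating gives exactly the same string as templating the whole conversation at once.

```python
def toy_template_no_sys(messages):
    # Toy template with no default system prompt, mimicking the
    # '<|im_start|>assistant\nWhat is 1+1?<|im_end|>\n' output above.
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

msgs = [{"role": "assistant", "content": "hi"}, {"role": "user", "content": "ok"}]
# Per-turn templating concatenates to the full-conversation result.
per_turn = "".join(toy_template_no_sys([m]) for m in msgs)
full = toy_template_no_sys(msgs)
```

For a template like Qwen 2.5 Instruct's, which does inject a system prompt, `per_turn` would instead contain one system prompt per turn.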
> For models like Qwen 3, the thinking tokens for previous messages in the history are discarded by default. If we call encode_messages_subset on each message, we end up preserving the think tokens for each message (even with the base conversation present, this is okay).
Good point... so the current behavior is: during inference we discard thinking tokens, while for training we keep all thinking tokens.
Made an issue for this: #410
CharlieFRuan
left a comment
LGTM! Only one nit left, after which feel free to merge.
…es (NovaSky-AI#404) Fixes chat templating in the Mini-SWE-Agent and Terminal-Bench examples. Previously, we were naively calling `.apply_chat_template` to encode response messages turn by turn, but this can append a system message for each turn depending on the model. (h/t to @CharlieFRuan) For Qwen 3 8B the templating works fine, but for Qwen 2.5 1.5B Instruct the code adds a system prompt message while tokenizing every turn. We use the fixed base approach, similar to what we do in the SkyRLGymGenerator. --------- Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
What does this PR do?
Fixes chat templating in the Mini-SWE-Agent and Terminal-Bench examples.
Previously, we were naively calling `.apply_chat_template` to encode response messages turn by turn, but this can append a system message for each turn depending on the model. (h/t to @CharlieFRuan) For Qwen 3 8B the templating works fine, but for Qwen 2.5 1.5B Instruct the code adds a system prompt message while tokenizing every turn.
We use the fixed base approach, similar to what we do in the SkyRLGymGenerator.
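The fixed base approach can be sketched as follows. This is an illustrative toy, not the repo's actual `encode_messages_subset` implementation: `BASE_CONVO`, `toy_apply_chat_template`, and `encode_messages_subset_sketch` are all hypothetical names, and the toy template stands in for a tokenizer (like Qwen 2.5 Instruct's) that injects a default system prompt on every call. The idea is to template a fixed base conversation once, template `base + messages`, and slice off the base prefix, so per-call artifacts are never duplicated across turns.

```python
# Fixed base conversation; any short, stable conversation works.
BASE_CONVO = [{"role": "user", "content": "base"}]

def toy_apply_chat_template(messages):
    # Toy stand-in for tokenizer.apply_chat_template(..., tokenize=True)
    # for a model that injects a default system prompt on every call.
    tokens = ["<|sys|>"]
    for m in messages:
        tokens += [f"<|{m['role']}|>", m["content"], "<|end|>"]
    return tokens

def encode_messages_subset_sketch(messages):
    # Encode only `messages`, using the fixed base as a reference prefix.
    base_ids = toy_apply_chat_template(BASE_CONVO)
    full_ids = toy_apply_chat_template(BASE_CONVO + messages)
    # The base must be a strict prefix of the full encoding for the
    # slicing to be valid.
    assert full_ids[: len(base_ids)] == base_ids
    return full_ids[len(base_ids):]
```

With this sketch, encoding turn by turn concatenates cleanly: each per-turn result carries no duplicated system prompt, and the concatenation over turns equals encoding all the turns in one call.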