Custom generator TI/TO
Currently, SkyRLGymGenerator obeys token-in-token-out (TI/TO) unless the user supplies a custom chat template (#178), or explicitly opts into re-tokenization after this PR (#351): https://skyrl.readthedocs.io/en/latest/tutorials/skyrl_gym_generator.html#multi-turn-tokenization-and-ti-to
However, this is not the case for custom generator examples:
SkyRL/skyrl-train/examples/mini_swe_agent/mini_swe_generator.py, lines 183 to 185 at 0c53bd7:

```python
for message in response_messages:
    # Apply chat template and tokenize each message
    msg_encoding = self.tokenizer.apply_chat_template([message], add_generation_prompt=False, tokenize=True)
```
SkyRL/skyrl-train/examples/terminal_bench/generator/terminal_bench_generator.py, lines 135 to 137 at 0c53bd7:

```python
for message in response_messages:
    # Apply chat template and tokenize each message
    msg_encoding = self.tokenizer.apply_chat_template([message], add_generation_prompt=False, tokenize=True)
```
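Both loops above re-tokenize decoded text, so the trained token ids can drift from what the engine actually sampled. A minimal sketch of the TI/TO alternative is below: accumulate the engine's own token ids turn by turn instead of re-applying the chat template. The function name, the `(observation_ids, response_ids)` turn structure, and the loss-mask convention are illustrative assumptions, not the actual SkyRL API.

```python
def accumulate_tito(prompt_ids, turns):
    """Build (input_ids, loss_mask) directly from engine token ids (TI/TO).

    turns: list of (observation_ids, response_ids) pairs, where
    observation_ids were tokenized once and fed back verbatim, and
    response_ids come straight from the inference engine's output.
    """
    input_ids = list(prompt_ids)
    loss_mask = [0] * len(prompt_ids)  # never train on the prompt
    for observation_ids, response_ids in turns:
        input_ids += observation_ids
        loss_mask += [0] * len(observation_ids)  # env output: no loss
        input_ids += response_ids
        loss_mask += [1] * len(response_ids)     # model output: loss
    return input_ids, loss_mask
```

Because no decode/re-encode round trip happens, the training sequence is guaranteed to match the rollout token-for-token.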
For the verifiers integration, this depends on whether verifiers itself supports TI/TO.
An agent harness must generate via the /completions endpoint rather than /chat/completions in order to guarantee TI/TO, since /chat/completions applies the chat template server-side.
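A hedged sketch of why /completions helps: the harness renders the chat template exactly once, client-side, and sends the already-rendered prompt, so the server never re-templates the conversation. The payload fields follow the OpenAI-style completions API; treat the exact server behavior (e.g. vLLM also accepting token-id prompts) as an assumption.

```python
def build_completions_payload(rendered_prompt, model, max_tokens):
    """Build a /v1/completions request body from a pre-rendered prompt.

    rendered_prompt: the chat-templated string produced client-side
    (e.g. via tokenizer.apply_chat_template(..., tokenize=False)),
    so no server-side templating can alter the tokens.
    """
    return {
        "model": model,
        # /completions takes the raw prompt; /chat/completions would instead
        # take "messages" and apply the template on the server.
        "prompt": rendered_prompt,
        "max_tokens": max_tokens,
    }
```

With this shape, the harness keeps full control of tokenization, and the returned completion tokens can be appended directly to the running token sequence.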
Custom generator Qwen3 chat template (either use an attention mask, or always keep thinking tokens)
In addition, for these custom generator examples with Qwen3 models, we currently roll out with the inference chat template (which strips thinking tokens from prior turns) but train with all thinking tokens kept, as discussed here: #404 (comment)
We should either fix this with a custom attention mask, or always keep thinking tokens for both inference and training.
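To make the mismatch concrete, here is a minimal sketch: Qwen3's inference chat template drops `<think>...</think>` blocks from previous assistant turns, while training on raw rollout tokens keeps them, so the two renderings (and hence their token ids and positions) diverge. The regex is an illustration of the template's stripping behavior, not the actual Jinja template.

```python
import re

# Illustrative stand-in for the template's thinking-token stripping.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def render_history_for_inference(assistant_turns):
    # What the model sees at rollout time: thinking stripped from history.
    return "".join(THINK_RE.sub("", t) for t in assistant_turns)

def render_history_for_training(assistant_turns):
    # What the trainer currently consumes: full turns, thinking included.
    return "".join(assistant_turns)
```

Any fix has to make these two views agree: either mask out the thinking tokens on the training side, or keep them in the inference-time history as well.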