
[Generator] Make custom generator examples TI/TO, and use appropriate encoding for Qwen3, ensuring on-policy training #410

@CharlieFRuan

Description


Custom generator TI/TO

Currently, SkyRLGymGenerator obeys token-in-token-out (TI/TO) unless the user explicitly provides a custom chat template (#178) or explicitly opts into re-tokenization after this PR (#351): https://skyrl.readthedocs.io/en/latest/tutorials/skyrl_gym_generator.html#multi-turn-tokenization-and-ti-to

However, this is not the case for custom generator examples:

```python
for message in response_messages:
    # Apply chat template and tokenize each message
    msg_encoding = self.tokenizer.apply_chat_template([message], add_generation_prompt=False, tokenize=True)
```
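To make the contrast concrete, here is a minimal sketch of what a TI/TO accumulation loop looks like: the engine's output token ids are appended verbatim (no detokenize/re-tokenize round trip), and the loss mask covers only the generated tokens. The function name and the flat list-of-turns format are illustrative, not SkyRL's actual API.

```python
def build_tito_sequence(prompt_ids, turns):
    """Accumulate token ids exactly as emitted by the engine (TI/TO).

    ``turns`` is a list of (response_ids, observation_ids) pairs; both are
    raw token-id lists, never re-tokenized text.
    """
    input_ids = list(prompt_ids)
    loss_mask = [0] * len(prompt_ids)  # never train on the prompt
    for response_ids, obs_ids in turns:
        input_ids += response_ids              # model-generated tokens: train on these
        loss_mask += [1] * len(response_ids)
        input_ids += obs_ids                   # environment observation: masked out
        loss_mask += [0] * len(obs_ids)
    return input_ids, loss_mask

ids, mask = build_tito_sequence([1, 2], [([10, 11], [20]), ([12], [])])
```

Because the ids are carried through unchanged, the trained sequence is guaranteed to match what the policy actually sampled, which is what re-tokenizing per message (as in the snippet above) cannot guarantee.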

For the verifiers integration, TI/TO depends on whether verifiers itself supports it.

An agent harness will have to generate with the /completions endpoint rather than /chat/completions in order to ensure TI/TO.
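The difference between the two endpoints, sketched as request payloads (field names follow the OpenAI-compatible API; the model name and rendered prompt string are illustrative): with /chat/completions the server applies its own chat template to the messages, while /completions accepts the exact prompt the client rendered, so the client keeps control of the token stream.

```python
# /chat/completions: the server re-applies a chat template to `messages`,
# so the client cannot guarantee which tokens the model actually saw.
chat_request = {
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "hi"}],
}

# /completions: the client renders (and can tokenize) the prompt itself,
# so the exact input tokens are known and TI/TO can be preserved.
completions_request = {
    "model": "Qwen/Qwen3-8B",
    "prompt": "<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n",
}
```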

Custom Generator Qwen3 chat template (either use attention mask, or always keep thinking tokens)

In addition, for these custom generator examples with Qwen3 models, we currently roll out with the inference chat template (which strips thinking tokens) but train with all thinking tokens retained, as discussed here: #404 (comment)

We should either fix this with a custom attention mask, or always keep thinking tokens for both inference and training.
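A minimal sketch of the masking option: zero out the loss on `<think>…</think>` spans so that what is trained on matches a rollout that stripped thinking tokens. The special-token ids below are assumptions for illustration, not verified Qwen3 vocabulary ids.

```python
THINK_START, THINK_END = 151667, 151668  # assumed <think> / </think> ids, illustrative only

def mask_thinking(token_ids):
    """Return a 0/1 loss mask that excludes <think>...</think> spans
    (including the delimiter tokens themselves)."""
    mask, in_think = [], False
    for t in token_ids:
        if t == THINK_START:
            in_think = True
        mask.append(0 if in_think else 1)
        if t == THINK_END:
            in_think = False
    return mask
```

The alternative (always keeping thinking tokens in both the inference template and training) avoids any masking but changes multi-turn rollout behavior, since prior-turn thinking would no longer be stripped from the context.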
