
[Generator] Make custom generator examples TI/TO, and use appropriate encoding for Qwen3, ensuring on-policy training #410

@CharlieFRuan

Description


Custom generator TI/TO

Currently, SkyRLGymGenerator obeys token-in-token-out (TI/TO) unless the user explicitly provides a custom chat template (#178) or explicitly opts into re-tokenization after this PR (#351): https://skyrl.readthedocs.io/en/latest/tutorials/skyrl_gym_generator.html#multi-turn-tokenization-and-ti-to

However, this is not the case for custom generator examples:

```python
for message in response_messages:
    # Apply chat template and tokenize each message
    msg_encoding = self.tokenizer.apply_chat_template([message], add_generation_prompt=False, tokenize=True)
```
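To make the contrast concrete, here is a minimal sketch of what a TI/TO accumulation loop looks like: the engine's output token ids are appended verbatim (no detokenize/re-tokenize round trip), and the loss mask covers only the generated tokens. The function name and the flat list-of-turns format are illustrative, not SkyRL's actual API.

```python
def build_tito_sequence(prompt_ids, turns):
    """Accumulate token ids exactly as emitted by the engine (TI/TO).

    ``turns`` is a list of (response_ids, observation_ids) pairs; both are
    raw token-id lists, never re-tokenized text.
    """
    input_ids = list(prompt_ids)
    loss_mask = [0] * len(prompt_ids)  # never train on the prompt
    for response_ids, obs_ids in turns:
        input_ids += response_ids              # model-generated tokens: train on these
        loss_mask += [1] * len(response_ids)
        input_ids += obs_ids                   # environment observation: masked out
        loss_mask += [0] * len(obs_ids)
    return input_ids, loss_mask

ids, mask = build_tito_sequence([1, 2], [([10, 11], [20]), ([12], [])])
```

Because the ids are carried through unchanged, the trained sequence is guaranteed to match what the policy actually sampled, which is what re-tokenizing per message (as in the snippet above) cannot guarantee.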

For the verifiers integration, TI/TO depends on whether verifiers itself supports it.

An agent harness will have to generate with the /completions endpoint rather than /chat/completions in order to ensure TI/TO.
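The difference between the two endpoints, sketched as request payloads (field names follow the OpenAI-compatible API; the model name and rendered prompt string are illustrative): with /chat/completions the server applies its own chat template to the messages, while /completions accepts the exact prompt the client rendered, so the client keeps control of the token stream.

```python
# /chat/completions: the server re-applies a chat template to `messages`,
# so the client cannot guarantee which tokens the model actually saw.
chat_request = {
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "hi"}],
}

# /completions: the client renders (and can tokenize) the prompt itself,
# so the exact input tokens are known and TI/TO can be preserved.
completions_request = {
    "model": "Qwen/Qwen3-8B",
    "prompt": "<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n",
}
```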

Custom Generator Qwen3 chat template (either use attention mask, or always keep thinking tokens)

In addition, for these custom generator examples with Qwen3 models, we currently roll out with the inference chat template (which strips thinking tokens) but train with all thinking tokens retained, as discussed here: #404 (comment)

We should either fix this with a custom attention mask, or always keep thinking tokens for both inference and training.
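A minimal sketch of the masking option: zero out the loss on `<think>…</think>` spans so that what is trained on matches a rollout that stripped thinking tokens. The special-token ids below are assumptions for illustration, not verified Qwen3 vocabulary ids.

```python
THINK_START, THINK_END = 151667, 151668  # assumed <think> / </think> ids, illustrative only

def mask_thinking(token_ids):
    """Return a 0/1 loss mask that excludes <think>...</think> spans
    (including the delimiter tokens themselves)."""
    mask, in_think = [], False
    for t in token_ids:
        if t == THINK_START:
            in_think = True
        mask.append(0 if in_think else 1)
        if t == THINK_END:
            in_think = False
    return mask
```

The alternative (always keeping thinking tokens in both the inference template and training) avoids any masking but changes multi-turn rollout behavior, since prior-turn thinking would no longer be stripped from the context.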
