[SkyRLGymGenerator] Add generator.chat_template and change Qwen3 default behavior to accumulate thinking tokens #178

Merged

CharlieFRuan merged 65 commits into NovaSky-AI:main from devpatelio:devpatel/skyrl-issue-104 on Oct 5, 2025
Conversation

@devpatelio (Collaborator) commented on Aug 21, 2025:

Prior to this PR, for SkyRLGymGenerator we do:

```python
def get_custom_chat_template(model_name: str) -> str:
    if "Qwen3" in model_name:
        return CUSTOM_CHAT_TEMPLATES["qwen3_thinking"]
    else:
        return None
```

That is, we always use a chat template that strips the thinking tokens for Qwen3 models.

This behavior is overly implicit: users might want to keep the thinking tokens, or to strip them. One prior PR that tried to address this: #117

In this PR, we add `generator.chat_template` to `ppo_base_config`, allowing users to define a custom chat template. This is particularly helpful for Qwen3 models, where you might want to keep the thinking tokens or remove them.

```yaml
  chat_template:
    source: "name"  # "name" or "file"
    name_or_path: null  # e.g., "qwen3_with_thinking" or "/path/to/template.j2"
```

You can either specify a chat template defined in `skyrl_train.generators.utils.CUSTOM_CHAT_TEMPLATES` (we currently have `qwen3_with_thinking` and `qwen3_without_thinking`; the latter is the template we previously used) or load one from a file.
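
For illustration, here is a minimal sketch of how such a name-or-file resolver could behave (hypothetical: the actual `get_custom_chat_template` in `skyrl_train.generators.utils` may differ, and the registry contents below are placeholders):

```python
from typing import Optional

# Hypothetical registry contents; the real Jinja templates live in
# skyrl_train.generators.utils.CUSTOM_CHAT_TEMPLATES.
CUSTOM_CHAT_TEMPLATES = {
    "qwen3_with_thinking": "{# Jinja template that keeps <think> blocks #}",
    "qwen3_without_thinking": "{# Jinja template that strips <think> blocks #}",
}

def resolve_chat_template(chat_template_config: Optional[dict]) -> Optional[str]:
    """Return a Jinja chat template string, or None to use the tokenizer default."""
    if chat_template_config is None or chat_template_config.get("name_or_path") is None:
        return None
    source = chat_template_config.get("source", "name")
    name_or_path = chat_template_config["name_or_path"]
    if source == "name":
        if name_or_path not in CUSTOM_CHAT_TEMPLATES:
            raise ValueError(f"Unknown chat template name: {name_or_path}")
        return CUSTOM_CHAT_TEMPLATES[name_or_path]
    if source == "file":
        with open(name_or_path, "r") as f:
            return f.read()
    raise ValueError(f"chat_template.source must be 'name' or 'file', got: {source}")
```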

Before this PR, the behavior for Qwen3 models was that they did not keep the non-last-turn thinking tokens, and we always retokenized (every turn and at the end of the trajectory).

After this PR, Qwen3 models by default accumulate tokens in a TI/TO (token-in/token-out) manner, just like any other model. Users can restore the old behavior by setting `name_or_path` to `qwen3_without_thinking`.
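
For example, to restore the old stripping behavior, the generator config would look like this (a sketch mirroring the `ppo_base_config` snippet above):

```yaml
generator:
  chat_template:
    source: "name"
    name_or_path: "qwen3_without_thinking"  # strip non-last-turn thinking tokens (old behavior)
```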

Related to #179; addresses #104.

CharlieFRuan self-requested a review on August 21, 2025
devpatelio marked this pull request as ready for review on August 21, 2025
devpatelio changed the title from "Added config for custom chat template (will make easier in subsequent…" to "Custom Chat Template + Thinking Token Masking" on Aug 21, 2025
@tyler-griggs (Member) left a comment:

Thanks for sending this in! Left a couple comments, primarily around moving chat templates out of the config. There was also a pretty big PR that was just merged that touches some similar code in skyrl_gym_generator.py, so it'd be good to merge in those changes!

devpatelio force-pushed the devpatel/skyrl-issue-104 branch from 15b5eb9 to 7a22332 on August 23, 2025
@devpatelio (Collaborator, Author) commented:

Here's the wandb for the updated run with the file and name support: https://wandb.ai/devpatelio/updated_template_system

@tyler-griggs (Member) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a flexible system for custom chat templates and masking for "thinking" tokens, which is a valuable enhancement. The changes are supported by new test scripts and a mini-dataset for validation. However, the current implementation has several critical issues, including NameError exceptions from undefined variables that will lead to runtime crashes, unreachable code due to flawed logic, and a bug in batched generation that results in incorrect loss masks. I have also identified several medium-severity issues such as hardcoded paths in scripts, duplicated template files, and leftover debug print statements. My review provides specific suggestions to address these problems to ensure the new functionality is robust and maintainable.

@tyler-griggs (Member) commented:

@devpatelio quick check: is this PR ready for review? I see several debugging print statements, and a couple of the files in examples/gsm8k look like they are dev-only tests. Should these all be removed?

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a flexible system for custom chat templates and token masking, which is a significant improvement over the previous hard-coded approach. The changes are well-structured, providing new configuration options in ppo_base_config.yaml and updating the generator logic accordingly while maintaining backward compatibility. The addition of tests for the new functionality is also a positive step. My review includes a couple of suggestions to enhance maintainability by addressing file duplication and to improve robustness by refining exception handling.

@devpatelio (Collaborator, Author) commented on Aug 24, 2025:

@tyler-griggs Sorry about that, I forgot to remove some of the debug print statements + fix some of the rebasing variable names. The PR is ready to review now!

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a flexible system for using custom chat templates, including support for loading them from files and masking "thinking" tokens during training. The changes are well-structured, with new configurations in ppo_base_config.yaml and updated logic in the generator and utility functions. My review includes a couple of suggestions to improve code readability and maintainability, such as refactoring Jinja template definitions to be more readable and using a list comprehension for conciseness.

@devpatelio (Collaborator, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a flexible system for using custom chat templates, including loading them from files and masking "thinking" tokens for training. The changes are well-implemented, with robust error handling for template loading and comprehensive tests that cover the new functionality, including file-based templates and loss masking. The addition of qwen3_with_thinking and qwen3_without_thinking templates, along with the configuration in ppo_base_config.yaml, effectively addresses the goal of supporting custom templates and token masking.

@devpatelio (Collaborator, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a flexible mechanism for using custom chat templates, allowing them to be specified by name or loaded from a file. The changes are well-implemented, particularly the new get_custom_chat_template function in utils.py which includes robust error handling. The accompanying tests have been updated effectively to cover the new functionality, including loading templates from files and verifying their behavior. I have a couple of minor suggestions to improve type hinting and simplify a template's logic. Overall, this is a great enhancement to the project's flexibility.

@gemini-code-assist bot left an inline comment (severity: medium):

For better type safety and code clarity, the type hint for `chat_template_config` should reflect that it can be an `omegaconf.DictConfig` object, which is what it receives from the configuration. Using a string literal for the type hint avoids needing to import it here if it's not desired.

You can add `from omegaconf import DictConfig` at the top of the file to resolve the type hint.

Suggested change:

```diff
-def get_custom_chat_template(chat_template_config: Optional[dict] = None) -> Optional[str]:
+def get_custom_chat_template(chat_template_config: Optional[Union[dict, "DictConfig"]] = None) -> Optional[str]:
```

@CharlieFRuan (Collaborator) replied:

@devpatelio this suggestion is actually correct. We should do `from omegaconf import DictConfig` at the top, and change it to `Union[dict, DictConfig]`. This is because `generator_cfg.chat_template` is a `DictConfig` and not a `dict`.

Would be good to quickly check that `run_gsm8k.sh` works with:

  • Not specifying anything (i.e. using the default chat_template values)
  • Specifying something (e.g. from name)
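
A minimal sketch of the accepted fix, assuming the function normalizes a `DictConfig` to a plain dict up front (the real implementation may handle it differently):

```python
from typing import Optional, Union

from omegaconf import DictConfig, OmegaConf

def get_custom_chat_template(
    chat_template_config: Optional[Union[dict, DictConfig]] = None,
) -> Optional[str]:
    # generator_cfg.chat_template arrives as an omegaconf DictConfig, so
    # convert it to a plain dict before the dict-style access below.
    if isinstance(chat_template_config, DictConfig):
        chat_template_config = OmegaConf.to_container(chat_template_config, resolve=True)
    ...  # existing name/file resolution logic continues here
```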

@CharlieFRuan (Collaborator) left a comment:
Last nits...

@CharlieFRuan (Collaborator) inline comment: You can leave this unchanged in the PR; the if and else share the same two lines.

```python
expected_user_loss_mask = [0] * len(empty_user) + [0]  # extra 0 for single observation token
```

```python
if custom_chat_template is not None:
    # For custom_chat_template, the first generation prompt IDs are part of `resp_str`, hence has corresponding mask
```

@CharlieFRuan (Collaborator) inline comment: Keep this comment.

@CharlieFRuan (Collaborator) inline comment: un-indent these two lines to be shared by the if and else.


@devpatelio (Collaborator, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a flexible system for using custom chat templates, allowing them to be specified by name from a predefined set or loaded from a file. This is a great enhancement for supporting different model behaviors, like masking 'thinking' tokens. The implementation is solid, with robust handling of configurations and file operations in utils.py. The tests have been effectively updated to cover the new functionality, including loading templates from files.

I've added a couple of minor suggestions to improve the clarity and reduce redundancy in the test files. Overall, this is a well-executed feature addition.

Commit: …_templating.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
CharlieFRuan changed the title from "Custom Chat Template + Thinking Token Masking" to "[SkyRLGymGenerator] Add generator.chat_template and change Qwen3 default behavior on thinking tokens accumulation" on Oct 5, 2025
CharlieFRuan merged commit 345a996 into NovaSky-AI:main on Oct 5, 2025
3 checks passed
CharlieFRuan changed the title from "[SkyRLGymGenerator] Add generator.chat_template and change Qwen3 default behavior on thinking tokens accumulation" to "[SkyRLGymGenerator] Add generator.chat_template and change Qwen3 default behavior to accumulate thinking tokens" on Oct 5, 2025
CharlieFRuan added a commit that referenced this pull request Oct 10, 2025
…token out (#438)

This is a follow-up PR to #178

In that PR, we changed the default codepath of Qwen3 in
`SkyRLGymGenerator` to always keep all the thinking tokens for
multi-turn generation and subsequent turn generation. Users can return
to the previous behavior (stripping thinking tokens for both inference and training) via the `config.generator.chat_template` option introduced in that PR.

This PR does a couple of things:

### 1. Doc update on Qwen3 in SkyRLGymGenerator

First we update the doc to explain the current behavior with Qwen3 in
SkyRLGymGenerator.

### 2. Bug fix with fixed base method motivated by Qwen3

Then I noticed a bug when using Qwen3 with codepath 1 (TI/TO with a strictly appended token sequence), due to how we have `<|im_start|>assistant\nI am an assistant.<|im_end|>\n` in our fixed-base method's `base_conversation_token_ids`.

This, when templated by Qwen3's chat template, adds a dummy empty thinking block `<think>\n\n</think>\n\n`, making the delta not what we expect. So I exclude the assistant message from the base conversation, both in SkyRLGymGenerator and in our helper `encode_messages_subset()` for the custom generator. As a result, we will have consecutive user messages when constructing observation token IDs, which should be fine.

That is, before this PR, we used `base_conversation_token_ids`:

```
In [7]: tokenizer.decode(base_conversation_token_ids)
Out[7]: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nI am a user.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nI am an assistant.<|im_end|>\n'
```

but adding any message right after it causes the template to strip those thinking tokens by default, making the delta indexing incorrect.
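
To illustrate the prefix assumption the fixed-base delta method relies on, here is a simplified sketch (variable names are illustrative and the model ID is only an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Base conversation without an assistant turn (the fix in this PR).
base = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I am a user."},
]
new_msg = {"role": "user", "content": "An observation."}

base_ids = tokenizer.apply_chat_template(base, tokenize=True)
full_ids = tokenizer.apply_chat_template(base + [new_msg], tokenize=True)

# The delta method assumes `full_ids` starts with `base_ids`, so the new
# message's tokens are exactly the suffix.
assert full_ids[: len(base_ids)] == base_ids
observation_ids = full_ids[len(base_ids):]

# With an assistant turn in `base`, Qwen3's template renders an empty
# <think>\n\n</think>\n\n block that disappears once another message
# follows, so the assertion above would fail; excluding the assistant
# turn keeps the prefix stable.
```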


### 3. Add more unit tests.

In `test_skyrl_gym_generator_chat_templating_exact()`, we used to test Qwen3 only with our custom chat template that strips thinking tokens; I now also test codepath 1's Qwen3 (accumulating thinking tokens). I did the same for `test_append_eos_after_stop_multi_turn()`.

Then, since the `_exact()` test does not check thinking-token handling with Qwen3, I added `test_skyrl_gym_generator_chat_templating_qwen3_thinking`, which is essentially the same test but with a Qwen3 model and with the mock LLM generation set to `"<think>\nmock thinking\n</think>\n\nb"` instead of just `"b"`, making the test more realistic.
Lucas-Fernandes-Martins pushed a commit to Lucas-Fernandes-Martins/SkyRL that referenced this pull request Oct 11, 2025
…token out (NovaSky-AI#438)

li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
…ult behavior on thinking tokens accumulation (NovaSky-AI#178)

li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
…token out (NovaSky-AI#438)

dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026
…ult behavior on thinking tokens accumulation (#178)

dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
…token out (NovaSky-AI#438)
