-
Notifications
You must be signed in to change notification settings - Fork 259
[SkyRLGymGenerator] Add genreator.chat_template and change Qwen3 default behavior to accumulate thinking tokens #178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
CharlieFRuan
merged 65 commits into
NovaSky-AI:main
from
devpatelio:devpatel/skyrl-issue-104
Oct 5, 2025
Merged
Changes from all commits
Commits
Show all changes
65 commits
Select commit
Hold shift + click to select a range
3a4b930
Added config for custom chat template (will make easier in subsequent…
devpatelio 930cb19
Added test script for masking and demasking, seems to work well!
devpatelio 700d7c0
Update run_gsm8k_thinking.sh
devpatelio 64d6f7e
updated new
devpatelio c27930b
Added support for custom masking training with templates for batching…
devpatelio 20e63b7
Fixed bug to append custom response for multi-turn
devpatelio 66c23c9
stash changes
devpatelio 7a22332
Updated config to support names in custom_chat_template or pass in ji…
devpatelio 4233da9
fixed rebase
devpatelio 14d8b92
nit: repetition
devpatelio b8cea88
done
devpatelio 6a38f6d
fixed refactor issues with some misnamed variables from the previous …
devpatelio f7efc37
additional rebase fixes
devpatelio a077e8d
removed print statements and debug scripts
devpatelio 7d41b60
deleted extra jinja template file
devpatelio 5ab4dde
fixed bug where prompts were being processed by tokenizer as 1 elemen…
devpatelio 58082a4
list comprehension
devpatelio fb7b2ea
PR comments applied
devpatelio dcffdaf
Remove assignment of enable_thinking_tokens variable
devpatelio 9663547
removed uv.lock file
tyler-griggs b3398b9
removed redundant model_name
tyler-griggs 0f80225
Update skyrl-train/skyrl_train/generators/skyrl_gym_generator.py
devpatelio f2711a4
Update skyrl-train/tests/cpu/generators/test_skyrl_gym_generator_chat…
devpatelio b13a647
done
tyler-griggs d2bbe58
fix by using without thinking so no parsing problem, like before
tyler-griggs c1d0679
fixed the test defaults to check without tokenization
tyler-griggs cee373c
allow model_name to pass to SkyRLGymGenerator, do not use in get_cust…
tyler-griggs 99aeb3c
eos token for merge conflict
tyler-griggs d2ee9cd
Merge branch 'main' into devpatel/skyrl-issue-104
devpatelio 85f946f
looks like a new merge conflict removed default logprobs truncation f…
tyler-griggs c0baade
removed redundant appends
tyler-griggs 7dc2790
cleanup
tyler-griggs 169ec5b
removed extra file + cleanup
tyler-griggs 8a001b6
Update skyrl-train/skyrl_train/generators/skyrl_gym_generator.py
devpatelio 1922cb2
fixed issue of overriding log-probs
tyler-griggs f6e1be4
Update skyrl-train/skyrl_train/generators/skyrl_gym_generator.py
devpatelio 81cee5e
added
tyler-griggs 6e75e09
reverted back to working base, no redundant code
devpatelio 29d1446
condensed the if else logic
devpatelio 12dbd9c
Merge branch 'main' into devpatel/skyrl-issue-104
devpatelio 9aed08b
keep the last message thinking in a multi-turn conversation
devpatelio 96d3ade
linter
devpatelio f7e5d28
fixed generator texts to support new format
devpatelio 9e83a00
fixed comment
devpatelio 2a06dbb
checked test, default qwen model has the newline in the unit test so …
devpatelio 61e398e
added unit test for qwen3_without_thinking and default chat template …
devpatelio 4f1eef9
fixed tests to support the config updates
devpatelio 4bf821d
additional fix for get
devpatelio 35459ca
removed unnecessary dictionary return style by not using custom_chat_…
devpatelio 1ecc27e
Update skyrl-train/skyrl_train/generators/utils.py
devpatelio bccb846
addressed new review of comments
devpatelio 64c9031
run_gsm8k update to defaults
devpatelio c9e92f5
added mock tokenizer test for custom chat template with jinja2 file
devpatelio 63b4a23
applied nits and removed test redundancy for jinja2
devpatelio a57d7de
resolved ppo merge conflict
devpatelio 162dcb5
resolved ppo merge conflict
devpatelio 3fad0cb
Merge branch 'main' into devpatel/skyrl-issue-104
devpatelio 8044b1d
linter
devpatelio 5d873b3
FINAL CHAT UPDATES
devpatelio d3c735d
merge conflict
devpatelio ff09fd5
Merge branch 'main' into devpatel/skyrl-issue-104
devpatelio ecbf3e9
inter
devpatelio 6f40559
last nigts
devpatelio bdea263
great
devpatelio e7f6be7
Update skyrl-train/tests/cpu/generators/test_skyrl_gym_generator_chat…
devpatelio File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1 change: 1 addition & 0 deletions
1
skyrl-train/tests/cpu/generators/qwen3_acc_without_thinking.jinja2
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {% for message in messages %}{% if (message['role'] != 'assistant') %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% elif (message['role'] == 'assistant')%}{{'<|im_start|>' + message['role'] + '\n'}}{% generation %}{% set full_content = message['content'] %}{% set mycontent = message['content'] %}{% set is_last_message = loop.last and messages[-1]['role'] == 'assistant' %}{% if '</think>' in full_content and not is_last_message %}{% set mycontent = full_content.split('</think>')[-1].lstrip('\n') %}{% endif %}{{mycontent + '<|im_end|>'}}{% endgeneration %}{{'\n'}}{% endif %}{% endfor %} | ||
devpatelio marked this conversation as resolved.
Show resolved
Hide resolved
|
||
devpatelio marked this conversation as resolved.
Show resolved
Hide resolved
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.