Decouple Retokenize and Custom Chat Template #351
devpatelio wants to merge 65 commits into NovaSky-AI:main from
Conversation
… commits as right now, users have to manually add in parser support). Provided thinking masking on/off support for Qwen3 Models
Done!
…. Final checks are done and ready!
…t list instead of List[List[int]]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…_templating.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…om_chat_template endpoint to ensure tests are aligned
…or no chat template selection, brought it back
…i brought this back. return None if no name_or_path since behaviour for apply_chat_template uses default template from tokenizer in that case. Addressed all other PRs too
…when name_or_path is None
…template for list return, same template as without_thinking but return type is different and supports test
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
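The commits above describe the template-selection behavior: return `None` when no template name or path is given, so `apply_chat_template` falls back to the tokenizer's default template, and support loading templates by registered name or from a file. A minimal sketch of that logic (the function name `get_custom_chat_template` and the `NAME_TO_TEMPLATE` registry are illustrative, not the actual SkyRL-train API):

```python
from pathlib import Path
from typing import Optional

# Illustrative registry of named templates; real entries would hold full
# Jinja chat templates (e.g. a Qwen3 template with thinking masking on/off).
NAME_TO_TEMPLATE = {
    "qwen3_with_thinking": "{% for message in messages %}...{% endfor %}",
}

def get_custom_chat_template(name_or_path: Optional[str]) -> Optional[str]:
    """Resolve a chat template from a registered name or a file path.

    Returns None when no name/path is configured, so that downstream
    apply_chat_template uses the tokenizer's default template.
    """
    if name_or_path is None:
        return None  # tokenizer default applies
    if name_or_path in NAME_TO_TEMPLATE:
        return NAME_TO_TEMPLATE[name_or_path]
    path = Path(name_or_path)
    if path.is_file():
        return path.read_text()
    raise ValueError(f"Unknown chat template name or path: {name_or_path}")
```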
Code Review
This pull request effectively decouples the retokenize and custom_chat_template configurations by introducing a more flexible chat_template config structure and a separate retokenize boolean flag. The changes in skyrl_train/generators/utils.py to handle template loading are a significant improvement. However, I've identified a critical issue where an old assertion was missed during refactoring, which will lead to a runtime error and undermines the decoupling effort. Additionally, a unit test intended to cover the retokenization path is now broken due to the logic changes. My review includes specific suggestions to address these points.
/gemini review
Code Review
This pull request effectively decouples the retokenize and custom_chat_template configurations, which is a great improvement for flexibility. The implementation is solid, with new configuration options in ppo_base_config.yaml and updated logic in skyrl_gym_generator.py and utils.py. The tests have also been updated to cover the new functionality, including loading templates from files.
I have a few suggestions to improve code clarity and robustness, mainly around configuration handling and variable naming. I've also noted a minor point about a new test file and an outdated TODO comment.
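The review refers to new configuration options in `ppo_base_config.yaml`. A hedged sketch of what the decoupled settings could look like (field names are illustrative, not the exact schema):

```yaml
generator:
  retokenize: false        # re-encode full chat history each turn vs. token-in-token-out appends
  chat_template:
    source: "name"         # e.g. a registered template name or a file path
    name_or_path: null     # null -> fall back to the tokenizer's default template
```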
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Addresses #179: Separate configuration of cfg.generator.retokenize and cfg.generator.custom_chat_template
Note: this branch extends PR #178, which improves the flexibility of the custom chat template endpoint.
Separate configuration of cfg.generator.retokenize and cfg.generator.custom_chat_template
Prior to this PR, setting a custom_chat_template meant that chat history was always retokenized, rather than appended token-in-token-out. These behaviors should be decoupled: whether to retokenize and which chat template to use are independent choices. Refer to https://skyrl.readthedocs.io/en/latest/tutorials/skyrl_gym_generator.html for more details.
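The decoupling described above can be sketched as follows. This is an illustrative simplification, not the actual `skyrl_gym_generator.py` code: the `retokenize` flag alone decides whether the chat history is re-encoded each turn, independently of whether a custom chat template is supplied.

```python
from typing import List, Optional

def build_input_ids(
    tokenizer,
    messages: List[dict],
    prev_input_ids: List[int],
    new_response_ids: List[int],
    retokenize: bool,
    custom_chat_template: Optional[str] = None,
) -> List[int]:
    """Sketch of the decoupled behavior (names are hypothetical)."""
    if retokenize:
        # Re-encode the full chat history; a custom template is optional
        # and passing None uses the tokenizer's default template.
        return tokenizer.apply_chat_template(
            messages, chat_template=custom_chat_template, tokenize=True
        )
    # Token-in-token-out: append the model's response tokens directly,
    # regardless of whether a custom chat template is configured.
    return prev_input_ids + new_response_ids
```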