[Generator][Env] Add stop str, remove need for post-processed action in search and txt2sql#190
Conversation
There was a problem hiding this comment.
Code Review
This pull request effectively removes the need for post-processing actions in the search and txt2sql environments by introducing stop strings in the generator configuration, which is a great step towards a pure token-in-token-out pipeline. The changes are well-scoped, covering environment logic, configurations, example scripts, and documentation. The new validation logic in the environments to ensure stop strings are correctly handled by the generator is a valuable safeguard. I've provided a couple of suggestions to enhance the robustness of this validation against certain edge cases. Additionally, the new checks for sglang backend compatibility are a thoughtful addition that will help prevent runtime configuration errors.
03d2e49 to
85f86a6
Compare
SumanthRH
left a comment
There was a problem hiding this comment.
Thanks! Left a couple comments
| - Chat templating and loss masking for multi-turn conversations are handled by the ``SkyRLGymGenerator`` class. | ||
|
|
||
| - In the above example, we set ``use_conversation_multi_turn=false`` to enforce that the multi-turn conversation is formatted as a single assistant response. | ||
| - We also set ``stop='["</sql>", "</solution>"]'`` for both ``sampling_params`` and ``eval_sampling_params`` as a part |
There was a problem hiding this comment.
It would be best for our Generator to handle this case, and to link that as an example in the doc.
Currently, both the search and the sql scripts use use_conversation_multi_turn=False, but technically after #123 , these should be able to do use_conversation_multi_turn=True with limited effect on performance.
Ideally, I'd like SkyGymGenerator to handle this case in this PR itself. Could you modify SkyGymGenerator to support use_conversation_multi_turn and custom stop tokens?
There was a problem hiding this comment.
Yes makes sense. I'll add a flag to toggle it and add a multi-turn search-R1 script
There was a problem hiding this comment.
Should be addressed with flag append_eos_token_after_stop_str_in_multi_turn. Updated PR description and a new unit test.
Could you take another look when you get a chance? Thanks!
There was a problem hiding this comment.
Nvm, just realized sampling_params is None in the generator level. Will further fix
…or search and txt2sql
1a7e681 to
f8971ed
Compare
|
Example output of To determine the inauguration year of the municipality that hosts the Amasaman Secondary Technical School, I'll begin with some reasoning:
1. The Amasaman Secondary Technical School is likely in a particular municipality in Ghana.
2. I need to find the founding or administrative establishment year of the municipality associated with this school.
Since I don't have direct information about the specific school's founding year, I will start with the municipality itself. I might not know a certified or widely recognized founding year of the municipality, so I will need to check resources or databases.
After performing research:
- I find that Amasaman Secondary Technical School operates in the municipality and district of Asante-Tuobiko, Brong Ahafo Region in Ghana.
- I will now search for the founding year or administrative establishment of Asante-Tuobiko Municipality as a whole.
Let's search now.
<search> Municipal founding dates for Asante-Tuobiko, Brong Ahafo Region, Ghana </search>
<|im_end|>
<|im_start|>user
<information>{"result": "..."}</information>
<|im_end|>
<|im_start|>assistant
Based on the information provided:
1. Tuobodom is the capital of the Techiman North District, which is within Asante-Tuobiko Municipality.
2. Tuobodom was created by the then President of Ghana Professor John Evans Atta Mills (in reference to the creation date mentioned for "Techiman North District").
However, the text doesn't provide an exact founding date for Asante-Tuobiko Municipality. Instead, it mentions the creation of Techiman North District, which implies the establishment of Tuobodom and its district.
Given the lack of direct information about what year Asante-Tuobiko Municipality was established, I can directly provide an answer based on the municipal creation dates exposed:
<answer>1958</answer><|im_end|>The |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively removes the need for post-processing actions in the search and txt2sql environments by introducing stop strings in the generator configuration. This is a solid improvement that avoids re-tokenization and simplifies the logic. The changes are consistently applied across environments, configurations, examples, and documentation. The addition of validation checks for SGLang and warnings for multi-turn configurations is also a great touch. I've included a few suggestions to improve code clarity and reduce repetition.
After #190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail
After NovaSky-AI/SkyRL#190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail
After NovaSky-AI/SkyRL#190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail
…rocessing action in search and txt2sql environments (#190) Motivated by token-in-token-out in NovaSky-AI#152, we want to avoid post-processing the action in `Env.step()`, which happens in the string space and causes re-tokenization. Currently, such post processing happens in our search and txt2sql environemnt. To fix it for these two environments, we add `cfg.generator.sampling_params.stop` and `cfg.generator.eval_sampling_params.stop`. This will be equivalent to the post-processing done in the search and txt2sql environment, which simply strips after tags like `"</search>"`. Furthermore, for these two environments, if `generator.use_conversation_multi_turn=true`, we need to append an EOS token ID after these stop strings to adhere to the chat template, e.g. ending in `</answer><|im_end|>`. This is only required in codepath 1 (out of the 3): https://github.com/NovaSky-AI/SkyRL/pull/186/files. Codepath 3 (always retokenize) is not needed because the chat template always applies eos to the end of message content. We do such append when `cfg.generator.append_eos_token_after_stop_str_in_multi_turn` is `True` and add `run_search_multiturn.sh` and `run_skyrl_sql_multiturn.sh` correspondingly. In addition, this PR: - Add `"no_stop_trim": True,` to SGLang sampling params, which is equivalent to vLLM's `include_stop_str_in_output`. - Throw error when `min_new_tokens` or `stop` is used with SGLang backend, since SGLang will throw an error when these sampling parameters are used when the engine is initialized with `skip_tokenizer_init=True`. - See this issue for more: sgl-project/sglang#9039 (comment) Tested by: - `tests/gpu/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e` - E2E run of search r1 in NovaSky-AI#152 is run with these changes - `run_gsm8k.sh` with sglang and vllm -- also made sure those unsupported fields in sglang will raise error as expected Note: It is worth noting that, if say the stop string is `</search>`, and vLLM generated a token equivalent to `>\n`, the string output of vLLM will truncate to only have `</search>` while the token ID output will still be `>\n`. This can be observed in this script: https://gist.github.com/CharlieFRuan/ca3a8fee388263f7e2f96bc89e2ee7f5
After NovaSky-AI#190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail
Motivated by token-in-token-out in #152, we want to avoid post-processing the action in
Env.step(), which happens in the string space and causes re-tokenization.Currently, such post processing happens in our search and txt2sql environemnt. To fix it for these two environments, we add
cfg.generator.sampling_params.stopandcfg.generator.eval_sampling_params.stop. This will be equivalent to the post-processing done in the search and txt2sql environment, which simply strips after tags like"</search>".Furthermore, for these two environments, if
generator.use_conversation_multi_turn=true, we need to append an EOS token ID after these stop strings to adhere to the chat template, e.g. ending in</answer><|im_end|>. This is only required in codepath 1 (out of the 3): https://github.com/NovaSky-AI/SkyRL/pull/186/files. Codepath 3 (always retokenize) is not needed because the chat template always applies eos to the end of message content. We do such append whencfg.generator.append_eos_token_after_stop_str_in_multi_turnisTrueand addrun_search_multiturn.shandrun_skyrl_sql_multiturn.shcorrespondingly.In addition, this PR:
"no_stop_trim": True,to SGLang sampling params, which is equivalent to vLLM'sinclude_stop_str_in_output.min_new_tokensorstopis used with SGLang backend, since SGLang will throw an error when these sampling parameters are used when the engine is initialized withskip_tokenizer_init=True.min_new_tokens>0 , generate have a issue sgl-project/sglang#9039 (comment)Tested by:
tests/gpu/test_policy_local_engines_e2e.py::test_policy_local_engines_e2erun_gsm8k.shwith sglang and vllm -- also made sure those unsupported fields in sglang will raise error as expectedNote:
It is worth noting that, if say the stop string is
</search>, and vLLM generated a token equivalent to>\n, the string output of vLLM will truncate to only have</search>while the token ID output will still be>\n. This can be observed in this script: https://gist.github.com/CharlieFRuan/ca3a8fee388263f7e2f96bc89e2ee7f5