[Generator][Env] Add stop str, remove need for post-processed action in search and txt2sql by CharlieFRuan · Pull Request #190 · NovaSky-AI/SkyRL

CharlieFRuan · 2025-08-24T20:10:31Z

Motivated by token-in-token-out in #152, we want to avoid post-processing the action in Env.step(), which happens in the string space and causes re-tokenization.

Currently, such post processing happens in our search and txt2sql environemnt. To fix it for these two environments, we add cfg.generator.sampling_params.stop and cfg.generator.eval_sampling_params.stop. This will be equivalent to the post-processing done in the search and txt2sql environment, which simply strips after tags like "</search>".

Furthermore, for these two environments, if generator.use_conversation_multi_turn=true, we need to append an EOS token ID after these stop strings to adhere to the chat template, e.g. ending in </answer><|im_end|>. This is only required in codepath 1 (out of the 3): https://github.com/NovaSky-AI/SkyRL/pull/186/files. Codepath 3 (always retokenize) is not needed because the chat template always applies eos to the end of message content. We do such append when cfg.generator.append_eos_token_after_stop_str_in_multi_turn is True and add run_search_multiturn.sh and run_skyrl_sql_multiturn.sh correspondingly.

In addition, this PR:

Add "no_stop_trim": True, to SGLang sampling params, which is equivalent to vLLM's include_stop_str_in_output.
Throw error when min_new_tokens or stop is used with SGLang backend, since SGLang will throw an error when these sampling parameters are used when the engine is initialized with skip_tokenizer_init=True.
- See this issue for more: [Bug] when sampling_params min_new_tokens>0 , generate have a issue sgl-project/sglang#9039 (comment)

Tested by:

tests/gpu/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e
E2E run of search r1 in [Generator] Support token-in-token-out rollout #152 is run with these changes
run_gsm8k.sh with sglang and vllm -- also made sure those unsupported fields in sglang will raise error as expected

Note:
It is worth noting that, if say the stop string is </search>, and vLLM generated a token equivalent to >\n, the string output of vLLM will truncate to only have </search> while the token ID output will still be >\n. This can be observed in this script: https://gist.github.com/CharlieFRuan/ca3a8fee388263f7e2f96bc89e2ee7f5

gemini-code-assist

Code Review

This pull request effectively removes the need for post-processing actions in the search and txt2sql environments by introducing stop strings in the generator configuration, which is a great step towards a pure token-in-token-out pipeline. The changes are well-scoped, covering environment logic, configurations, example scripts, and documentation. The new validation logic in the environments to ensure stop strings are correctly handled by the generator is a valuable safeguard. I've provided a couple of suggestions to enhance the robustness of this validation against certain edge cases. Additionally, the new checks for sglang backend compatibility are a thoughtful addition that will help prevent runtime configuration errors.

skyrl-gym/skyrl_gym/envs/search/env.py

skyrl-gym/skyrl_gym/envs/sql/env.py

SumanthRH

Thanks! Left a couple comments

skyrl-train/skyrl_train/utils/utils.py

SumanthRH · 2025-08-25T20:50:35Z

skyrl-train/docs/examples/multi_turn_text2sql.rst

 - Chat templating and loss masking for multi-turn conversations are handled by the ``SkyRLGymGenerator`` class.

  - In the above example, we set ``use_conversation_multi_turn=false`` to enforce that the multi-turn conversation is formatted as a single assistant response.
+  - We also set ``stop='["</sql>", "</solution>"]'`` for both ``sampling_params`` and ``eval_sampling_params`` as a part


It would be best for our Generator to handle this case, and to link that as an example in the doc.

Currently, both the search and the sql scripts use use_conversation_multi_turn=False, but technically after #123 , these should be able to do use_conversation_multi_turn=True with limited effect on performance.

Ideally, I'd like SkyGymGenerator to handle this case in this PR itself. Could you modify SkyGymGenerator to support use_conversation_multi_turn and custom stop tokens?

Yes makes sense. I'll add a flag to toggle it and add a multi-turn search-R1 script

Should be addressed with flag append_eos_token_after_stop_str_in_multi_turn. Updated PR description and a new unit test.

Could you take another look when you get a chance? Thanks!

Nvm, just realized sampling_params is None in the generator level. Will further fix

@SumanthRH ready again!

…or search and txt2sql

CharlieFRuan · 2025-08-27T00:47:49Z

Example output of run_search_multiturn.sh:

To determine the inauguration year of the municipality that hosts the Amasaman Secondary Technical School, I'll begin with some reasoning:

1. The Amasaman Secondary Technical School is likely in a particular municipality in Ghana.
2. I need to find the founding or administrative establishment year of the municipality associated with this school.

Since I don't have direct information about the specific school's founding year, I will start with the municipality itself. I might not know a certified or widely recognized founding year of the municipality, so I will need to check resources or databases.

After performing research:
- I find that Amasaman Secondary Technical School operates in the municipality and district of Asante-Tuobiko, Brong Ahafo Region in Ghana.
- I will now search for the founding year or administrative establishment of Asante-Tuobiko Municipality as a whole.

Let's search now.
<search> Municipal founding dates for Asante-Tuobiko, Brong Ahafo Region, Ghana </search>
<|im_end|>
<|im_start|>user

<information>{"result": "..."}</information>
<|im_end|>
<|im_start|>assistant
Based on the information provided:

1. Tuobodom is the capital of the Techiman North District, which is within Asante-Tuobiko Municipality.
2. Tuobodom was created by the then President of Ghana Professor John Evans Atta Mills (in reference to the creation date mentioned for "Techiman North District").

However, the text doesn't provide an exact founding date for Asante-Tuobiko Municipality. Instead, it mentions the creation of Techiman North District, which implies the establishment of Tuobodom and its district. 

Given the lack of direct information about what year Asante-Tuobiko Municipality was established, I can directly provide an answer based on the municipal creation dates exposed:

<answer>1958</answer><|im_end|>

The <|im_end|> are added in the right place. Note there is a \n between </search> and <|im_end|> which is expected. See last section of the PR description.

CharlieFRuan · 2025-08-27T06:15:57Z

/gemini review

gemini-code-assist

Code Review

This pull request effectively removes the need for post-processing actions in the search and txt2sql environments by introducing stop strings in the generator configuration. This is a solid improvement that avoids re-tokenization and simplifies the logic. The changes are consistently applied across environments, configurations, examples, and documentation. The addition of validation checks for SGLang and warnings for multi-turn configurations is also a great touch. I've included a few suggestions to improve code clarity and reduce repetition.

skyrl-gym/skyrl_gym/envs/search/env.py

skyrl-gym/skyrl_gym/envs/sql/env.py

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py

skyrl-train/examples/search/run_search_multiturn.sh

skyrl-train/examples/text_to_sql/run_sql_deepspeed.sh

skyrl-train/examples/text_to_sql/run_skyrl_sql_conversation_format.sh

SumanthRH

Thanks

# What does this PR do? Missed a change before merging #190 --------- Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

After #190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail

After NovaSky-AI/SkyRL#190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail

…rocessing action in search and txt2sql environments (#190) Motivated by token-in-token-out in NovaSky-AI#152, we want to avoid post-processing the action in `Env.step()`, which happens in the string space and causes re-tokenization. Currently, such post processing happens in our search and txt2sql environemnt. To fix it for these two environments, we add `cfg.generator.sampling_params.stop` and `cfg.generator.eval_sampling_params.stop`. This will be equivalent to the post-processing done in the search and txt2sql environment, which simply strips after tags like `"</search>"`. Furthermore, for these two environments, if `generator.use_conversation_multi_turn=true`, we need to append an EOS token ID after these stop strings to adhere to the chat template, e.g. ending in `</answer><|im_end|>`. This is only required in codepath 1 (out of the 3): https://github.com/NovaSky-AI/SkyRL/pull/186/files. Codepath 3 (always retokenize) is not needed because the chat template always applies eos to the end of message content. We do such append when `cfg.generator.append_eos_token_after_stop_str_in_multi_turn` is `True` and add `run_search_multiturn.sh` and `run_skyrl_sql_multiturn.sh` correspondingly. In addition, this PR: - Add `"no_stop_trim": True,` to SGLang sampling params, which is equivalent to vLLM's `include_stop_str_in_output`. - Throw error when `min_new_tokens` or `stop` is used with SGLang backend, since SGLang will throw an error when these sampling parameters are used when the engine is initialized with `skip_tokenizer_init=True`. - See this issue for more: sgl-project/sglang#9039 (comment) Tested by: - `tests/gpu/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e` - E2E run of search r1 in NovaSky-AI#152 is run with these changes - `run_gsm8k.sh` with sglang and vllm -- also made sure those unsupported fields in sglang will raise error as expected Note: It is worth noting that, if say the stop string is `</search>`, and vLLM generated a token equivalent to `>\n`, the string output of vLLM will truncate to only have `</search>` while the token ID output will still be `>\n`. This can be observed in this script: https://gist.github.com/CharlieFRuan/ca3a8fee388263f7e2f96bc89e2ee7f5

# What does this PR do? Missed a change before merging #190 --------- Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

After NovaSky-AI#190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail

gemini-code-assist bot reviewed Aug 24, 2025

View reviewed changes

skyrl-gym/skyrl_gym/envs/search/env.py Outdated Show resolved Hide resolved

skyrl-gym/skyrl_gym/envs/sql/env.py Outdated Show resolved Hide resolved

CharlieFRuan mentioned this pull request Aug 23, 2025

[Tracker] SkyRLGymGenerator improvements #179

Open

14 tasks

CharlieFRuan requested review from SumanthRH and erictang000 August 24, 2025 20:29

CharlieFRuan force-pushed the pr-0824-stop-str branch from 03d2e49 to 85f86a6 Compare August 25, 2025 06:07

SumanthRH self-assigned this Aug 25, 2025

SumanthRH reviewed Aug 25, 2025

View reviewed changes

CharlieFRuan added 8 commits August 26, 2025 20:18

[Generator][Env] Add stop str, remove need for postprocessed action f…

259f0b0

…or search and txt2sql

fix lint, add warning

33fb740

address gemini comments

62d38fd

remove postprocess action tests from skyrlgym

cf9a63a

append eos to response after stop str in multiturn

cdf24a5

fix scripts and doc according to append_eos

b3fc6ff

fix unit tests

cb67035

change logs

f8971ed

CharlieFRuan force-pushed the pr-0824-stop-str branch from 1a7e681 to f8971ed Compare August 26, 2025 20:19

CharlieFRuan marked this pull request as draft August 26, 2025 22:44

Address how sampling_params is None for rollout

c05f2e9

CharlieFRuan marked this pull request as ready for review August 27, 2025 00:49

gemini-code-assist bot reviewed Aug 27, 2025

View reviewed changes

skyrl-gym/skyrl_gym/envs/search/env.py Outdated Show resolved Hide resolved

skyrl-gym/skyrl_gym/envs/sql/env.py Outdated Show resolved Hide resolved

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py Outdated Show resolved Hide resolved

SumanthRH reviewed Aug 27, 2025

View reviewed changes

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py Outdated Show resolved Hide resolved

SumanthRH reviewed Aug 27, 2025

View reviewed changes

skyrl-train/examples/search/run_search_multiturn.sh Show resolved Hide resolved

SumanthRH reviewed Aug 27, 2025

View reviewed changes

skyrl-train/examples/text_to_sql/run_sql_deepspeed.sh Show resolved Hide resolved

Address comments

0d2effd

SumanthRH reviewed Aug 28, 2025

View reviewed changes

skyrl-train/examples/text_to_sql/run_skyrl_sql_conversation_format.sh Show resolved Hide resolved

SumanthRH approved these changes Aug 28, 2025

View reviewed changes

SumanthRH merged commit 9fe6b13 into NovaSky-AI:main Aug 28, 2025
3 checks passed

SumanthRH mentioned this pull request Aug 28, 2025

[Examples] Minor fix after #190 #216

Merged

SumanthRH added a commit that referenced this pull request Aug 28, 2025

[Examples] Minor fix after #190 (#216)

1fd5f33

# What does this PR do? Missed a change before merging #190 --------- Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

CharlieFRuan mentioned this pull request Aug 31, 2025

[Fix][CI] Fix GPU CI where search needs stop param #232

Merged

tyler-griggs pushed a commit that referenced this pull request Aug 31, 2025

[Fix][CI] Fix GPU CI where search needs stop param (#232)

de471a9

After #190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail

dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026

[Examples] Minor fix after #190 (#216)

530ea8c

# What does this PR do? Missed a change before merging #190 --------- Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026

[Fix][CI] Fix GPU CI where search needs stop param (#232)

220b747

After NovaSky-AI#190, we need to pass `stop` as sampling params for multiturn search / text2sql. Without it, the GPU CI will likely fail

Conversation

CharlieFRuan commented Aug 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

SumanthRH left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SumanthRH Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

CharlieFRuan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

CharlieFRuan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

CharlieFRuan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

CharlieFRuan Aug 27, 2025

Choose a reason for hiding this comment

Uh oh!

CharlieFRuan commented Aug 27, 2025

Uh oh!

CharlieFRuan commented Aug 27, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SumanthRH left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CharlieFRuan commented Aug 24, 2025 •

edited

Loading