Skip to content

[Generator][Env] Add stop str, remove need for post-processed action in search and txt2sql#190

Merged
SumanthRH merged 10 commits intoNovaSky-AI:mainfrom
CharlieFRuan:pr-0824-stop-str
Aug 28, 2025
Merged

[Generator][Env] Add stop str, remove need for post-processed action in search and txt2sql#190
SumanthRH merged 10 commits intoNovaSky-AI:mainfrom
CharlieFRuan:pr-0824-stop-str

Conversation

@CharlieFRuan
Copy link
Collaborator

@CharlieFRuan CharlieFRuan commented Aug 24, 2025

Motivated by token-in-token-out in #152, we want to avoid post-processing the action in Env.step(), which happens in the string space and causes re-tokenization.

Currently, such post processing happens in our search and txt2sql environemnt. To fix it for these two environments, we add cfg.generator.sampling_params.stop and cfg.generator.eval_sampling_params.stop. This will be equivalent to the post-processing done in the search and txt2sql environment, which simply strips after tags like "</search>".

Furthermore, for these two environments, if generator.use_conversation_multi_turn=true, we need to append an EOS token ID after these stop strings to adhere to the chat template, e.g. ending in </answer><|im_end|>. This is only required in codepath 1 (out of the 3): https://github.com/NovaSky-AI/SkyRL/pull/186/files. Codepath 3 (always retokenize) is not needed because the chat template always applies eos to the end of message content. We do such append when cfg.generator.append_eos_token_after_stop_str_in_multi_turn is True and add run_search_multiturn.sh and run_skyrl_sql_multiturn.sh correspondingly.

In addition, this PR:

Tested by:

  • tests/gpu/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e
  • E2E run of search r1 in [Generator] Support token-in-token-out rollout #152 is run with these changes
  • run_gsm8k.sh with sglang and vllm -- also made sure those unsupported fields in sglang will raise error as expected

Note:
It is worth noting that, if say the stop string is </search>, and vLLM generated a token equivalent to >\n, the string output of vLLM will truncate to only have </search> while the token ID output will still be >\n. This can be observed in this script: https://gist.github.com/CharlieFRuan/ca3a8fee388263f7e2f96bc89e2ee7f5

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively removes the need for post-processing actions in the search and txt2sql environments by introducing stop strings in the generator configuration, which is a great step towards a pure token-in-token-out pipeline. The changes are well-scoped, covering environment logic, configurations, example scripts, and documentation. The new validation logic in the environments to ensure stop strings are correctly handled by the generator is a valuable safeguard. I've provided a couple of suggestions to enhance the robustness of this validation against certain edge cases. Additionally, the new checks for sglang backend compatibility are a thoughtful addition that will help prevent runtime configuration errors.

Copy link
Member

@SumanthRH SumanthRH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left a couple comments

- Chat templating and loss masking for multi-turn conversations are handled by the ``SkyRLGymGenerator`` class.

- In the above example, we set ``use_conversation_multi_turn=false`` to enforce that the multi-turn conversation is formatted as a single assistant response.
- We also set ``stop='["</sql>", "</solution>"]'`` for both ``sampling_params`` and ``eval_sampling_params`` as a part
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best for our Generator to handle this case, and to link that as an example in the doc.

Currently, both the search and the sql scripts use use_conversation_multi_turn=False, but technically after #123 , these should be able to do use_conversation_multi_turn=True with limited effect on performance.

Ideally, I'd like SkyGymGenerator to handle this case in this PR itself. Could you modify SkyGymGenerator to support use_conversation_multi_turn and custom stop tokens?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes makes sense. I'll add a flag to toggle it and add a multi-turn search-R1 script

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed with flag append_eos_token_after_stop_str_in_multi_turn. Updated PR description and a new unit test.

Could you take another look when you get a chance? Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm, just realized sampling_params is None in the generator level. Will further fix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SumanthRH ready again!

@CharlieFRuan CharlieFRuan marked this pull request as draft August 26, 2025 22:44
@CharlieFRuan
Copy link
Collaborator Author

Example output of run_search_multiturn.sh:

To determine the inauguration year of the municipality that hosts the Amasaman Secondary Technical School, I'll begin with some reasoning:

1. The Amasaman Secondary Technical School is likely in a particular municipality in Ghana.
2. I need to find the founding or administrative establishment year of the municipality associated with this school.

Since I don't have direct information about the specific school's founding year, I will start with the municipality itself. I might not know a certified or widely recognized founding year of the municipality, so I will need to check resources or databases.

After performing research:
- I find that Amasaman Secondary Technical School operates in the municipality and district of Asante-Tuobiko, Brong Ahafo Region in Ghana.
- I will now search for the founding year or administrative establishment of Asante-Tuobiko Municipality as a whole.

Let's search now.
<search> Municipal founding dates for Asante-Tuobiko, Brong Ahafo Region, Ghana </search>
<|im_end|>
<|im_start|>user

<information>{"result": "..."}</information>
<|im_end|>
<|im_start|>assistant
Based on the information provided:

1. Tuobodom is the capital of the Techiman North District, which is within Asante-Tuobiko Municipality.
2. Tuobodom was created by the then President of Ghana Professor John Evans Atta Mills (in reference to the creation date mentioned for "Techiman North District").

However, the text doesn't provide an exact founding date for Asante-Tuobiko Municipality. Instead, it mentions the creation of Techiman North District, which implies the establishment of Tuobodom and its district. 

Given the lack of direct information about what year Asante-Tuobiko Municipality was established, I can directly provide an answer based on the municipal creation dates exposed:

<answer>1958</answer><|im_end|>

The <|im_end|> are added in the right place. Note there is a \n between </search> and <|im_end|> which is expected. See last section of the PR description.

@CharlieFRuan CharlieFRuan marked this pull request as ready for review August 27, 2025 00:49
@CharlieFRuan
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively removes the need for post-processing actions in the search and txt2sql environments by introducing stop strings in the generator configuration. This is a solid improvement that avoids re-tokenization and simplifies the logic. The changes are consistently applied across environments, configurations, examples, and documentation. The addition of validation checks for SGLang and warnings for multi-turn configurations is also a great touch. I've included a few suggestions to improve code clarity and reduce repetition.

Copy link
Member

@SumanthRH SumanthRH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@SumanthRH SumanthRH merged commit 9fe6b13 into NovaSky-AI:main Aug 28, 2025
3 checks passed
SumanthRH added a commit that referenced this pull request Aug 28, 2025
# What does this PR do?

Missed a change before merging #190

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
tyler-griggs pushed a commit that referenced this pull request Aug 31, 2025
After #190, we need to pass
`stop` as sampling params for multiturn search / text2sql. Without it,
the GPU CI will likely fail
ztcanddota added a commit to ztcanddota/skyagent that referenced this pull request Sep 28, 2025
After NovaSky-AI/SkyRL#190, we need to pass
`stop` as sampling params for multiturn search / text2sql. Without it,
the GPU CI will likely fail
SungjunlaLee added a commit to SungjunlaLee/SkyRL that referenced this pull request Jan 3, 2026
After NovaSky-AI/SkyRL#190, we need to pass
`stop` as sampling params for multiturn search / text2sql. Without it,
the GPU CI will likely fail
dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026
…rocessing action in search and txt2sql environments (#190)

Motivated by token-in-token-out in
NovaSky-AI#152, we want to avoid
post-processing the action in `Env.step()`, which happens in the string
space and causes re-tokenization.

Currently, such post processing happens in our search and txt2sql
environemnt. To fix it for these two environments, we add
`cfg.generator.sampling_params.stop` and
`cfg.generator.eval_sampling_params.stop`. This will be equivalent to
the post-processing done in the search and txt2sql environment, which
simply strips after tags like `"</search>"`.

Furthermore, for these two environments, if
`generator.use_conversation_multi_turn=true`, we need to append an EOS
token ID after these stop strings to adhere to the chat template, e.g.
ending in `</answer><|im_end|>`. This is only required in codepath 1
(out of the 3): https://github.com/NovaSky-AI/SkyRL/pull/186/files.
Codepath 3 (always retokenize) is not needed because the chat template
always applies eos to the end of message content. We do such append when
`cfg.generator.append_eos_token_after_stop_str_in_multi_turn` is `True`
and add `run_search_multiturn.sh` and `run_skyrl_sql_multiturn.sh`
correspondingly.

In addition, this PR:
- Add `"no_stop_trim": True,` to SGLang sampling params, which is
equivalent to vLLM's `include_stop_str_in_output`.
- Throw error when `min_new_tokens` or `stop` is used with SGLang
backend, since SGLang will throw an error when these sampling parameters
are used when the engine is initialized with `skip_tokenizer_init=True`.
- See this issue for more:
sgl-project/sglang#9039 (comment)

Tested by:
-
`tests/gpu/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e`
- E2E run of search r1 in NovaSky-AI#152
is run with these changes
- `run_gsm8k.sh` with sglang and vllm -- also made sure those
unsupported fields in sglang will raise error as expected

Note:
It is worth noting that, if say the stop string is `</search>`, and vLLM
generated a token equivalent to `>\n`, the string output of vLLM will
truncate to only have `</search>` while the token ID output will still
be `>\n`. This can be observed in this script:
https://gist.github.com/CharlieFRuan/ca3a8fee388263f7e2f96bc89e2ee7f5
dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026
# What does this PR do?

Missed a change before merging #190

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
After NovaSky-AI#190, we need to pass
`stop` as sampling params for multiturn search / text2sql. Without it,
the GPU CI will likely fail
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants