[Fix][CI] Address generator CI test fails when model stop reason is length #269
CharlieFRuan merged 6 commits into main from
Conversation
Code Review
This pull request aims to fix a flaky CI test by using larger models, more descriptive prompts, and adding a check for the model's stop reason before asserting on the output. The changes are generally well-aligned with the goal. I've identified a minor opportunity for code refactoring to improve maintainability and a potential bug in another test that could lead to similar flakiness. My review includes suggestions to address these points.
MODEL_TO_GENERATION_PROMPT = {
    "Qwen/Qwen2.5-1.5B-Instruct": "<|im_start|>assistant\n",
    "unsloth/Llama-3.2-1B-Instruct": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "Qwen/Qwen3-0.6B": "<|im_start|>assistant\n",
    "Qwen/Qwen2.5-3B-Instruct": "<|im_start|>assistant\n",
    "unsloth/Llama-3.2-3B-Instruct": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "Qwen/Qwen3-1.7B": "<|im_start|>assistant\n",
Hmm, this doesn't seem like the best fix. Ideally we keep the models as small as possible.
Is this motivated by small models repeating themselves and hitting the length limit?
Intuitively the agent loop exits based on three criteria:
- step limit: `max_turns` in our case
- budget: cost for OpenAI models; in our case it's just max length
- model-initiated: the model wants to exit.

It looks like we should just add 2. and this is fixed?
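For illustration only, a hypothetical sketch of an agent loop with these three exit criteria; all names (`generate`, `max_turns`, `max_response_length`, the env methods) are made up, not the repo's actual API:

```python
def run_agent_loop(env, generate, max_turns: int, max_response_length: int):
    messages, tokens_used = env.reset(), 0
    for _ in range(max_turns):                          # 1. step limit
        response, n_tokens = generate(messages)
        tokens_used += n_tokens
        messages.append({"role": "assistant", "content": response})
        if tokens_used >= max_response_length:          # 2. budget (max length)
            return messages, "length"
        if env.is_done(response):                       # 3. model-initiated exit
            return messages, "stop"
        messages.append({"role": "user", "content": env.step(response)})
    return messages, "max_turns"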
Yeah, I included solution 2, but in that case not much can be checked, which makes the test a bit lenient.
But since this doesn't seem to happen frequently, we can do it with a warning.
I will revert this model size change, but keep the max generate length increase
Got it.
I also feel this test itself is not great, given that we are just checking the total number of EOS tokens and thus need to deal with these edge cases. We should revisit the test later.
[Fix][CI] Address generator CI test fails when model stop reason is length (NovaSky-AI#269)

Our unit test checks whether, for turns=3, the final conversation generates 3 EOS tokens. This will not be the case when the model's generation stops due to length. To address this, we do the following:
- In the dummy environment, change the observation `"turn {i}"` to `"give me another solution {i}"`, which might make more sense for the model
- Increase the max generate length from 1000 to 3000
- As a final guard, when the stop reason is not `"stop"`, we don't check the number of EOS token IDs