[SkyRL-Gym] Make SQL and Search envs return None for intermediate steps#299

Closed
erictang000 wants to merge 2 commits into NovaSky-AI:main from erictang000:make_rewards_optional_float

Conversation

@erictang000
Collaborator

@erictang000 erictang000 commented Sep 16, 2025

Overview

After #226, the SkyRLGymGenerator expects turn-level rewards to be None if the env uses trajectory-level rewards. After #271, this causes issues when computing metrics for the SQL and Search envs. Setting the intermediate reward to None and making the reward type optional fixes this.

Additionally, this PR handles the case in SkyRLGymGenerator where we exit the agent loop due to hitting max length, which results in all rewards being None; this is fixed by setting the last reward to 0.
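The truncation handling described above can be sketched as follows. This is a hedged illustration, not the actual generator code; the helper name finalize_rewards and its shape are assumptions:

```python
from typing import List, Optional

def finalize_rewards(rewards: List[Optional[float]]) -> List[Optional[float]]:
    # Hypothetical sketch: if the agent loop exits early because max
    # length was hit, every per-step reward can be None. Setting the
    # final step's reward to 0.0 ensures downstream reward aggregation
    # always sees at least one float.
    if rewards and all(r is None for r in rewards):
        rewards[-1] = 0.0
    return rewards
```

A trajectory that was cut off with rewards [None, None] would come out as [None, 0.0], while a trajectory that already has a terminal reward is left untouched.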

@erictang000 erictang000 marked this pull request as ready for review September 16, 2025 01:32
@CharlieFRuan
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly modifies the SQL and Search environments to return None for intermediate step rewards, aligning with the SkyRLGymGenerator's expectations. The change to handle episode truncation in SkyRLGymGenerator by setting a None reward to 0.0 is also a good addition. My review includes a few suggestions to correct the return type hints in the modified environment methods, which were missed, and a minor documentation consistency fix.

  else:
      # No reward for intermediate steps for Search tasks
-     return 0
+     return None

high

With this change, the function can now return None. Please update the function's return type hint on line 45 from float to Optional[float] to reflect this. Optional is already imported in this file.
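A minimal sketch of what the updated signature would look like. The function name and parameters here are hypothetical, chosen only to illustrate the Optional[float] annotation the review asks for:

```python
from typing import Optional

def step_reward(is_final_step: bool, answer_correct: bool) -> Optional[float]:
    # Final step: score the trajectory with a float reward.
    if is_final_step:
        return 1.0 if answer_correct else 0.0
    # Intermediate steps now return None instead of 0, so the return
    # type must be Optional[float] rather than float.
    return None
```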

  else:
      # No reward for intermediate steps for SQL tasks
-     return 0
+     return None

high

This function now returns None for intermediate steps. The return type hint on line 78 should be updated from float to Optional[float] to match this change. You'll need to add Optional to your imports from the typing module (e.g., from typing import Optional).

  BaseTextEnvStepOutput containing:
      - observations: New messages from the environment
-     - reward: Float reward for the action
+     - reward: Optional[Float] reward for the action, None if intermediate steps have no reward

medium

For consistency with Python's type hinting syntax, it would be clearer to use Optional[float] instead of Optional[Float].

Suggested change
- - reward: Optional[Float] reward for the action, None if intermediate steps have no reward
+ - reward: Optional[float] reward for the action, None if intermediate steps have no reward

@CharlieFRuan
Collaborator

As discussed offline, we will revert #271 first to avoid the error that Eric is observing.

The error is due to this line:

mean_raw_reward = float(np.mean([sum(seq_rewards) for seq_rewards in rewards]))

The issue is that sum(seq_rewards) fails because the entries of rewards are mixed: some seq_rewards are floats while others (e.g. rewards[0]) are lists. This happens because the intermediate reward for the SQL and Search envs is 0, so one trajectory's per-step rewards can be [0, 1] while another's are just [1]. The first goes into the else branch, the second into the if branch:

if all(reward is None for reward, _ in per_step_rewards[:-1]):
    # If all rewards besides the last one are None (i.e. per-trajectory reward), we keep it as a float
    reward_out = per_step_rewards[-1][0]
else:
    # Otherwise build token-level rewards placed at assistant turn boundaries
    token_level_rewards: List[float] = [0.0] * len(response_ids)
    for step_reward, idx in per_step_rewards:
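The failure mode can be reproduced in isolation. This is an illustrative repro, not the actual generator code; the rewards values are made up to show the mixed float/list shape:

```python
import numpy as np

# One trajectory took the else branch (list of token-level rewards),
# the other took the if branch (bare trajectory-level float).
rewards = [[0.0, 1.0], 1.0]

try:
    # sum() over the float entry raises, because a float is not iterable.
    mean_raw_reward = float(np.mean([sum(seq_rewards) for seq_rewards in rewards]))
except TypeError as err:
    print("TypeError:", err)
```

The comprehension fails on the second entry before np.mean is ever reached, which matches the error observed when some trajectories carry trajectory-level rewards and others carry token-level lists.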

Reverting #271 will prevent the error, at the cost of not seeing the pass_at_n metric. We will follow up with a fix.

@erictang000
Collaborator Author

closing and tracking actual fix in #311

ztcanddota added a commit to ztcanddota/skyagent that referenced this pull request Sep 28, 2025
…low due to token-level rewards" (#300)

Reverts NovaSky-AI/SkyRL#271 as it causes errors

For more, see
NovaSky-AI/SkyRL#299 (comment)
SungjunlaLee added a commit to SungjunlaLee/SkyRL that referenced this pull request Jan 3, 2026
…low due to token-level rewards" (#300)

Reverts NovaSky-AI/SkyRL#271 as it causes errors

For more, see
NovaSky-AI/SkyRL#299 (comment)
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
…low due to token-level rewards" (NovaSky-AI#300)

Reverts NovaSky-AI#271 as it causes errors

For more, see
NovaSky-AI#299 (comment)