[Clarification request] Per-turn rewards in SkyRLGymGenerator agent_loop #201

@alex-dr

I'm interested in multi-turn environments with turn-level rewards in the SkyRLGymGenerator.

From my reading of the code, only the final reward of the trajectory is used: https://github.com/NovaSky-AI/SkyRL/blob/main/skyrl-train/skyrl_train/generators/skyrl_gym_generator.py#L187. The reward variable is overwritten with the most recent reward at each step, and only its final value is returned at the end of the method.
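To make the pattern concrete, here is a simplified sketch of what I think is happening. This is not the actual SkyRL code: `ToyEnv` and `run_episode` are hypothetical names I made up for illustration, and the real loop does much more.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical environment that emits one reward per turn."""

    turn_rewards: list
    _turn: int = 0

    def step(self, action):
        # Return the reward for the current turn and whether the episode ended.
        reward = self.turn_rewards[self._turn]
        self._turn += 1
        done = self._turn >= len(self.turn_rewards)
        return reward, done


def run_episode(env):
    """Mimics the pattern described above: `reward` is overwritten every turn."""
    reward = 0.0
    done = False
    while not done:
        # The previous turn's reward is discarded here.
        reward, done = env.step("some action")
    return reward


final_reward = run_episode(ToyEnv(turn_rewards=[0.5, 0.0, 1.0]))
print(final_reward)
```

With this pattern, the intermediate rewards (0.5 and 0.0 in the toy example) never reach the trainer.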

However, loss functions like GSPO explicitly support token-level advantages.

How can I provide rewards that vary per turn (and therefore per token) when only the final reward is ever returned? Or is this not currently supported?
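For reference, this is roughly the kind of output I would like to be able to produce. It is only a sketch under my own assumptions: `per_token_rewards` is a hypothetical helper, not an existing SkyRL function, and placing each turn's reward on the last response token of that turn is just one possible convention.

```python
def per_token_rewards(turn_lengths, turn_rewards):
    """Expand per-turn rewards into a flat per-token reward list.

    Assumption: each turn's reward is placed on the last response token
    of that turn, and every other token in the turn gets 0.0.
    """
    if len(turn_lengths) != len(turn_rewards):
        raise ValueError("need exactly one reward per turn")

    rewards = []
    for length, turn_reward in zip(turn_lengths, turn_rewards):
        if length <= 0:
            raise ValueError("each turn must contain at least one token")
        # Zeros for all but the final token, then the turn's reward.
        rewards.extend([0.0] * (length - 1) + [turn_reward])
    return rewards


# Three turns with 3, 2, and 4 response tokens respectively.
token_rewards = per_token_rewards([3, 2, 4], [0.5, 0.0, 1.0])
print(token_rewards)
```

A list like this could then feed a token-level advantage computation, which is what I would expect a loss with token-level advantages to consume.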
