
@luciaquirke commented Sep 5, 2025

Add utility flag to use the policy gradient loss when rewards are available
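The PR text does not show the implementation. Below is a minimal sketch of what such a flag might toggle. It assumes a REINFORCE-style policy gradient loss, where each sequence's log-probability is weighted by its scalar reward, and plain negative log-likelihood as the default. The function name `sequence_loss` and the parameter `use_policy_gradient` are hypothetical and are not this repository's API.

```python
from typing import Optional, Sequence


def sequence_loss(
    logprobs: Sequence[float],
    rewards: Optional[Sequence[float]] = None,
    use_policy_gradient: bool = False,  # hypothetical flag name
) -> float:
    """Return a scalar loss averaged over a batch of sequences.

    Hypothetical sketch: logprobs[i] is the summed log-probability of
    sequence i under the model, and rewards[i] is its scalar reward.
    """
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    n = len(logprobs)
    if use_policy_gradient and rewards is not None:
        if len(rewards) != n:
            raise ValueError("rewards and logprobs must have the same length")
        # REINFORCE surrogate: minimizing -R * log p(y) follows the
        # policy gradient E[R * grad log p(y)].
        return -sum(r * lp for r, lp in zip(rewards, logprobs)) / n
    # Default, or flag set but no rewards available: negative log-likelihood.
    return -sum(logprobs) / n


print(sequence_loss([-1.0, -2.0]))  # 1.5 (NLL)
print(sequence_loss([-1.0, -2.0], rewards=[1.0, 0.5], use_policy_gradient=True))  # 1.0
```

In this sketch, the reward-weighted branch runs only when the flag is set and rewards are actually supplied. Otherwise it falls back to the ordinary likelihood loss, which matches the "when rewards are available" condition in the PR title.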

@luciaquirke force-pushed the grpo branch 10 times, most recently from fed75d9 to 8a001b1 (September 5, 2025 04:12)
@luciaquirke force-pushed the grpo branch 3 times, most recently from b4dfc5f to 79214e9 (September 5, 2025 04:48)

@norabelrose left a comment

LGTM

@luciaquirke merged commit 51385a3 into main Sep 10, 2025
1 check passed

3 participants