Support token-level loss, make default by tyler-griggs · Pull Request #90 · NovaSky-AI/SkyRL

tyler-griggs · 2025-07-15T01:56:18Z

What does this PR do?

Adds support for token-level loss (i.e., the token_mean loss reduction type), as introduced by DAPO.

With token_mean loss reduction, all tokens in all sequences contribute equally to loss.
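Written out for a batch of $B$ sequences with per-token loss $\ell_{i,t}$ and loss mask $m_{i,t} \in \{0, 1\}$ (notation ours, not from the PR), the two reductions compare as follows. The `sequence_mean` form assumes the previous implementation averaged within each sequence first and then across the batch.

```math
\mathcal{L}_{\text{token\_mean}} = \frac{\sum_{i=1}^{B} \sum_{t} m_{i,t}\,\ell_{i,t}}{\sum_{i=1}^{B} \sum_{t} m_{i,t}}
\qquad
\mathcal{L}_{\text{sequence\_mean}} = \frac{1}{B} \sum_{i=1}^{B} \frac{\sum_{t} m_{i,t}\,\ell_{i,t}}{\sum_{t} m_{i,t}}
```

Under `token_mean`, a sequence's share of the loss is proportional to its number of unmasked tokens; under `sequence_mean`, every sequence gets weight $1/B$ regardless of its length.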

The loss reduction type is configurable via trainer.algorithm.loss_reduction, but the default is now token_mean rather than our previous implementation (sequence_mean). This is the reduction the community is standardizing on as the default (it is the default in both TRL and verl).
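A minimal pure-Python sketch of the two reductions, selected by the same names as the config values, with `token_mean` as the default. `reduce_loss` is a hypothetical helper written for illustration, not SkyRL's implementation, and its `sequence_mean` branch assumes the "average per sequence, then across the batch" behavior.

```python
from typing import List


def reduce_loss(
    per_token_loss: List[List[float]],
    loss_mask: List[List[float]],
    loss_reduction: str = "token_mean",
) -> float:
    """Reduce a [batch][seq_len] per-token loss to a scalar (hypothetical helper)."""
    # Zero out padded / masked positions.
    masked = [
        [loss * mask for loss, mask in zip(loss_row, mask_row)]
        for loss_row, mask_row in zip(per_token_loss, loss_mask)
    ]
    if loss_reduction == "token_mean":
        # Every unmasked token in the batch carries the same weight.
        num_tokens = sum(sum(mask_row) for mask_row in loss_mask)
        return sum(sum(row) for row in masked) / max(num_tokens, 1.0)
    if loss_reduction == "sequence_mean":
        # Average within each sequence first, then across sequences, so each
        # sequence carries the same weight regardless of its length.
        seq_means = [
            sum(row) / max(sum(mask_row), 1.0)
            for row, mask_row in zip(masked, loss_mask)
        ]
        return sum(seq_means) / max(len(seq_means), 1)
    raise ValueError(f"unknown loss_reduction: {loss_reduction!r}")


per_token_loss = [[1.0, 1.0, 1.0], [4.0, 9.0, 9.0]]
loss_mask = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
print(reduce_loss(per_token_loss, loss_mask))                   # 1.75
print(reduce_loss(per_token_loss, loss_mask, "sequence_mean"))  # 2.5
```

In this batch the second sequence has only one unmasked token, so it contributes one token in four under `token_mean` but half of the loss under `sequence_mean`.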

Wandb report comparing token_mean vs sequence_mean: https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k/reports/Token-level-loss-token_mean---VmlldzoxMzYwMDc4MQ

The only plot with a notable difference is policy_loss, which is much larger for token_mean than for sequence_mean:

[Screenshot: policy_loss, token_mean vs sequence_mean]

However, this policy_loss is of the same magnitude as the pg_loss we observe in verl:

[Screenshot: pg_loss in verl]

SumanthRH

Great!

Left a couple of nits.


SumanthRH · 2025-07-15T03:18:29Z

skyrl-train/tests/cpu/algorithms/test_losses.py

+    loss_mask = torch.tensor([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]], device=device)
+
+    # Test token_mean without mask
+    loss_fn_token = PolicyLoss(loss_type="regular", loss_reduction="token_mean")


nit: you should explicitly pass in the eps low and eps high values here to make the test less brittle
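To illustrate why the nit matters, here is a sketch of a PPO-style clipped loss with an asymmetric clip range $[1 - \epsilon_{low}, 1 + \epsilon_{high}]$ (DAPO's "clip-higher" idea) and `token_mean` reduction. `clipped_policy_loss` and its parameter names are hypothetical stand-ins, not the signature of SkyRL's `PolicyLoss`. Once a ratio is clipped, the result depends directly on the eps values, so a test that relies on defaults breaks whenever a default changes.

```python
import math
from typing import List


def clipped_policy_loss(
    log_probs: List[List[float]],
    old_log_probs: List[List[float]],
    advantages: List[List[float]],
    loss_mask: List[List[float]],
    eps_low: float,
    eps_high: float,
) -> float:
    """PPO-style clipped surrogate with token_mean reduction (hypothetical helper)."""
    total, num_tokens = 0.0, 0.0
    for lp_row, old_row, adv_row, mask_row in zip(log_probs, old_log_probs, advantages, loss_mask):
        for lp, old, adv, mask in zip(lp_row, old_row, adv_row, mask_row):
            ratio = math.exp(lp - old)
            # Asymmetric clip range [1 - eps_low, 1 + eps_high].
            clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            # Take the pessimistic surrogate and negate it to obtain a loss.
            token_loss = -min(ratio * adv, clipped_ratio * adv)
            total += token_loss * mask
            num_tokens += mask
    # token_mean: every unmasked token in the batch has equal weight.
    return total / max(num_tokens, 1.0)


# One token whose probability doubled (ratio = 2) with a positive advantage:
# the clipped term wins, so the loss is -(1 + eps_high).
log_probs, old_log_probs = [[math.log(2.0)]], [[0.0]]
advantages, single_mask = [[1.0]], [[1.0]]
print(clipped_policy_loss(log_probs, old_log_probs, advantages, single_mask, eps_low=0.2, eps_high=0.2))   # roughly -1.2
print(clipped_policy_loss(log_probs, old_log_probs, advantages, single_mask, eps_low=0.2, eps_high=0.28))  # roughly -1.28
```

Passing `eps_low`/`eps_high` explicitly in the test pins the expected value to the numbers written in the test rather than to whatever the config defaults happen to be.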

SumanthRH · 2025-07-15T03:21:39Z

Would be nice to add a screenshot of convergence on gsm8k (and how it changes from before) before merging.


tyler-griggs · 2025-07-15T18:03:55Z

Added details from the gsm8k run to the initial PR description.

## What does this PR do?

Adds support for token-level loss (i.e., `token_mean` loss reduction type) as introduced by DAPO.

With `token_mean` loss reduction, all tokens in all sequences contribute equally to loss.

The loss reduction type is configurable via `trainer.algorithm.loss_reduction`, but the default is updated to be `token_mean`, as opposed to our previous implementation (`sequence_mean`). This loss reduction is what the community is standardizing on as default (TRL's [default](huggingface/trl#2881), verl's [default](https://github.com/volcengine/verl/blob/517cc23c9dbb0da5c2cd2b012466790e29cb781a/verl/trainer/config/actor/actor.yaml#L63)).

Wandb report comparing `token_mean` vs `sequence_mean`: https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k/reports/Token-level-loss-token_mean---VmlldzoxMzYwMDc4MQ

The only plot with a notable difference is `policy_loss`, which is much larger for `token_mean` than it is for `sequence_mean`:

<img width="312" height="274" alt="Screenshot 2025-07-15 at 9 52 57 AM" src="https://github.com/user-attachments/assets/40f94cb6-c5e5-47f6-9b09-a076811746a0" />

However, this `policy_loss` is of the same magnitude as the `pg_loss` we observe in verl:

<img width="980" height="611" alt="Screenshot 2025-07-15 at 9 54 39 AM" src="https://github.com/user-attachments/assets/53714573-2b21-4e67-b30a-dd3648279438" />

---------

Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>

init commit

2d70b1e

tyler-griggs force-pushed the tgriggs/token-loss branch from a4e5745 to 2d70b1e on July 15, 2025 02:00

tyler-griggs added 2 commits July 15, 2025 02:30

add testing, update names

a066dd7

fix

280a7c7

tyler-griggs changed the title from "Support token-level loss" to "Support token-level loss, make default" on Jul 15, 2025

SumanthRH self-assigned this Jul 15, 2025

SumanthRH approved these changes Jul 15, 2025


tyler-griggs and others added 3 commits July 14, 2025 22:06

Update skyrl-train/tests/cpu/algorithms/test_losses.py

607ad9d

Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>

fix

09ef2ac

Merge branch 'tgriggs/token-loss' of https://github.com/NovaSky-AI/SkyRL into tgriggs/token-loss

daaaa33

tyler-griggs mentioned this pull request Jul 15, 2025

Add DAPO Recipe #88

Closed

7 tasks

x

4a80e63

tyler-griggs merged commit fc59170 into main Jul 15, 2025
3 checks passed

SumanthRH deleted the tgriggs/token-loss branch July 16, 2025 23:21

tyler-griggs merged 7 commits into main from tgriggs/token-loss
