Closed
Description
This issue tracks progress towards implementing support for the DAPO recipe and launching a reproduction run.
- Support token-level loss: Support token-level loss, make default #90
- Support clip higher
- Support overlong filtering: [DAPO] Add support for overlong filtering #111
- Support dynamic sampling: [trainer/algorithm] Implement DAPO and Polaris style dynamic sampling + add DAPO docs + example #130
- Polaris-style
- DAPO-style
- Customize compute_adv more flexibly with registry: [Trainer] Support registering custom advantage estimators #115
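
For anyone following along, here is a rough sketch of what three of the items above mean in practice: token-level loss, clip higher, and DAPO-style dynamic sampling. It is not the code from the linked PRs. The function names (`dapo_token_loss`, `keep_informative_groups`) and the plain-list data layout are hypothetical, chosen only for illustration. The clip values 0.2 and 0.28 are the ones reported in the DAPO paper, and one scalar advantage per response is assumed. Polaris-style sampling and overlong filtering are not covered here.

```python
import math


def dapo_token_loss(logp_new, logp_old, advantages, mask,
                    eps_low=0.2, eps_high=0.28):
    """Clipped surrogate loss averaged over all valid tokens in the batch.

    Hypothetical helper for illustration only.
    logp_new, logp_old: per-response lists of per-token log-probs.
    advantages: one scalar advantage per response (assumption).
    mask: per-response lists of 0/1 flags marking valid response tokens.
    """
    total, count = 0.0, 0
    for seq_new, seq_old, adv, seq_mask in zip(logp_new, logp_old,
                                               advantages, mask):
        for lp_new, lp_old, m in zip(seq_new, seq_old, seq_mask):
            if not m:
                continue
            ratio = math.exp(lp_new - lp_old)
            # Clip higher: the upper bound (1 + eps_high) is looser than
            # the lower bound (1 - eps_low).
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += -min(ratio * adv, clipped * adv)
            count += 1
    # Token-level loss: divide by the number of valid tokens in the batch,
    # not by the number of responses, so long responses weigh more.
    return total / max(count, 1)


def keep_informative_groups(rewards_by_prompt):
    """DAPO-style dynamic sampling filter (hypothetical helper).

    Drops prompts whose sampled responses all got the same reward,
    since those groups yield zero advantage.
    """
    return {pid: rewards for pid, rewards in rewards_by_prompt.items()
            if len(set(rewards)) > 1}
```

In a training loop, the filter would run after rollouts and before the loss, with more prompts sampled until the batch is full again. How that resampling is scheduled is left out of this sketch.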