🤝 Mixture of judges #2159
Thanks a lot @gaetanlop. Added some suggestions and open questions.
Also, please make sure to run the pre-commits ( |
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Hey @qgallouedec, thanks for the review. I have added a For the naming of the judges, let's do |
…/trl into cgpo_mixture_of_judges
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
LGTM, thanks @gaetanlop |
We're hitting an issue that is mainly due to PairRM being requested for download simultaneously by different tests. It happens intermittently and is not related to any actual bug in the code base. I'll ignore it and merge.
What does this PR do?
This PR adds the mixture-of-judges component of the CGPO paper (https://arxiv.org/pdf/2409.20370). The individual judges are described in Section 4.1.4. The mixture of judges simply labels a generation as “violated” (0) if it fails any one of the constraint judgments and as “satisfied” (1) otherwise.
Related to #2156
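For readers who want the aggregation rule spelled out, here is a minimal, self-contained sketch of it. It is not the code added in this PR: the `BinaryJudge` type, the `mixture_of_judges` function, and the two toy judges are hypothetical names chosen for illustration, and each judge is assumed to return 1 for "satisfied" and 0 for "violated".

```python
from typing import Callable, List

# Hypothetical judge signature (an assumption for this sketch): takes a
# (prompt, completion) pair and returns 1 if satisfied, 0 if violated.
BinaryJudge = Callable[[str, str], int]


def mixture_of_judges(
    judges: List[BinaryJudge],
    prompts: List[str],
    completions: List[str],
) -> List[int]:
    """Label each pair 1 only if every judge returns 1, otherwise 0."""
    if len(prompts) != len(completions):
        raise ValueError("prompts and completions must have the same length")
    labels = []
    for prompt, completion in zip(prompts, completions):
        # A single failing constraint judgment marks the generation as violated.
        satisfied = all(judge(prompt, completion) == 1 for judge in judges)
        labels.append(1 if satisfied else 0)
    return labels


# Two toy constraint judges, purely illustrative.
def non_empty_judge(prompt: str, completion: str) -> int:
    return int(len(completion.strip()) > 0)


def max_length_judge(prompt: str, completion: str) -> int:
    return int(len(completion) <= 20)


labels = mixture_of_judges(
    [non_empty_judge, max_length_judge],
    prompts=["What is 2 + 2?"] * 3,
    completions=["4", "", "The answer is most certainly four."],
)
print(labels)  # [1, 0, 0]
```

The first completion passes both judges, the second fails the non-empty check, and the third exceeds the length limit, so only the first is labeled "satisfied".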
Who can review?
@kashif @lewtun