The Replacement of Open-source JudgeModel/Evaluator #19

Open
bittersweet1999 opened this issue Oct 25, 2024 · 0 comments

Hi WildBench Team 👋,

We have released a series of fine-tuned judge models:
opencompass/CompassJudger-1-32B-Instruct
opencompass/CompassJudger-1-14B-Instruct
opencompass/CompassJudger-1-7B-Instruct
opencompass/CompassJudger-1-1.5B-Instruct

In our experiments, we tested how reliable these models are as judges. CompassJudger-1-14B-Instruct and CompassJudger-1-32B-Instruct reached a judge correlation of over 95% with GPT4o-0806 on the WildBench dataset, so they could serve as a low-cost alternative to GPT4o as the judge model. For more details, please refer to our paper: https://arxiv.org/pdf/2410.16256
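
For anyone who wants to sanity-check agreement between two judges on their own samples, here is a minimal sketch. It assumes "correlation" means Pearson correlation over per-response scores, which may differ from the exact metric defined in the paper. The `pearson` helper and the score lists are hypothetical and only for illustration; they are not real WildBench results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factor cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-response scores from two judges on the same responses.
# These numbers are made up for illustration only.
reference_judge_scores = [7, 8, 5, 9, 6, 4]   # e.g. the GPT4o-based judge
candidate_judge_scores = [7, 7, 5, 9, 6, 5]   # e.g. an open-source judge

r = pearson(reference_judge_scores, candidate_judge_scores)
print(f"correlation with reference judge: {r:.3f}")
```

Both judges need to score the same responses in the same order for the number to be meaningful.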

Thank you!
