You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In relevant experiments, we tested the reliability of these models as Judgers. Among them, CJ-1-14B and 32B achieved a Judge correlation of over 95% with GPT4o-0806 on the WildBench dataset, which can be considered as a low-cost alternative to GPT4o. For more details, please refer to our paper. https://arxiv.org/pdf/2410.16256
Thank you!
The text was updated successfully, but these errors were encountered:
Hi WildBench Team 👋,
We have updated a series of Fine-Tuned Judge Models:
opencompass/CompassJudger-1-32B-Instruct
opencompass/CompassJudger-1-14B-Instruct
opencompass/CompassJudger-1-7B-Instruct
opencompass/CompassJudger-1-1.5B-Instruct
In relevant experiments, we tested the reliability of these models as Judgers. Among them, CJ-1-14B and 32B achieved a Judge correlation of over 95% with GPT4o-0806 on the WildBench dataset, which can be considered as a low-cost alternative to GPT4o. For more details, please refer to our paper. https://arxiv.org/pdf/2410.16256
Thank you!
The text was updated successfully, but these errors were encountered: