TruthfulQA Example with Jury #104
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##            trunk     #104   +/-   ##
=======================================
  Coverage   96.92%   96.92%
=======================================
  Files          34       34
  Lines        1465     1465
=======================================
  Hits         1420     1420
  Misses         45       45
Any ideas as to why gpt-4o is so low and o1-mini is champion?
The combination of judges and the basic system prompt likely gave the judges style preferences, but this has been updated!
This example highlights the advantage of using a jury of judges, pulling from more than one knowledge base to critique answers on the TruthfulQA dataset.
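As a rough illustration of the jury idea, the sketch below combines per-judge scores for a single answer into one jury score, plus a majority-vote variant for pass/fail verdicts. It is not the code from this PR: the function names `jury_score` and `jury_verdict`, the judge labels, and the score values are all hypothetical, and simple averaging is only one assumed way to aggregate.

```python
from statistics import mean


def jury_score(scores_by_judge: dict[str, float]) -> float:
    """Average the scores several judges gave to one answer.

    Hypothetical helper: a plain mean is assumed here, not taken from the PR.
    """
    if not scores_by_judge:
        raise ValueError("need at least one judge score")
    return mean(scores_by_judge.values())


def jury_verdict(votes_by_judge: dict[str, bool]) -> bool:
    """Return True when a strict majority of judges accept the answer."""
    if not votes_by_judge:
        raise ValueError("need at least one judge vote")
    accepted = sum(votes_by_judge.values())
    return accepted * 2 > len(votes_by_judge)


# Made-up scores from three hypothetical judges for one answer.
scores = {"judge_a": 4.0, "judge_b": 2.0, "judge_c": 3.0}
votes = {"judge_a": True, "judge_b": False, "judge_c": True}

print(jury_score(scores))
print(jury_verdict(votes))
```

Averaging over several judges means a single judge's stylistic preference moves the final score less than it would if that judge were scoring alone.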