TruthfulQA Example with Jury #104

marko-polo-cheno · 2024-10-11T16:24:07Z

This example highlights the advantage of using a jury of judges, which draws on more than one knowledge base, to critique answers on the TruthfulQA dataset.
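Below is a minimal sketch of how a jury's individual verdicts might be combined. It assumes that each judge compares two answers head-to-head and returns "A", "B", or "tie", and that the jury decides by simple majority. The names jury_verdict, keyword_judge, length_judge and contrarian_judge are hypothetical illustrations, not the example's actual code. Real judges would each wrap a call to a model from a different provider.

```python
from collections import Counter
from typing import Callable, Sequence

# A judge compares two answers to the same question and returns "A", "B", or "tie".
# In practice each judge would wrap a call to a model from a different provider;
# here they are plain functions so the sketch runs offline.
Judge = Callable[[str, str, str], str]


def jury_verdict(judges: Sequence[Judge], question: str, answer_a: str, answer_b: str) -> str:
    """Combine individual verdicts by simple majority; a tie for first place yields "tie"."""
    if not judges:
        raise ValueError("a jury needs at least one judge")
    votes = Counter(judge(question, answer_a, answer_b) for judge in judges)
    ranked = votes.most_common()
    # If the two most common verdicts have the same count, no majority exists.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


# Hypothetical toy judges, each with a different "preference".
def keyword_judge(question: str, a: str, b: str) -> str:
    return "A" if "myth" in a.lower() else "B"


def length_judge(question: str, a: str, b: str) -> str:
    return "A" if len(a) > len(b) else "B"


def contrarian_judge(question: str, a: str, b: str) -> str:
    return "B"


question = "What happens if you crack your knuckles a lot?"
truthful = "Nothing in particular happens; the link to arthritis is a myth."
untruthful = "You will get arthritis."
print(jury_verdict([keyword_judge, length_judge, contrarian_judge], question, truthful, untruthful))  # prints A
```

When judges are pooled this way, one judge's quirk (here, the contrarian judge's fixed preference) can be outvoted by the others.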

codecov-commenter · 2024-10-11T16:29:22Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.92%. Comparing base (539c485) to head (8166fce).
Report is 2 commits behind head on trunk.

Additional details and impacted files

@@           Coverage Diff           @@
##            trunk     #104   +/-   ##
=======================================
  Coverage   96.92%   96.92%           
=======================================
  Files          34       34           
  Lines        1465     1465           
=======================================
  Hits         1420     1420           
  Misses         45       45


marko-polo-cheno · 2024-10-11T17:21:22Z

^^ This is the result when a Cohere API key is provided in the .env and the data is truncated with df = df.head(2).
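The result depends on which API keys are present. The sketch below assumes that each judge provider joins the jury only when its key is set in the environment, for example after loading .env. The provider list, the environment variable names and the function available_judge_providers are all hypothetical and may not match the example's code. The df.head(2) truncation simply keeps the first two rows of the DataFrame, so a trial run stays cheap.

```python
import os
from typing import Mapping

# Hypothetical mapping from judge provider to the environment variable holding its key.
# The real example may use different providers or variable names.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "cohere": "COHERE_API_KEY",
}


def available_judge_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose API key is present and non-empty, in PROVIDER_KEYS order."""
    return [provider for provider, var in PROVIDER_KEYS.items() if env.get(var)]


# Only OpenAI and Cohere keys are set here, so Anthropic is left out of the jury.
print(available_judge_providers({"OPENAI_API_KEY": "sk-test", "COHERE_API_KEY": "co-test"}))
# prints ['openai', 'cohere']
```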

gordonhart · 2024-10-14T13:47:02Z

examples/assets/truthfulQA_leaderboard.jpg

Any ideas as to why gpt-4o is so low and o1-mini is champion?

marko-polo-cheno · Oct 15, 2024

The combination of judges and the basic system prompt likely gave the judges style preferences, but this has been updated!

marko-polo-cheno added 5 commits October 4, 2024 11:48

setup API for generating model responses · 622afd6

Merge branch 'trunk' into mc/truthfulQA_example · 388ccb6

update images · 7050e7f

clean up download and generation script · 49abff6

update example notebook · 5adcf7f

fix downloaded file names · 86fd8ad

marko-polo-cheno requested a review from gordonhart October 11, 2024 17:29

nits · d363a9e

gordonhart reviewed Oct 14, 2024


update prompt and images · 8166fce

marko-polo-cheno requested a review from gordonhart October 15, 2024 21:13
