Final changes to support the new LLM as Judges #1469

yoavkatz · 2025-01-05T08:37:06Z

This issue documents what still needs to be done:

Change the main LLM documentation. Make the new way the recommended way. Keep the old way only at the end, highlighting that it is the way to create a judge when you need full control of the input to the model.
Add all new examples to the examples documentation page. Remove most legacy examples, except one or two.
Finalize the "main" score of the new llm as judge metrics.
For direct: "llm_as_judge_score" - The mean score of all the instance judge scores.
For pairwise: "1_win_rate" - The mean win_rate of the first system
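
To make the two proposed main scores concrete, here is a minimal sketch of the aggregation. It is not the library's implementation: the function names are hypothetical, and it assumes each instance already carries a numeric judge score (direct) or a win rate in [0, 1] for the first system (pairwise). Only the score names "llm_as_judge_score" and "1_win_rate" come from this issue.

```python
def direct_main_score(instance_scores):
    """Hypothetical helper: mean of the per-instance judge scores."""
    if not instance_scores:
        raise ValueError("at least one instance score is required")
    return {"llm_as_judge_score": sum(instance_scores) / len(instance_scores)}


def pairwise_main_score(first_system_win_rates):
    """Hypothetical helper: mean per-instance win rate of the first system."""
    if not first_system_win_rates:
        raise ValueError("at least one win rate is required")
    return {"1_win_rate": sum(first_system_win_rates) / len(first_system_win_rates)}


# Example inputs are made up for illustration only.
direct = direct_main_score([1.0, 0.0, 0.5])
pairwise = pairwise_main_score([1.0, 0.0])
```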

yoavkatz assigned martinscooper, elronbandel and yoavkatz Jan 5, 2025
