Finish Evaluation Module #42

burtenshaw · 2024-12-05T10:56:10Z

The evaluation module is not complete. It requires a finalised structure, more information, and exercises.

Here is a basic proposal for a structure:

what evaluation is
here are the well-known benchmarks, their limitations, and some alternatives people have set up (arenas, LLM judges)
you should do your own evals for your own use case
project on domain specific evaluation
notebook on comparing models

add a small mention of human-based Elo rankings and LLM-as-a-judge
notebook for implementing a custom eval (you'll find one in the eval guidebook; it could make sense to point towards it for further analysis/knowledge)
Refactor to basic structure and add TODOs
Add all information and references from the evaluation guidebook
Update projects
Update notebook with exercises

burtenshaw · 2024-12-06T12:48:41Z

The notebook will need to be synced with smol course dependencies.

@sylvain471 mentioned dependency issues in the notebook.

burtenshaw mentioned this issue Dec 6, 2024

lighteval notebook requirements #55

Closed

burtenshaw mentioned this issue Dec 9, 2024

[MODULE] Evaluation improvements #71

Merged
