Finish Evaluation Module #42

Open
4 tasks
burtenshaw opened this issue Dec 5, 2024 · 1 comment

Comments

@burtenshaw
Collaborator

burtenshaw commented Dec 5, 2024

The evaluation module is not complete. It still needs a finalised structure, more information, and exercises.

Structure

Here is a basic proposal for a structure:

  • what evaluation is
  • the well-known benchmarks, their limitations, and some alternatives people have set up (arenas, LLM judges)
  • why you should run your own evals for your own use case
  • project on domain-specific evaluation
  • notebook on comparing models

Comments

  • add a small mention of human-based Elo rankings and LLM-as-judge evaluation

  • notebook for implementing a custom eval (you'll find one in the eval guidebook; it could make sense to point towards it for further analysis/knowledge)

  • Refactor to basic structure and add TODOs

  • Add all information and references from the evaluation guidebook

  • Update projects

  • Update notebook with exercises
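
For the Elo mention above, a minimal sketch of how pairwise human votes could be turned into ratings. This is only an illustration of the standard Elo update, not the method used by any particular arena; the function names, the K-factor of 32, the starting rating of 1000, and the example votes are all assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k (assumed 32 here) controls how far a single vote moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical votes: (model A, model B, score for A)
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b", 1.0), ("model_a", "model_b", 0.5)]
for a, b, score in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score)
```

The same loop would work with an LLM judge supplying `score` instead of a human annotator, which might be a useful comparison for the notebook.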

@burtenshaw
Collaborator Author

The notebook will need to be synced with smol course dependencies.

@sylvain471 mentioned dependency issues in the notebook.
