This module covers evaluation approaches for your smol model, including both standard benchmarks and domain-specific evaluation methods.
In this module we will use the lighteval library, which is built by Hugging Face and integrated with the Hugging Face ecosystem. If you want to go deeper into evaluation with the authors of lighteval, check out their evaluation guidebook.
Evaluating language models means assessing several core capabilities:
- Task Performance: How well the model performs on specific tasks like question answering, summarization, etc.
- Output Quality: Measuring factors like coherence, relevance, and factual accuracy
- Safety & Bias: Checking for harmful outputs, biases, and toxic content
- Domain Expertise: Testing specialized knowledge and capabilities in specific fields
Learn how to evaluate your model using standardized benchmarks and metrics:
- Common benchmarks (MMLU, TruthfulQA, etc.), with a hand-rolled scoring sketch after this list
- Evaluation metrics and settings
- Best practices for reproducible evaluation
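Lighteval automates all of this, but it helps to see what a benchmark harness does under the hood. The sketch below hand-rolls multiple-choice scoring on an MMLU subset by ranking each answer choice by log-likelihood; the checkpoint, subset, and sample size are illustrative assumptions, and this is not the lighteval API itself.

```python
# A hand-rolled sketch of multiple-choice benchmark scoring (what harnesses like
# lighteval automate): score each answer choice by log-likelihood and pick the best.
# The checkpoint, MMLU subset, and 20-example slice are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

dataset = load_dataset("cais/mmlu", "anatomy", split="test")

def choice_loglikelihood(question: str, choice: str) -> float:
    """Total log-likelihood of `choice` as a continuation of the question prompt."""
    prompt = f"Question: {question}\nAnswer:"
    # The prompt/continuation boundary is approximated by token count for simplicity.
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Logits at position i predict token i+1, so shift by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    continuation = full_ids[0, prompt_len:]
    return logprobs[prompt_len - 1 :].gather(1, continuation.unsqueeze(-1)).sum().item()

correct = 0
sample = dataset.select(range(20))  # small slice so the sketch runs quickly
for example in sample:
    scores = [choice_loglikelihood(example["question"], c) for c in example["choices"]]
    correct += int(scores.index(max(scores)) == example["answer"])
print(f"Accuracy on the sample: {correct / len(sample):.2%}")
```

Log-likelihood ranking is a common way to score multiple-choice benchmarks, since it avoids having to parse free-form generations.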
Create custom evaluation pipelines for your specific use case:
- Designing evaluation tasks
- Implementing custom metrics (a minimal example follows this list)
- Creating evaluation datasets
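To make the custom-metrics point concrete, here is a minimal, library-agnostic sketch of an exact-match metric with light normalization; the function and argument names are illustrative and not part of any library API.

```python
# A minimal custom metric: exact match after light normalization.
# Names here are illustrative and not tied to any library API.
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def corpus_score(predictions: list[str], references: list[str]) -> float:
    """Average exact-match score over a set of examples."""
    return sum(exact_match(p, r) for p, r in zip(predictions, references)) / len(references)

print(corpus_score(["Paris.", "four"], ["paris", "5"]))  # 0.5
```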
A complete example of building a domain-specific evaluation pipeline:
- Generate evaluation datasets
- Annotate data with Argilla
- Create standardized datasets (sketched after this list)
- Evaluate models with LightEval
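As a preview of the "create standardized datasets" step, the sketch below packages a few annotated examples with the datasets library and pushes them to the Hub; the column layout and repository id are assumptions rather than a required schema.

```python
# A sketch of packaging annotated examples into a standardized evaluation dataset.
# The column layout and repository id are illustrative assumptions.
from datasets import Dataset

examples = {
    "question": [
        "What is the boiling point of water at sea level?",
        "Which gas do plants absorb during photosynthesis?",
    ],
    "choices": [
        ["90 C", "100 C", "110 C", "120 C"],
        ["Oxygen", "Nitrogen", "Carbon dioxide", "Hydrogen"],
    ],
    "answer": [1, 2],  # index of the correct choice
}

eval_dataset = Dataset.from_dict(examples)
# Requires `huggingface-cli login`; replace the repo id with your own namespace.
eval_dataset.push_to_hub("your-username/domain-eval")
```

From here the hosted dataset can be annotated further in Argilla and wired into an evaluation task, as the steps above describe.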