fix: Add listing all available benchmarks CLI option #1256

Merged · 4 commits · Sep 29, 2024

Changes from all commits
2 changes: 2 additions & 0 deletions README.md
@@ -378,6 +378,7 @@ df = results_to_dataframe(results)
| Documentation | |
| ------------------------------ | ---------------------- |
| 📋 [Tasks] | Overview of available tasks |
| 📐 [Benchmarks] | Overview of available benchmarks |
| 📈 [Leaderboard] | The interactive leaderboard of the benchmark |
| 🤖 [Adding a model] | Information related to how to submit a model to the leaderboard |
| 👩‍🔬 [Reproducible workflows] | Information related to how to reproduce and create reproducible workflows with MTEB |
@@ -387,6 +388,7 @@ df = results_to_dataframe(results)
| 🌐 [MMTEB] | An open-source effort to extend MTEB to cover a broad set of languages |  

[Tasks]: docs/tasks.md
[Benchmarks]: docs/benchmarks.md
[Contributing]: CONTRIBUTING.md
[Adding a model]: docs/adding_a_model.md
[Adding a dataset]: docs/adding_a_dataset.md
2 changes: 1 addition & 1 deletion docs/benchmarks.md
@@ -1,5 +1,5 @@
## Available benchmarks
- The following tables give you an overview of the benchmarks in MTEB.
+ The following table gives you an overview of the benchmarks in MTEB.

<details>

24 changes: 24 additions & 0 deletions mteb/cli.py
@@ -30,6 +30,14 @@
mteb available_tasks --task_types Clustering # list tasks of type Clustering
```

## Listing Available Benchmarks

To list the available benchmarks within MTEB, use the `mteb available_benchmarks` command. For example:

```bash
mteb available_benchmarks # list all available benchmarks
```
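
The command prints each benchmark's name followed by the tasks it contains, using the same display format as `available_tasks`.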


## Creating Model Metadata

@@ -144,6 +152,12 @@ def run(args: argparse.Namespace) -> None:
    _save_model_metadata(model, Path(args.output_folder))


def available_benchmarks(args: argparse.Namespace) -> None:
    benchmarks = mteb.get_benchmarks()
    evaluation = mteb.MTEB(tasks=benchmarks)
    evaluation.mteb_benchmarks()
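
Note that `MTEB` is instantiated here with `Benchmark` objects as its `tasks`; the new `mteb_benchmarks` method (added in `MTEB.py` below) then iterates them as `self._tasks`.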


def available_tasks(args: argparse.Namespace) -> None:
    tasks = mteb.get_tasks(
        categories=args.categories,
@@ -198,6 +212,15 @@ def add_available_tasks_parser(subparsers) -> None:
    parser.set_defaults(func=available_tasks)


def add_available_benchmarks_parser(subparsers) -> None:
    parser = subparsers.add_parser(
        "available_benchmarks", help="List the available benchmarks within MTEB"
    )
    add_task_selection_args(parser)

    parser.set_defaults(func=available_benchmarks)
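
For context, `set_defaults(func=...)` is the standard argparse idiom that ties each subcommand to its handler, which is then typically invoked as `args.func(args)` after parsing. A minimal self-contained sketch of the pattern (the `hello` command and its names are hypothetical, not part of mteb):

```python
import argparse

def hello(args: argparse.Namespace) -> None:
    print(f"hello, {args.name}")

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="command", required=True)
sub = subparsers.add_parser("hello", help="Say hello")
sub.add_argument("--name", default="world")
sub.set_defaults(func=hello)  # attach the handler to this subcommand

args = parser.parse_args(["hello", "--name", "mteb"])
args.func(args)  # dispatch: prints "hello, mteb"
```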


def add_run_parser(subparsers) -> None:
    parser = subparsers.add_parser("run", help="Run a model on a set of tasks")

@@ -321,6 +344,7 @@ def main():
    )
    add_run_parser(subparsers)
    add_available_tasks_parser(subparsers)
    add_available_benchmarks_parser(subparsers)
    add_create_meta_parser(subparsers)

    args = parser.parse_args()
6 changes: 6 additions & 0 deletions mteb/evaluation/MTEB.py
@@ -168,6 +168,12 @@ def _display_tasks(self, task_list, name=None):
        console.print(f"{prefix}{name}{category}{multilingual}")
        console.print("\n")

    def mteb_benchmarks(self):
        """Get all benchmarks available in the MTEB."""
        for benchmark in self._tasks:
            name = benchmark.name
            self._display_tasks(benchmark.tasks, name=name)
Comment on lines +171 to +175

Contributor:

Hmm on a second look. We might want to move the "display" code away from the MTEB object

(optional though)

Collaborator (Author):

Hmm sure! Let's discuss more on that on a separate issue?
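
As a rough illustration of that suggestion (a sketch only, not part of this PR), the display loop could live in a module-level helper rather than on `MTEB`; `task.metadata.name` is an assumption about the task objects here:

```python
# Hypothetical refactor sketch: display logic moved out of the MTEB class
# into a plain function operating on Benchmark objects.
def display_benchmarks(benchmarks) -> None:
    """Print each benchmark's name followed by the tasks it contains."""
    for benchmark in benchmarks:
        print(benchmark.name)                   # e.g. "MTEB(eng)"
        for task in benchmark.tasks:
            print(f"    {task.metadata.name}")  # assumed task attribute
```

The CLI handler could then call `display_benchmarks(mteb.get_benchmarks())` directly, with no `MTEB` instance needed.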


    @classmethod
    def mteb_tasks(cls):
        """Get all tasks available in the MTEB."""
9 changes: 9 additions & 0 deletions tests/test_cli.py
@@ -22,6 +22,15 @@ def test_available_tasks():
), "Sample task Banking77Classification task not found in available tasks"


def test_available_benchmarks():
    command = f"{sys.executable} -m mteb available_benchmarks"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    assert result.returncode == 0, "Command failed"
    assert (
        "MTEB(eng)" in result.stdout
    ), "Sample benchmark MTEB(eng) not found in available benchmarks"


run_task_fixures = [
    (
        "average_word_embeddings_komninos",