Normalize benchmarks to only include task objects and add a getter for benchmarks #1208

Merged 6 commits on Sep 10, 2024
Changes from 4 commits
21 changes: 14 additions & 7 deletions README.md
@@ -38,7 +38,7 @@ pip install mteb

## Usage

-* Using a python script (see [scripts/run_mteb_english.py](https://github.com/embeddings-benchmark/mteb/blob/main/scripts/run_mteb_english.py) and [mteb/mtebscripts](https://github.com/embeddings-benchmark/mtebscripts) for more):
+* Using a python script:

```python
import mteb
@@ -77,11 +77,11 @@ Click on each section below to see the details.
<br />

<details>
-<summary> Dataset selection </summary>
+<summary> Task selection </summary>

-### Dataset selection
+### Task selection

-Datasets can be selected by providing the list of datasets, but also
+Tasks can be selected by providing the list of datasets, but also

* by their task (e.g. "Clustering" or "Classification")

@@ -121,11 +121,18 @@ evaluation = mteb.MTEB(tasks=[
# for an example of a HF subset see "Subset" in the dataset viewer at: https://huggingface.co/datasets/mteb/bucc-bitext-mining
```
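For readers skimming the diff, the task-selection lines above pair with the `get_tasks` helper the package already exports; a minimal sketch of both selection styles, assuming `get_tasks` accepts `tasks`, `task_types`, and `languages` keyword filters, could look like this:

```python
import mteb

# Select specific tasks by name (the task names here are illustrative).
tasks = mteb.get_tasks(tasks=["Banking77Classification", "BUCC"])

# Or filter by task type and language; the keyword names are assumptions.
clustering_tasks = mteb.get_tasks(task_types=["Clustering"], languages=["eng"])

evaluation = mteb.MTEB(tasks=clustering_tasks)
```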

-There are also presets available for certain task collections, e.g. to select the 56 English datasets that form the "Overall MTEB English leaderboard":
+</details>

+<details>
+<summary> Running a benchmark </summary>

+`mteb` comes with a set of predefined benchmarks. These can be fetched using `get_benchmark` and run in a similar fashion to other sets of tasks.
+For instance, to select the 56 English datasets that form the "Overall MTEB English leaderboard":

```python
-from mteb import MTEB_MAIN_EN
-evaluation = mteb.MTEB(tasks=MTEB_MAIN_EN, task_langs=["en"])
+import mteb
+mteb_eng = mteb.get_benchmark("MTEB(eng)")
+evaluation = mteb.MTEB(tasks=mteb_eng, eval_splits=["test"])
```

</details>
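As a usage follow-up to the new snippet, running the fetched benchmark end to end would roughly look like the sketch below; the model name and the `output_folder` argument are illustrative assumptions, with `get_model` taken from the package's existing exports:

```python
import mteb

# Fetch the predefined English benchmark via the new getter.
benchmark = mteb.get_benchmark("MTEB(eng)")

# Any embedding model supported by get_model should work; this name is only an example.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

evaluation = mteb.MTEB(tasks=benchmark, eval_splits=["test"])
results = evaluation.run(model, output_folder="results")
```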
6 changes: 4 additions & 2 deletions mteb/__init__.py
@@ -2,7 +2,7 @@

from importlib.metadata import version

-from mteb.benchmarks import (
+from mteb.benchmarks.benchmarks import (
MTEB_MAIN_EN,
MTEB_MAIN_RU,
MTEB_RETRIEVAL_LAW,
@@ -14,7 +14,8 @@
from mteb.models import get_model, get_model_meta
from mteb.overview import TASKS_REGISTRY, get_task, get_tasks

-from .benchmarks import Benchmark
+from .benchmarks.benchmarks import Benchmark
+from .benchmarks.get_benchmark import get_benchmark

__version__ = version("mteb") # fetch version from install metadata

@@ -32,4 +33,5 @@
"get_model_meta",
"load_results",
"Benchmark",
"get_benchmark",
]
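With `Benchmark` and `get_benchmark` exported at the top level, and benchmarks normalized to hold task objects (per the PR title), inspecting a fetched benchmark might look like the following sketch; the `tasks` and `metadata` attribute names are assumptions based on the existing `Benchmark` and task classes:

```python
import mteb

benchmark = mteb.get_benchmark("MTEB(eng)")

# Each entry is a task object rather than a task name, so metadata is directly available.
for task in benchmark.tasks:
    print(task.metadata.name, task.metadata.type)
```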
3 changes: 3 additions & 0 deletions mteb/benchmarks/__init__.py
@@ -0,0 +1,3 @@
+from __future__ import annotations
+
+from mteb.benchmarks.benchmarks import *
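The star re-export presumably keeps the old `mteb.benchmarks` import path working after the move to `benchmarks/benchmarks.py`; a quick check under that assumption (and assuming `MTEB_MAIN_EN` is not hidden by an `__all__` in the module):

```python
# Both paths should resolve to the same object if the re-export covers MTEB_MAIN_EN.
from mteb.benchmarks import MTEB_MAIN_EN as via_package
from mteb.benchmarks.benchmarks import MTEB_MAIN_EN as via_module

assert via_package is via_module
```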