Commit 1447a76

Authored by: Samoed, github-actions, ayush1298, KennethEnevoldsen, isaac-chung
[v2] Merge main (#2324)
* 1.36.11: automatically generated by python-semantic-release
* fix: Added Filter Modality (#2262)
  * added Filter Modality and `exclusive_modality_filter`, with unit tests
  * integrated tests in overview.py and tests/test_overview.py
  * added a task related to the image modality
  * updates to mteb/abstasks/AbsTask.py and overview.py; lint and documentation updates
  * Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
  * Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* 1.36.12: automatically generated by python-semantic-release
* fix: Add `ModelMeta` license & custom validations (#2293)
  * license validation; moved licenses; updated imports
  * Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* 1.36.13: automatically generated by python-semantic-release
* ci: Add pre-commit hook (#2194)
  * pre-commit hooks to make dev life nicer; added pre-commit to `make install`; updated the ruff pre-commit hook; lint fixes
  * Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
* Update tasks table
* fix: bug in voyage implementation (#2304)
  * "passage" is not a valid input for the Voyage API; remapped to "document"
  * Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* 1.36.14: automatically generated by python-semantic-release
* fix: Update voyage name to include Org. (#2322)
* 1.36.15: automatically generated by python-semantic-release
* Added VDR Model (#2290)
  * changed the custom wrapper to SentenceTransformerWrapper
  * removed kwargs and added a TODO for the image modality
  * Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* lint; fix license

Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 53b830b commit 1447a76

File tree

1,510 files changed (+115906, -112267 lines changed)


.github/pull_request_template.md (+4 -4)

@@ -29,12 +29,12 @@
 - [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
 - [ ] If the dataset is too big (e.g. >2048 examples), considering using `self.stratified_subsampling() under dataset_transform()`
 - [ ] I have filled out the metadata object in the dataset file (find documentation on it [here](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md#2-creating-the-metadata-object)).
-- [ ] Run tests locally to make sure nothing is broken using `make test`.
-- [ ] Run the formatter to format the code using `make lint`.
+- [ ] Run tests locally to make sure nothing is broken using `make test`.
+- [ ] Run the formatter to format the code using `make lint`.
 
 
 ### Adding a model checklist
-<!--
+<!--
 When adding a model to the model registry
 see also https://github.com/embeddings-benchmark/mteb/blob/main/docs/reproducible_workflow.md
 -->
@@ -43,4 +43,4 @@
 - [ ] I have ensured that my model can be loaded using
 - [ ] `mteb.get_model(model_name, revision)` and
 - [ ] `mteb.get_model_meta(model_name, revision)`
-- [ ] I have tested the implementation works on a representative set of tasks.
+- [ ] I have tested the implementation works on a representative set of tasks.

.github/workflows/docs.yml (+1 -2)

@@ -47,7 +47,7 @@ jobs:
       - name: Create table
         run: |
           make build-docs
-
+
       - name: Push table
         run: |
           git config --global user.email "github-actions[bot]@users.noreply.github.com"
@@ -60,4 +60,3 @@ jobs:
           git commit -m "Update tasks table"
           git push
           fi
-

.github/workflows/documentation.yml (+3 -3)

@@ -19,7 +19,7 @@ jobs:
       - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
-
+
       - name: Dependencies
         run: |
           python -m pip install --upgrade pip
@@ -29,7 +29,7 @@ jobs:
       - name: Build and Deploy
         if: github.event_name == 'push'
         run: mkdocs gh-deploy --force
-
+
       - name: Build
         if: github.event_name == 'pull_request'
-        run: make build-docs
+        run: make build-docs

.github/workflows/leaderboard_refresh.yaml (+2 -2)

@@ -2,8 +2,8 @@ name: Daily Space Rebuild
 on:
   schedule:
     # Runs at midnight Pacific Time (8 AM UTC)
-    - cron: '0 8 * * *'
-  workflow_dispatch: # Allows manual triggering
+    - cron: "0 8 * * *"
+  workflow_dispatch: # Allows manual triggering
 
 jobs:
   rebuild:

.github/workflows/lint.yml (-1)

@@ -25,4 +25,3 @@ jobs:
         id: lint
         run: |
           make lint-check
-

.github/workflows/model_loading.yml (+11 -11)

@@ -3,22 +3,22 @@ name: Model Loading
 on:
   pull_request:
     paths:
-      - 'mteb/models/**.py'
+      - "mteb/models/**.py"
 
 jobs:
   extract-and-run:
     runs-on: ubuntu-latest
 
     steps:
-    - name: Checkout repository
-      uses: actions/checkout@v3
+      - name: Checkout repository
+        uses: actions/checkout@v3
 
-    - name: Set up Python
-      uses: actions/setup-python@v4
-      with:
-        python-version: '3.10'
-        cache: 'pip'
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10"
+          cache: "pip"
 
-    - name: Install dependencies and run tests
-      run: |
-        make model-load-test BASE_BRANCH=${{ github.event.pull_request.base.ref }}
+      - name: Install dependencies and run tests
+        run: |
+          make model-load-test BASE_BRANCH=${{ github.event.pull_request.base.ref }}

.github/workflows/release.yml (+3 -4)

@@ -20,8 +20,7 @@ jobs:
     runs-on: ubuntu-latest
     concurrency: release
     permissions:
-      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing using PyPI
-
+      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing using PyPI
 
     if: ${{ github.ref == 'refs/heads/main' && github.event.workflow_run.conclusion == 'success'}}
     steps:
@@ -40,8 +39,8 @@ jobs:
       - name: Publish package distributions to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
         if: steps.release.outputs.released == 'true'
-        # This action supports PyPI's trusted publishing implementation, which allows authentication to PyPI without a manually
-        # configured API token or username/password combination. To perform trusted publishing with this action, your project's
+        # This action supports PyPI's trusted publishing implementation, which allows authentication to PyPI without a manually
+        # configured API token or username/password combination. To perform trusted publishing with this action, your project's
         # publisher must already be configured on PyPI.
 
       - name: Publish package distributions to GitHub Releases

.github/workflows/test.yml (+1 -3)

@@ -2,7 +2,6 @@
 # 1) install Python dependencies
 # 2) run make test
 
-
 name: Test
 on:
   push:
@@ -30,7 +29,7 @@ jobs:
       with:
         python-version: ${{ matrix.python-version }}
         cache: "pip"
-
+
       - name: Install dependencies
         shell: bash
         run: |
@@ -53,4 +52,3 @@ jobs:
         # if it fails again, the workflow will fail.
         # If it passes the first time the test will not run again
         make test || make test
-

.gitignore (+1 -1)

@@ -151,4 +151,4 @@ model_names.txt
 mteb/leaderboard/__cached_results.json
 
 # gradio
-.gradio/
+.gradio/

.pre-commit-config.yaml (new file, +31)

@@ -0,0 +1,31 @@
+fail_fast: true
+
+repos:
+  - repo: https://github.com/abravalheri/validate-pyproject
+    rev: v0.23
+    hooks:
+      - id: validate-pyproject
+
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: check-yaml
+      - id: check-json
+      - id: pretty-format-json
+        args:
+          - "--autofix"
+          - "--indent=4"
+          - "--no-sort-keys"
+      - id: end-of-file-fixer # generated a lot of changes
+      - id: trailing-whitespace
+      - id: check-toml
+
+  - repo: local
+    hooks:
+      - id: lint
+        name: lint
+        description: "Run 'make lint'"
+        entry: make lint
+        language: python
+        types_or: [python]
+minimum_pre_commit_version: "2.9.2"

.vscode/extensions.json (+1 -1)

@@ -2,4 +2,4 @@
   "recommendations": [
     "charliermarsh.ruff"
   ]
-}
+}

.vscode/settings.json (+1 -1)

@@ -4,5 +4,5 @@
   ],
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
-  "editor.defaultFormatter": "charliermarsh.ruff",
+  "editor.defaultFormatter": "charliermarsh.ruff"
 }

CONTRIBUTING.md (+3 -3)

@@ -1,4 +1,4 @@
-## Contributing to mteb
+## Contributing to MTEB
 
 We welcome contributions to `mteb` such as new tasks, code optimization or benchmarks.
 
@@ -18,7 +18,7 @@ cd mteb
 make install
 ```
 
-This uses [make](https://www.gnu.org/software/make/) to define the install command. You can see what each command does in the [makefile](https://github.com/embeddings-benchmark/mteb/blob/main/Makefile).
+This uses [make](https://www.gnu.org/software/make/) to define the install command. You can see what each command does in the [makefile](https://github.com/embeddings-benchmark/mteb/blob/main/Makefile).
 
 ### Running Tests
 
@@ -52,4 +52,4 @@ This command is equivalent to the command run during CI. It will check for code
 
 Any commit with one of these prefixes will trigger a version bump upon merging to the main branch as long as tests pass. A version bump will then trigger a new release on PyPI as well as a new release on GitHub.
 
-Other prefixes will not trigger a version bump. For example, `docs:`, `chore:`, `refactor:`, etc., however they will structure the commit history and the changelog. You can find more information about this in the [python-semantic-release documentation](https://python-semantic-release.readthedocs.io/en/latest/). If you do not intend to trigger a version bump you're not required to follow this convention when contributing to `mteb`.
+Other prefixes will not trigger a version bump. For example, `docs:`, `chore:`, `refactor:`, etc., however they will structure the commit history and the changelog. You can find more information about this in the [python-semantic-release documentation](https://python-semantic-release.readthedocs.io/en/latest/). If you do not intend to trigger a version bump you're not required to follow this convention when contributing to MTEB.

Makefile (+11 -3)

@@ -1,6 +1,7 @@
 install:
 	@echo "--- 🚀 Installing project dependencies ---"
 	pip install -e ".[dev,docs]"
+	pre-commit install
 
 install-for-tests:
 	@echo "--- 🚀 Installing project dependencies for test ---"
@@ -10,7 +11,7 @@ install-for-tests:
 lint:
 	@echo "--- 🧹 Running linters ---"
 	ruff format . # running ruff formatting
-	ruff check . --fix # running ruff linting
+	ruff check . --fix --exit-non-zero-on-fix # running ruff linting # --exit-non-zero-on-fix is used for the pre-commit hook to work
 
 lint-check:
 	@echo "--- 🧹 Check is project is linted ---"
@@ -22,9 +23,10 @@ test:
 	@echo "--- 🧪 Running tests ---"
 	pytest -n auto -m "not test_datasets"
 
+
 test-with-coverage:
 	@echo "--- 🧪 Running tests with coverage ---"
-	pytest -n auto --cov-report=term-missing --cov-config=pyproject.toml --cov=mteb
+	pytest -n auto --cov-report=term-missing --cov-config=pyproject.toml --cov=mteb
 
 pr:
 	@echo "--- 🚀 Running requirements for a PR ---"
@@ -56,4 +58,10 @@ dataset-load-test:
 
 run-leaderboard:
 	@echo "--- 🚀 Running leaderboard locally ---"
-	python -m mteb.leaderboard.app
+	python -m mteb.leaderboard.app
+
+
+.PHONY: check
+check: ## Run code quality tools.
+	@echo "--- 🧹 Running code quality tools ---"
+	@pre-commit run -a
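The `--exit-non-zero-on-fix` flag added to the `lint` target exists because pre-commit decides whether to abort a commit from the hook's exit code: a fixer that silently rewrites files and exits 0 would let the stale, staged copies through. A minimal sketch of that contract (illustrative only, not ruff's or the project's actual hook):

```python
# Sketch of the pre-commit fixer contract: repair the file in place, and
# report via a non-zero "exit code" that something changed, so pre-commit
# aborts the commit and the fixed files can be restaged.
from pathlib import Path

def strip_trailing_ws(path: Path) -> int:
    """Fix trailing whitespace; return 1 (non-zero exit) if the file changed."""
    original = path.read_text()
    fixed = "\n".join(line.rstrip() for line in original.splitlines()) + "\n"
    if fixed != original:
        path.write_text(fixed)
        return 1  # tells pre-commit: files modified, block the commit
    return 0      # clean, let the commit through

demo = Path("demo.txt")
demo.write_text("hello \n")
print(strip_trailing_ws(demo))  # 1: fixed, commit would be blocked
print(strip_trailing_ws(demo))  # 0: clean on the re-run
```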

README.md (+32 -25)

@@ -67,7 +67,7 @@ evaluation = mteb.MTEB(tasks=tasks)
 ```
 
 In prompts the key can be:
-1. Prompt types (`passage`, `query`) - they will be used in reranking and retrieval tasks
+1. Prompt types (`passage`, `query`) - they will be used in reranking and retrieval tasks
 2. Task type - these prompts will be used in all tasks of the given type
    1. `BitextMining`
    2. `Classification`
@@ -104,7 +104,7 @@ mteb run -m sentence-transformers/all-MiniLM-L6-v2 \
 ## Usage Documentation
 Click on each section below to see the details.
 
-<br />
+<br />
 
 <details>
   <summary> Task selection </summary>
@@ -151,14 +151,29 @@ evaluation = mteb.MTEB(tasks=[
 # for an example of a HF subset see "Subset" in the dataset viewer at: https://huggingface.co/datasets/mteb/bucc-bitext-mining
 ```
 
+* by their modalities
+
+```python
+tasks = mteb.get_tasks(modalities=["text", "image"]) # Only select tasks with text or image modalities
+```
+
+You can also use exclusive modality filtering to get only tasks with exactly the requested modalities (the default is exclusive_modality_filter=False):
+```python
+# Get tasks with the text modality; this also includes tasks having both text and image modalities
+tasks = mteb.get_tasks(modalities=["text"], exclusive_modality_filter=False)
+
+# Get tasks that have ONLY the text modality (no image or other modalities)
+tasks = mteb.get_tasks(modalities=["text"], exclusive_modality_filter=True)
+```
+
 </details>
 
 <details>
   <summary> Running a benchmark </summary>
 
 ### Running a Benchmark
 
-`mteb` comes with a set of predefined benchmarks. These can be fetched using `get_benchmark` and run in a similar fashion to other sets of tasks.
+`mteb` comes with a set of predefined benchmarks. These can be fetched using `get_benchmark` and run in a similar fashion to other sets of tasks.
 For instance to select the 56 English datasets that form the "Overall MTEB English leaderboard":
 
 ```python
@@ -248,13 +263,13 @@ class CustomModel:
         **kwargs,
     ) -> np.ndarray:
         """Encodes the given sentences using the encoder.
-
+
         Args:
             sentences: The sentences to encode.
             task_name: The name of the task.
             prompt_type: The prompt type to use.
             **kwargs: Additional arguments to pass to the encoder.
-
+
         Returns:
             The encoded sentences.
         """
@@ -298,7 +313,7 @@ evaluation.run(model)
 
 ### Using a cross encoder for reranking
 
-To use a cross encoder for reranking, you can directly use a CrossEncoder from SentenceTransformers. The following code shows a two-stage run with the second stage reading results saved from the first stage.
+To use a cross encoder for reranking, you can directly use a CrossEncoder from SentenceTransformers. The following code shows a two-stage run with the second stage reading results saved from the first stage.
 
 ```python
 from mteb import MTEB
@@ -454,7 +469,7 @@ model_w_contamination = ModelMeta(
 ### Running the Leaderboard
 
 It is possible to completely deploy the leaderboard locally or self-host it. This can e.g. be relevant for companies that might want to
-integrate build their own benchmarks or integrate custom tasks into existing benchmarks.
+build their own benchmarks or integrate custom tasks into existing benchmarks.
 
 Running the leaderboard is quite easy. Simply run:
 ```py
@@ -480,12 +495,12 @@ There are times you may want to cache the embeddings so you can re-use them. Thi
 from mteb.models.cache_wrapper import CachedEmbeddingWrapper
 model_with_cached_emb = CachedEmbeddingWrapper(model, cache_path='path_to_cache_dir')
 # run as normal
-evaluation.run(model, ...)
+evaluation.run(model, ...)
 ```
 
 </details>
 
-<br />
+<br />
 
 
@@ -520,22 +535,14 @@ evaluation.run(model, ...)
 MTEB was introduced in "[MTEB: Massive Text Embedding Benchmark](https://aclanthology.org/2023.eacl-main.148/)", feel free to cite:
 
 ```bibtex
-@inproceedings{muennighoff-etal-2023-mteb,
-    title = "{MTEB}: Massive Text Embedding Benchmark",
-    author = "Muennighoff, Niklas and
-      Tazi, Nouamane and
-      Magne, Loic and
-      Reimers, Nils",
-    editor = "Vlachos, Andreas and
-      Augenstein, Isabelle",
-    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
-    month = may,
-    year = "2023",
-    address = "Dubrovnik, Croatia",
-    publisher = "Association for Computational Linguistics",
-    url = "https://aclanthology.org/2023.eacl-main.148",
-    doi = "10.18653/v1/2023.eacl-main.148",
-    pages = "2014--2037",
+@article{muennighoff2022mteb,
+    doi = {10.48550/ARXIV.2210.07316},
+    url = {https://arxiv.org/abs/2210.07316},
+    author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo{\"\i}c and Reimers, Nils},
+    title = {MTEB: Massive Text Embedding Benchmark},
+    publisher = {arXiv},
+    journal = {arXiv preprint arXiv:2210.07316},
+    year = {2022}
 }
 ```
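The two modality-filtering modes added to the README above can be pictured with a small, self-contained sketch. The function below is illustrative only (mteb's real logic lives in `mteb/overview.py` and operates on task metadata objects, not plain dicts):

```python
# Illustrative re-implementation of the two modes of modality filtering
# described in the README diff above; not mteb's actual code.
def filter_by_modalities(tasks, modalities, exclusive_modality_filter=False):
    requested = set(modalities)
    if exclusive_modality_filter:
        # keep tasks that use ONLY the requested modalities
        return [t for t in tasks if set(t["modalities"]) <= requested]
    # default: keep tasks that use ANY of the requested modalities
    return [t for t in tasks if set(t["modalities"]) & requested]

tasks = [
    {"name": "text-only task", "modalities": ["text"]},
    {"name": "text+image task", "modalities": ["text", "image"]},
]
print(len(filter_by_modalities(tasks, ["text"])))                                  # 2
print(len(filter_by_modalities(tasks, ["text"], exclusive_modality_filter=True)))  # 1
```

With the default (inclusive) mode, requesting `["text"]` also returns the text+image task; the exclusive mode drops it because it uses a modality outside the requested set.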
