Commit 1447a76

Authored by: Samoed, github-actions, ayush1298, KennethEnevoldsen, isaac-chung
[v2] Merge main (#2324)
* 1.36.11: automatically generated by python-semantic-release
* fix: Added Filter Modality (#2262)
  * added Filter Modality and `exclusive_modality_filter`, with unit tests
  * integrated tests in overview.py and tests/test_overview.py
  * added a task related to the image modality
  * updates to mteb/abstasks/AbsTask.py and overview.py; lint and documentation updates
  * Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
  * Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* 1.36.12: automatically generated by python-semantic-release
* fix: Add `ModelMeta` license & custom validations (#2293)
  * license validation; moved licenses; updated imports
  * Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* 1.36.13: automatically generated by python-semantic-release
* ci: Add pre-commit hook (#2194)
  * pre-commit hooks to make dev life nicer; added pre-commit to `make install`; updated the ruff pre-commit hook; lint fixes
  * Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
* Update tasks table
* fix: bug in voyage implementation (#2304)
  * "passage" is not a valid input for the Voyage API; remapped to "document"
  * Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* 1.36.14: automatically generated by python-semantic-release
* fix: Update voyage name to include Org. (#2322)
* 1.36.15: automatically generated by python-semantic-release
* Added VDR Model (#2290)
  * changed the custom wrapper to SentenceTransformerWrapper
  * removed kwargs and added a TODO for the image modality
  * Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* lint; fix license

Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 53b830b commit 1447a76

File tree

1,510 files changed (+115906, -112267 lines changed)


.github/pull_request_template.md (+4 -4)

@@ -29,12 +29,12 @@
 - [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
 - [ ] If the dataset is too big (e.g. >2048 examples), considering using `self.stratified_subsampling() under dataset_transform()`
 - [ ] I have filled out the metadata object in the dataset file (find documentation on it [here](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md#2-creating-the-metadata-object)).
-- [ ] Run tests locally to make sure nothing is broken using `make test`.
-- [ ] Run the formatter to format the code using `make lint`.
+- [ ] Run tests locally to make sure nothing is broken using `make test`.
+- [ ] Run the formatter to format the code using `make lint`.
 
 
 ### Adding a model checklist
-<!--
+<!--
 When adding a model to the model registry
 see also https://github.com/embeddings-benchmark/mteb/blob/main/docs/reproducible_workflow.md
 -->
@@ -43,4 +43,4 @@
 - [ ] I have ensured that my model can be loaded using
 - [ ] `mteb.get_model(model_name, revision)` and
 - [ ] `mteb.get_model_meta(model_name, revision)`
-- [ ] I have tested the implementation works on a representative set of tasks.
+- [ ] I have tested the implementation works on a representative set of tasks.

.github/workflows/docs.yml (+1 -2)

@@ -47,7 +47,7 @@ jobs:
       - name: Create table
         run: |
           make build-docs
-
+
       - name: Push table
         run: |
           git config --global user.email "github-actions[bot]@users.noreply.github.com"
@@ -60,4 +60,3 @@ jobs:
           git commit -m "Update tasks table"
           git push
           fi
-

.github/workflows/documentation.yml (+3 -3)

@@ -19,7 +19,7 @@ jobs:
       - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
-
+
       - name: Dependencies
         run: |
           python -m pip install --upgrade pip
@@ -29,7 +29,7 @@ jobs:
       - name: Build and Deploy
         if: github.event_name == 'push'
         run: mkdocs gh-deploy --force
-
+
       - name: Build
         if: github.event_name == 'pull_request'
-        run: make build-docs
+        run: make build-docs

.github/workflows/leaderboard_refresh.yaml (+2 -2)

@@ -2,8 +2,8 @@ name: Daily Space Rebuild
 on:
   schedule:
     # Runs at midnight Pacific Time (8 AM UTC)
-    - cron: '0 8 * * *'
-  workflow_dispatch: # Allows manual triggering
+    - cron: "0 8 * * *"
+  workflow_dispatch: # Allows manual triggering
 
 jobs:
   rebuild:

.github/workflows/lint.yml (-1)

@@ -25,4 +25,3 @@ jobs:
         id: lint
         run: |
           make lint-check
-

.github/workflows/model_loading.yml (+11 -11)

@@ -3,22 +3,22 @@ name: Model Loading
 on:
   pull_request:
     paths:
-      - 'mteb/models/**.py'
+      - "mteb/models/**.py"
 
 jobs:
   extract-and-run:
     runs-on: ubuntu-latest
 
     steps:
-    - name: Checkout repository
-      uses: actions/checkout@v3
+      - name: Checkout repository
+        uses: actions/checkout@v3
 
-    - name: Set up Python
-      uses: actions/setup-python@v4
-      with:
-        python-version: '3.10'
-        cache: 'pip'
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10"
+          cache: "pip"
 
-    - name: Install dependencies and run tests
-      run: |
-        make model-load-test BASE_BRANCH=${{ github.event.pull_request.base.ref }}
+      - name: Install dependencies and run tests
+        run: |
+          make model-load-test BASE_BRANCH=${{ github.event.pull_request.base.ref }}

.github/workflows/release.yml (+3 -4)

@@ -20,8 +20,7 @@ jobs:
     runs-on: ubuntu-latest
     concurrency: release
     permissions:
-      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing using PyPI
-
+      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing using PyPI
 
     if: ${{ github.ref == 'refs/heads/main' && github.event.workflow_run.conclusion == 'success'}}
     steps:
@@ -40,8 +39,8 @@ jobs:
       - name: Publish package distributions to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
         if: steps.release.outputs.released == 'true'
-        # This action supports PyPI's trusted publishing implementation, which allows authentication to PyPI without a manually
-        # configured API token or username/password combination. To perform trusted publishing with this action, your project's
+        # This action supports PyPI's trusted publishing implementation, which allows authentication to PyPI without a manually
+        # configured API token or username/password combination. To perform trusted publishing with this action, your project's
         # publisher must already be configured on PyPI.
 
       - name: Publish package distributions to GitHub Releases

.github/workflows/test.yml (+1 -3)

@@ -2,7 +2,6 @@
 # 1) install Python dependencies
 # 2) run make test
 
-
 name: Test
 on:
   push:
@@ -30,7 +29,7 @@ jobs:
       with:
         python-version: ${{ matrix.python-version }}
         cache: "pip"
-
+
       - name: Install dependencies
         shell: bash
         run: |
@@ -53,4 +52,3 @@ jobs:
         # if it fails again, the workflow will fail.
         # If it passes the first time the test will not run again
         make test || make test
-

.gitignore (+1 -1)

@@ -151,4 +151,4 @@ model_names.txt
 mteb/leaderboard/__cached_results.json
 
 # gradio
-.gradio/
+.gradio/

.pre-commit-config.yaml (new file, +31)

@@ -0,0 +1,31 @@
+fail_fast: true
+
+repos:
+  - repo: https://github.com/abravalheri/validate-pyproject
+    rev: v0.23
+    hooks:
+      - id: validate-pyproject
+
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: check-yaml
+      - id: check-json
+      - id: pretty-format-json
+        args:
+          - "--autofix"
+          - "--indent=4"
+          - "--no-sort-keys"
+      - id: end-of-file-fixer # generated a lot of changes
+      - id: trailing-whitespace
+      - id: check-toml
+
+  - repo: local
+    hooks:
+      - id: lint
+        name: lint
+        description: "Run 'make lint'"
+        entry: make lint
+        language: python
+        types_or: [python]
+minimum_pre_commit_version: "2.9.2"

.vscode/extensions.json (+1 -1)

@@ -2,4 +2,4 @@
   "recommendations": [
     "charliermarsh.ruff"
   ]
-}
+}

.vscode/settings.json (+1 -1)

@@ -4,5 +4,5 @@
   ],
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
-  "editor.defaultFormatter": "charliermarsh.ruff",
+  "editor.defaultFormatter": "charliermarsh.ruff"
 }

CONTRIBUTING.md (+3 -3)

@@ -1,4 +1,4 @@
-## Contributing to mteb
+## Contributing to MTEB
 
 We welcome contributions to `mteb` such as new tasks, code optimization or benchmarks.
 
@@ -18,7 +18,7 @@ cd mteb
 make install
 ```
 
-This uses [make](https://www.gnu.org/software/make/) to define the install command. You can see what each command does in the [makefile](https://github.com/embeddings-benchmark/mteb/blob/main/Makefile).
+This uses [make](https://www.gnu.org/software/make/) to define the install command. You can see what each command does in the [makefile](https://github.com/embeddings-benchmark/mteb/blob/main/Makefile).
 
 ### Running Tests
 
@@ -52,4 +52,4 @@ This command is equivalent to the command run during CI. It will check for code
 
 Any commit with one of these prefixes will trigger a version bump upon merging to the main branch as long as tests pass. A version bump will then trigger a new release on PyPI as well as a new release on GitHub.
 
-Other prefixes will not trigger a version bump. For example, `docs:`, `chore:`, `refactor:`, etc., however they will structure the commit history and the changelog. You can find more information about this in the [python-semantic-release documentation](https://python-semantic-release.readthedocs.io/en/latest/). If you do not intend to trigger a version bump you're not required to follow this convention when contributing to `mteb`.
+Other prefixes will not trigger a version bump. For example, `docs:`, `chore:`, `refactor:`, etc., however they will structure the commit history and the changelog. You can find more information about this in the [python-semantic-release documentation](https://python-semantic-release.readthedocs.io/en/latest/). If you do not intend to trigger a version bump you're not required to follow this convention when contributing to MTEB.

Makefile (+11 -3)

@@ -1,6 +1,7 @@
 install:
 	@echo "--- 🚀 Installing project dependencies ---"
 	pip install -e ".[dev,docs]"
+	pre-commit install
 
 install-for-tests:
 	@echo "--- 🚀 Installing project dependencies for test ---"
@@ -10,7 +11,7 @@ install-for-tests:
 lint:
 	@echo "--- 🧹 Running linters ---"
 	ruff format . # running ruff formatting
-	ruff check . --fix # running ruff linting
+	ruff check . --fix --exit-non-zero-on-fix # running ruff linting # --exit-non-zero-on-fix is used for the pre-commit hook to work
 
 lint-check:
 	@echo "--- 🧹 Check is project is linted ---"
@@ -22,9 +23,10 @@ test:
 	@echo "--- 🧪 Running tests ---"
 	pytest -n auto -m "not test_datasets"
 
+
 test-with-coverage:
 	@echo "--- 🧪 Running tests with coverage ---"
-	pytest -n auto --cov-report=term-missing --cov-config=pyproject.toml --cov=mteb
+	pytest -n auto --cov-report=term-missing --cov-config=pyproject.toml --cov=mteb
 
 pr:
 	@echo "--- 🚀 Running requirements for a PR ---"
@@ -56,4 +58,10 @@ dataset-load-test:
 
 run-leaderboard:
 	@echo "--- 🚀 Running leaderboard locally ---"
-	python -m mteb.leaderboard.app
+	python -m mteb.leaderboard.app
+
+
+.PHONY: check
+check: ## Run code quality tools.
+	@echo "--- 🧹 Running code quality tools ---"
+	@pre-commit run -a
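The `--exit-non-zero-on-fix` flag added to the `lint` target exists because pre-commit decides whether to abort a commit from the hook's exit code: a fixer that silently rewrites files and exits 0 would let the stale, staged copies through. A minimal sketch of that contract (illustrative only, not ruff's or the project's actual hook):

```python
# Sketch of the pre-commit fixer contract: repair the file in place, and
# report via a non-zero "exit code" that something changed, so pre-commit
# aborts the commit and the fixed files can be restaged.
from pathlib import Path

def strip_trailing_ws(path: Path) -> int:
    """Fix trailing whitespace; return 1 (non-zero exit) if the file changed."""
    original = path.read_text()
    fixed = "\n".join(line.rstrip() for line in original.splitlines()) + "\n"
    if fixed != original:
        path.write_text(fixed)
        return 1  # tells pre-commit: files modified, block the commit
    return 0      # clean, let the commit through

demo = Path("demo.txt")
demo.write_text("hello \n")
print(strip_trailing_ws(demo))  # 1: fixed, commit would be blocked
print(strip_trailing_ws(demo))  # 0: clean on the re-run
```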

README.md (+32 -25)

@@ -67,7 +67,7 @@ evaluation = mteb.MTEB(tasks=tasks)
 ```
 
 In prompts the key can be:
-1. Prompt types (`passage`, `query`) - they will be used in reranking and retrieval tasks
+1. Prompt types (`passage`, `query`) - they will be used in reranking and retrieval tasks
 2. Task type - these prompts will be used in all tasks of the given type
    1. `BitextMining`
    2. `Classification`
@@ -104,7 +104,7 @@ mteb run -m sentence-transformers/all-MiniLM-L6-v2 \
 ## Usage Documentation
 Click on each section below to see the details.
 
-<br />
+<br />
 
 <details>
   <summary> Task selection </summary>
@@ -151,14 +151,29 @@ evaluation = mteb.MTEB(tasks=[
 # for an example of a HF subset see "Subset" in the dataset viewer at: https://huggingface.co/datasets/mteb/bucc-bitext-mining
 ```
 
+* by their modalities
+
+```python
+tasks = mteb.get_tasks(modalities=["text", "image"]) # Only select tasks with text or image modalities
+```
+
+You can also use exclusive modality filtering to get only tasks with exactly the requested modalities (the default is exclusive_modality_filter=False):
+```python
+# Get tasks with the text modality; this also includes tasks having both text and image modalities
+tasks = mteb.get_tasks(modalities=["text"], exclusive_modality_filter=False)
+
+# Get tasks that have ONLY the text modality (no image or other modalities)
+tasks = mteb.get_tasks(modalities=["text"], exclusive_modality_filter=True)
+```
+
 </details>
 
 <details>
   <summary> Running a benchmark </summary>
 
 ### Running a Benchmark
 
-`mteb` comes with a set of predefined benchmarks. These can be fetched using `get_benchmark` and run in a similar fashion to other sets of tasks.
+`mteb` comes with a set of predefined benchmarks. These can be fetched using `get_benchmark` and run in a similar fashion to other sets of tasks.
 For instance to select the 56 English datasets that form the "Overall MTEB English leaderboard":
 
 ```python
@@ -248,13 +263,13 @@ class CustomModel:
         **kwargs,
     ) -> np.ndarray:
         """Encodes the given sentences using the encoder.
-
+
         Args:
             sentences: The sentences to encode.
             task_name: The name of the task.
             prompt_type: The prompt type to use.
             **kwargs: Additional arguments to pass to the encoder.
-
+
         Returns:
             The encoded sentences.
         """
@@ -298,7 +313,7 @@ evaluation.run(model)
 
 ### Using a cross encoder for reranking
 
-To use a cross encoder for reranking, you can directly use a CrossEncoder from SentenceTransformers. The following code shows a two-stage run with the second stage reading results saved from the first stage.
+To use a cross encoder for reranking, you can directly use a CrossEncoder from SentenceTransformers. The following code shows a two-stage run with the second stage reading results saved from the first stage.
 
 ```python
 from mteb import MTEB
@@ -454,7 +469,7 @@ model_w_contamination = ModelMeta(
 ### Running the Leaderboard
 
 It is possible to completely deploy the leaderboard locally or self-host it. This can e.g. be relevant for companies that might want to
-integrate build their own benchmarks or integrate custom tasks into existing benchmarks.
+build their own benchmarks or integrate custom tasks into existing benchmarks.
 
 Running the leaderboard is quite easy. Simply run:
 ```py
@@ -480,12 +495,12 @@ There are times you may want to cache the embeddings so you can re-use them. Thi
 from mteb.models.cache_wrapper import CachedEmbeddingWrapper
 model_with_cached_emb = CachedEmbeddingWrapper(model, cache_path='path_to_cache_dir')
 # run as normal
-evaluation.run(model, ...)
+evaluation.run(model, ...)
 ```
 
 </details>
 
-<br />
+<br />
 
 
@@ -520,22 +535,14 @@ evaluation.run(model, ...)
 MTEB was introduced in "[MTEB: Massive Text Embedding Benchmark](https://aclanthology.org/2023.eacl-main.148/)", feel free to cite:
 
 ```bibtex
-@inproceedings{muennighoff-etal-2023-mteb,
-    title = "{MTEB}: Massive Text Embedding Benchmark",
-    author = "Muennighoff, Niklas and
-      Tazi, Nouamane and
-      Magne, Loic and
-      Reimers, Nils",
-    editor = "Vlachos, Andreas and
-      Augenstein, Isabelle",
-    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
-    month = may,
-    year = "2023",
-    address = "Dubrovnik, Croatia",
-    publisher = "Association for Computational Linguistics",
-    url = "https://aclanthology.org/2023.eacl-main.148",
-    doi = "10.18653/v1/2023.eacl-main.148",
-    pages = "2014--2037",
+@article{muennighoff2022mteb,
+    doi = {10.48550/ARXIV.2210.07316},
+    url = {https://arxiv.org/abs/2210.07316},
+    author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo{\"\i}c and Reimers, Nils},
+    title = {MTEB: Massive Text Embedding Benchmark},
+    publisher = {arXiv},
+    journal = {arXiv preprint arXiv:2210.07316},
+    year = {2022}
 }
 ```
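The two modality-filtering modes added to the README above can be pictured with a small, self-contained sketch. The function below is illustrative only (mteb's real logic lives in `mteb/overview.py` and operates on task metadata objects, not plain dicts):

```python
# Illustrative re-implementation of the two modes of modality filtering
# described in the README diff above; not mteb's actual code.
def filter_by_modalities(tasks, modalities, exclusive_modality_filter=False):
    requested = set(modalities)
    if exclusive_modality_filter:
        # keep tasks that use ONLY the requested modalities
        return [t for t in tasks if set(t["modalities"]) <= requested]
    # default: keep tasks that use ANY of the requested modalities
    return [t for t in tasks if set(t["modalities"]) & requested]

tasks = [
    {"name": "text-only task", "modalities": ["text"]},
    {"name": "text+image task", "modalities": ["text", "image"]},
]
print(len(filter_by_modalities(tasks, ["text"])))                                  # 2
print(len(filter_by_modalities(tasks, ["text"], exclusive_modality_filter=True)))  # 1
```

With the default (inclusive) mode, requesting `["text"]` also returns the text+image task; the exclusive mode drops it because it uses a modality outside the requested set.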
