- ci: update docs ci for UV (f69313f)
- ci: update docs ci for UV (0c99b3b)
- ci: fix remaining ci (6459ea1)
- ci: Update ci to work with uv (46c4af6)
- ci: Update ci to use UV (b59f6e9)
- fix: update uv (a18205c)
- Merge pull request #189 from KennethEnevoldsen/add-jina
  fix: re-add jina and add arctic (61d3255)
- Merge branch 'add-jina' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-jina (77a7362)
- format (af596d5)
- updated uv file (d6298dd)
- Merge branch 'main' into add-jina (e898adf)
- finally got batching to work correctly (ff8d2db)
- add results (3a8ecfa)
- convert makefile to uv (9c9e7c6)
- Merge branch 'add-jina' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-jina (60cc806)
- Update makefile (5128f41)
- Merge pull request #184 from KennethEnevoldsen/historical_task
  fix: Historical task (17e35a0)
- Added results for jina (ba4e15d)
- fixed bugs revealed by type checker (c226638)
- fixed ruff (acbaa3f)
- fix import (f521344)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into historical_task (e813b35)
- Added jina results (c986df4)
- added arctic model (9d87597)
- added results (b597013)
- Merge branch 'add-jina' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-jina (2222d52)
- delete all prev scores
  with the exception of LCC which have been overwritten with new scores (369fb76)
- added a few fixes to the jina implementation (a496071)
- Merge branch 'add-jina' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-jina (83af9cb)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-jina (ccb84b1)
- fix: Add Jina
  - fix: Add Jina
    I am still running the model
  - Add results for Jina
  - Add results for Jina
  - format fixes
  - Update mkdocs to legacy python
  Co-authored-by: Your Name <you@example.com> (84c2f91)
- fix: Add Jina
  I am still running the model (702ba9d)
- Update mkdocs to legacy python (a04774d)
- format fixes (6419b28)
- Add results for Jina (85ed45e)
- Add results for Jina (67818c2)
- Moved historical task to experimental_tasks (e915f78)
- Added results on the historical task (74b589b)
- Added MiMe_Memo results (c494605)
- Fixed historical clustering (979a282)
- Fixed ruff version (b3c78cc)
- Added Memo-bert-03 model (5c60d65)
- Added instruction for new task on instruct models (6eb88ec)
- Added new task to registry (1677dda)
- Added historical danish clustering task (5e9a0b5)
- docs: fix sizes of tables (07a7ec3)
- fix: Added language to KFST (c89856a)
- Merge pull request #182 from KennethEnevoldsen/add_kfst_model
  fix: Added language to KFST (324dde4)
- Merge pull request #180 from KennethEnevoldsen/add_kfst_model
  docs: fix sizes of tables (e8848e2)
- Merge pull request #179 from KennethEnevoldsen/add_kfst_model
  Added kfst model (542511d)
- Added kfst model (88dce65)
- ci: remove macos due to it being slow (452bfe2)
- fix: Added results from bge-m3 (3703bac)
- fix: type checking ignore voyage (cd1da36)
- fix: format (b77caf9)
- fix: Added models results for voyage (fc19796)
- fix: Added models results for newly added models (2b0cba5)
- fix: Added new models (ea830fd)
- Merge pull request #178 from KennethEnevoldsen/add-models-and-muni-code
  fix: Added bge, voyage, cmlm-multilingual and mxbai models (3e600a3)
- fix typeerror (43da9ca)
- format (e695520)
- Merge branch 'add-models-and-muni-code' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-models-and-muni-code (e3c1d89)
- Added bge results (41e6904)
- minor fixes (ec0df26)
- minor fixes (d14d6a9)
- Added speed estimates (74323ee)
- Merge branch 'add-models-and-muni-code' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-models-and-muni-code (8f523f1)
- Added MuniIntent classification (274d7fa)
- Update README.md (920241c)
- Update README.md (5cff1b2)
- Update citation.cff (84360d4)
- Merge pull request #175 from KennethEnevoldsen/add-licenses
  docs: Added licenses (c8376f9)
- docs: minor updates to tables (9f7da4b)
- Merge pull request #174 from KennethEnevoldsen/minor-fixes
  fix: Fixing links for prs (5dc4ca3)
- docs: Updated tables to include task subtypes (e119c58)
- fix: Added task subtypes to tasks
  This follows the notation in the paper. A task can have multiple task subtypes but only one task type. (7fc9ed5)
- Merge pull request #162 from KennethEnevoldsen/add-task-subtypes
  Added task subtypes (363ab09)
- fix: Pass the task for encode_queries, and encode_corpus
  This yields notable performance improvements for the instruct models for retrieval tasks (9992e80)
- Merge pull request #156 from KennethEnevoldsen/fix_instruct_tuned_embed
  fix: Pass the task for encode_queries, and encode_corpus (13786fe)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into fix_instruct_tuned_embed (69b2ae2)
- chore: remove test file (14f9935)
- docs: Update docs script to handle new name format (aa171dd)
- fix: fix incorrect emb. size for e5 large instruct (7865ad7)
- fix: Added final for mult. e5 instruct, including speed test of ref. system (08e1779)
- fix: added multilingual-e5-large-instruct (56bfc16)
- fix: rename model_architecture to architecture to not take up protected attribute for pydantic (f845a49)
- Merge pull request #155 from KennethEnevoldsen/add-multilingual-instruct
  Add multilingual e5 instruct (c2cca49)
- feat: Ensure that all model names are consistent
  i.e. that they have the same name as they would have on the benchmark (c2299cd)
- fix: made the to method optional on the encoder (157a91c)
- fix: Add to method to lazyloadencoder (0b6d0be)
- fix: Ensure return type is always np.ndarray (e8d3994)
- fix: Ensure return type is always np.ndarray (06c5cd8)
- Merge pull request #153 from KennethEnevoldsen/ensure-consistent-names
  Ensure consistent names (83fd962)
- Merge branch 'ensure_return_type' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into ensure_return_type (7d09487)
- ci: Added not planned as valid no stale label (a2dd834)
- fix: Removed translate-embed integration test (adb9cd6)
- fix: removing smaller translate then embed models (fbb9e97)
- fix: removing smaller translate then embed models (91f6b79)
- fix: Add missing scores (3b92090)
- fix: Added e5 mistral scores (7515e79)
- style: ran linting (f729288)
- Merge pull request #143 from KennethEnevoldsen/run-e5
  Updated e5-mistral model (0026c9c)
- Merge branch 'run-e5' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into run-e5 (fdc19fb)
- Merge branch 'run-e5' of https://github.com/KennethEnevoldsen/Scandinavian-Embedding-Benchmark into run-e5 (e691448)
- docs: Added dataset disclaimer (#145) (6b3e71b)
- fix: Updated tests (63d33c3)
- fix: Applied linter and static type checks (b1baee9)
- fix: Added get_documents to task interface (c4fb354)
- fix: updated e5 model
  fixed passing of batch size, ensured it can run on DanFEVER, and avoided collecting gradients (which led to OOM errors) (189751d)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into run-e5 (64e4986)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/Scandinavian-Embedding-Benchmark (24644a7)
- Fix: Added performance metrics for translate and embed (91a2b8a)
- Added SwednClustering and Retrieval to cache (613ee8c)
- Added NorQuad and SNL_retr to cache (fe823c7)
- Added a couple of tasks from e5 Mistral (b94648c)
- Embeddings are sent back to the CPU, so they can be converted to numpy arrays (e154c39)
- Lowered maximum batch size to 16 (f9b68af)
- Added GPU inference to E5 Mistral (106084b)
- Merge branch 'main' into run-e5 (2fb8807)
- Merge pull request #137 from KennethEnevoldsen/update-sonar-models
  Update and rerun sonar models (28ff5e5)
- feat: Updated sonar models (f5f7374)
- fix: Applied linter (702f804)
- fix: Added model type and release date to model meta
  This is to allow the tracking of improvement on SEB over time (49c9f1a)
- fix: Added results of the sonar models (d0988b6)
- fix: ensure that sonar models are properly moved to device (746934a)
- fix: updated sonar requirement to handle >512 token sequences (792eb80)
- fix: removed cache of sonar models, due to new update (c140b9f)
- Merge pull request #140 from KennethEnevoldsen/adding-architecture
  fix: Added model type and release date to model meta (245e881)
- Merge pull request #131 from KennethEnevoldsen/mistral-instructions
  Added instructions for all tasks in Mistral E5 (006c253)
- Added instructions for all tasks in Mistral E5 (07c4c95)
- docs: Updated table w. dataset descriptions (f955379)
- docs: Added across column for coverage
  added swapped formalization and task columns (31d9d76)
- fix: Updated dataset description metadata and script (0e63eb4)
- fix: Update calc. descriptive stats (bdbd552)
- Merge pull request #130 from KennethEnevoldsen/docs-update
  Update task/dataset descriptions (144f025)
- fix: Updated table generation of the benchmark (c87aed4)
- style: ran linting (489e124)
- Merge pull request #127 from KennethEnevoldsen/run-models
  Ran most of the models (db1868e)
- remove test script (dcbe547)
- Added translate and embed scores (26805ad)
- Merge branch 'run-models' of https://github.com/KennethEnevoldsen/Scandinavian-Embedding-Benchmark into run-models (5570776)
- Added fasttext and translate scores (d725b44)
- docs: sort model on new tables (d4d9e56)
- docs: Minor grammatical fixes (633b47d)
- docs: Added speed x performance tab to documentation (b3011a3)
- docs: renamed run_benchmark to update_benchmark tables
  Also disabled it actually running the model. Models are now run using seb run (da562b7)
- docs: restructured dataset overview (30440f0)
- fix: correctly check if model has a to() method (e2a1f44)
- fix: ensure that the correct sentence transformer wrapper is used in the CLI (03fff7f)
- fix: removed GPU test from speed test
  at least for the moment (a7600bd)
- fix: Updated cache for models (ec07680)
- fix: lower default batch size for e5 mistral model (3cdd55e)
- fix: added e5 mistral cache (ecc85c4)
- fix: Added cache for all smaller models on all tasks (789c8b7)
- fix: added WPS to speed tasks and benchmark (7ebaca4)
- fix: Updated scores for all API models (f10c01a)
- fix: remove duplicate e5 mistral model (2de6ee2)
- fix: Updated tables (d30284c)
- style: Applied linter (2b0342d)
- tests: remove fasttext from integration test as it takes too long to download (84baefa)
- ignore fasttext files (0f9823e)
- ignore fasttext models (4dbba98)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into run-models (f13be1c)
- Merge pull request #121 from KennethEnevoldsen/update-tables
  Updated docs (02b5993)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into update-tables (db3ae97)
- Removed req. for datawrapper api when running benchmark (ee8673f)
- Merge pull request #103 from KennethEnevoldsen/restructure_model_interface
  Added LazyLoadEncoder, added SebModel and removed EmbeddingModel (4e1cd36)
- Fixed linting (82c0a49)
- Changed Norwegian Bitext Mining revision to None (a5675eb)
- Added docstring to LazyLoadEncoder, adjusted api.md to new interface. (c48e9cf)
- Changed model building to use SentenceTransformerWithTaskEncode in CLI (c85aad0)
- Ran linting on danish tasks (3663cd8)
- Ran linting (8f1c3b3)
- Removed verbose parameter, as it does not exist (2a743d0)
- Converted new OpenAI models to the new interface (8b24890)
- Fixed syntax error in Cohere (e1d18f1)
- Fixed type errors in speed task (834e5c7)
- Fixed tuple.extend type errors (341930a)
- Merge branch 'main' into restructure_model_interface (7b20806)
- Merge branch 'run-models' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into run-models (9a80296)
- Merge branch 'run-models' of https://github.com/KennethEnevoldsen/Scandinavian-Embedding-Benchmark into run-models (11734d0)
- remove accidental test file (25da0fe)
- added openAI scores (edff86b)
- Remove cohere scores (bcfdee5)
- fix: Added cache for speed benchmark (f3cd21e)
- Merge pull request #120 from KennethEnevoldsen/ran-speed-bench
  fix: Added cache for speed benchmark (b4596c3)
- Merge pull request #109 from KennethEnevoldsen/add-sts-retrieval-dataset
  Add SNL Clustering task (9d1dae4)
- Merge pull request #110 from KennethEnevoldsen/sts-retrieval
  Add SNL retrieval (9412d71)
- Merge branch 'add-sts-retrieval-dataset' into sts-retrieval (a1216bf)
- Merge pull request #113 from KennethEnevoldsen/twitterhjerne
  Added twitterhjerne (a32dea2)
- Merge pull request #117 from KennethEnevoldsen/nordjylland-retrieval
  Added tv2nord retrieval dataset (1721983)
- Merge pull request #118 from KennethEnevoldsen/norquad
  Added NorQuad (0cf610d)
- Merge pull request #119 from KennethEnevoldsen/updated-coverage-tables
  Updated coverage tables (94f72d5)
- Merge pull request #108 from KennethEnevoldsen/add-sts-retrieval
  Restructured MTEB (115acef)
- feat: Added NorQuad (b1ab34d)
- feat: Added tv2nord retrieval dataset (c315d94)
- feat: Added twitterhjerne (8509c37)
- feat: Added SNL Retrieval (b1169ff)
- feat: Added SNL clustering (c380519)
- feat: removed swedn sts to experimental tasks (6d22966)
- fix: speed benchmark actually runs the speed tasks (7ae3ef0)
- fix: Update wrong language tag. (6a143e7)
- fix: remove ingress from the SNL corpus as they almost always contain the headline (617d616)
- fix: restructure mteb tasks to its own folder (d88864d)
- fix: Added new OpenAI Models (e096ef5)
- ci: Updated lint workflow to actually fail when not linted (d1c177c)
- fix: Added relevant type ignores (27b8dd3)
- Merge pull request #104 from KennethEnevoldsen/ci-lint
  Ci lint (ade460b)
- fix: ran swednsts and reduced dataset size (6c0a030)
- fix: ensure that metrics is correctly formatted from MTEB (b5873b8)
- Merge pull request #100 from KennethEnevoldsen/sts_vs_retrieval
  Reduce size of SwednSTS (c2c32b7)
- Added LazyLoadEncoder, added SebModel and removed EmbeddingModel (805c343)
- fixed type hints (0a5fd27)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into sts_vs_retrieval (141ed73)
- Merge pull request #89 from KennethEnevoldsen/stuff_runs_tests
  Added integration test for four model types (f427a6d)
- Merge pull request #92 from KennethEnevoldsen/custom_embeddings
  Custom embeddings for E5 and Cohere + Interface changes to accommodate this (aeb32ce)
- Reset test cases incorrectly overwritten by a merge conflict resolution (ccdd886)
- Merge branch 'custom_embeddings' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into custom_embeddings (2414337)
- Put resetting the model's encode() method to a finally clause (34a2612)
- Merge branch 'main' into custom_embeddings (18cd858)
- Removed debugging print statements from E5 (cc12070)
- Made EmbeddingModel into a dataclass instead of BaseModel (75debec)
- Removed reference to MTEBTask from ScaLA (674a005)
- Replaced MTEBTaskModel with partial() (79708bf)
- Added encode_queries and encode_documents to EmbeddingModel, made task optional (ecee037)
- Merge branch 'main' into stuff_runs_tests (01e8979)
- Moved models to @parametrize (fafcb39)
- feat: Added performance metrics for danfever (22eb72b)
- fix: limit the size of STS (0d1b659)
- Merge pull request #97 from KennethEnevoldsen/add-danfever
  Add danFEVER (801753f)
- appease pyright (a572962)
- tests: remove tests which have to be changed when adding new datasets (04aa44e)
- tests: convert test_task back to normal (be2c071)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-danfever (69a5a03)
- ci: fix misspecified yaml syntax (ca5567c)
- feat: Added danfever (ccec57c)
- feat: Added VG clustering dataset (49e75d5)
- feat: Add swedn clustering (0786ec5)
- fix: Update indexes to strings (37d165f)
- fix: fixed error arising from merge (11e28d6)
- fix: updated based on static type checks (4752f07)
- fix: move description to the end as to make printing of task object prettier (f8ec70d)
- fix: reduced size of SwednClustering and ensure that clusters match with document size (0b70730)
- test: Performance using 5x2048 examples is 8.13 (ed5cb5d)
- test: Performance using 5x10000 examples is 13.80 (ed36b82)
- test: Performance using 2x10000 examples is 8.70 (6fe30b7)
- test: Performance using 10000 examples is 8.46 (630769c)
- test: Performance using 1000 examples is 8.12 (7732c32)
- test: Performance using 100 examples is 21.07 (82f7b3f)
- Merge pull request #96 from KennethEnevoldsen/add-swedn-clustering
  Add Swedn and VG clustering datasets (8537e12)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-danfever (796e3c9)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-swedn-clustering (18f9afb)
- tests: refactored tests to not be highly dependent on a few tasks (4b1eaa5)
- Added a bunch of experiments for the vg summarization. (d9a13cb)
- Added task-dependent and asymmetrical embeddings for Cohere (6742fdd)
- Added encode_queries() and encode_corpus() to E5Wrapper (53390da)
- Added integration test for MiniLM + LCC (4560a16)
- Added return type to appease linter (828e556)
- Switched out fasttext package with fasttext-wheel (2cd74d4)
- Added pybind as an optional dependency for fasttext (77cdd4b)
- Added fasttext as dependency in the makefile (012a689)
- Added Dummy task to integration test, remove all-MiniLM-L6 (2f75712)
- Fixed issue with fasttext models (0a015ba)
- Merge pull request #90 from KennethEnevoldsen/types
  Moved task types to task interface and deleted types module (7c3b582)
- Added English to Language type (221bdd8)
- Added pybind install to makefile (31d2a35)
- Removed faulty import in E5 models (601002c)
- Added fasttext to testing dependencies (045de9f)
- Renamed test function (3c50f4f)
- Moved integration test to new file (6d3ba6d)
- Merge pull request #91 from KennethEnevoldsen/new_models
  Added Jina base (95c515e)
- Fixed import error in speed task (cfccbdf)
- Added Jina base (6d1ec69)
- Changed DKHate to LCC (8c004af)
- Moved task types to task interface and deleted types module (2f1adf1)
- fix: added task argument to TranslateE5 encoding (71dcd09)
- Merge pull request #87 from KennethEnevoldsen/new_models
  Added XLM-Roberta large and LaBSE (74fcf43)
- Merge pull request #85 from x-tabdeveloping/main
  Added FastText and Translate-E5 models (2d9043e)
- Removed commented-out lines (373b937)
- Removed duplicate model (476d679)
- Fixed duplicate model names (7275686)
- Added XLM-Roberta large and LaBSE (858db1b)
- Added integration test for four model types (61ff3bb)
- Merging upstream into the branch so that it contains the fixed E5 models, that pass along the task. (f6f71db)
- TranslateE5 now uses E5Wrapper to ensure task-correct embeddings and prefixes. (5eff1fc)
- Translate now returns a single string instead of a list (e1cefd9)
- feat: Added SwednRetrieval task
  The idea is that it can be compared with SwednSTS to see which one makes the most sense. (7fe3371)
- Merge pull request #82 from KennethEnevoldsen/add-retrieval-swedn
  feat: Added SwednRetrieval task (d5f959d)
- No longer imports ..types because of an error (c75629a)
- Fasttext now loads on initialization instead of the first encode() call (e6b1fda)
- Added Translate E5 models (33835af)
- Added fasttext models to cache (fbc9482)
- Fixed model names (d28f259)
- Changed model fasttext names (7871ab5)
- fix: Allow models to batch inputs (09c3527)
- Merge pull request #70 from KennethEnevoldsen/add-speed-task
  Added speed task (d192e44)
- fix: Add toggle for verbosity on the cli and remove duplicate entries in table (4d26fce)
- Merge pull request #74 from KennethEnevoldsen/verbosity_for_cli
  Fix verbosity toggle on CLI and remove duplicate entries in table (99ef0f2)
- Fixed newline error with FastText (65411a3)
- Remove model results for repo (2435011)
- Made fasttext models compatible with the new interface (ac8d46f)
- Merge branch 'main' of https://github.com/x-tabdeveloping/scandinavian-embedding-benchmark into main (c0be7a6)
- Added fasttext models for nn, nb, da, sv (1820689)
- Added Fasttext models for nb, nn, sv and da (184dde8)
- fix: ScaLA now correctly wraps models to allow for task argument to be passed (3b07a4d)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (07efe8f)
- feat: Added speed task for estimating the speed of the embedding models (25caacc)
- fix: ScaLA now correctly wraps models to allow for task argument to be passed Renamed scala cache (a70c950)
- fix: fixed a type hint in tests (da32c0e)
- Merge pull request #73 from KennethEnevoldsen/bug-scala-missing-task-encode-wrapper
  Wraps ScaLA models in MTEBTaskModel (e2eee05)
- Merging with current status of Upstream (9c1fdf3)
- Added Fasttext models for nb, nn, sv and da (4249333)
- docs: Added norwegian courts to table (4f31602)
- Merge pull request #66 from KennethEnevoldsen/allow-custom-embeddings
  Allow custom embeddings (698453a)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into allow-custom-embeddings (f8b91f1)
- ci: rename pre-commit to lint (1f3c743)
- docs: Updated CLI docs for run (8ff59b3)
- docs: make sure that tutorials are tested on prs (e4ef73a)
- docs: Updated tutorial with the CLI (525eaf2)
- fix: Run command now prints table for target models (b7d444b)
- fix: Benchmark result now saved in the same format as the cache (896c3bf)
- fix: updated according to static type checks (f3c77aa)
- fix: Added missing init file to make sure that docs build (8d44640)
- fix: require positional argument for encoder (a7040a5)
- fix: restructure repo (64bace6)
- Merge pull request #65 from x-tabdeveloping/main
  Added Norwegian bitext mining task (2e8bb07)
- Merge pull request #68 from KennethEnevoldsen/cli_updates
  Added table to CLI (2457782)
- Merge pull request #69 from KennethEnevoldsen/KennethEnevoldsen-patch-1
  Update paper.md (1676c66)
- Update paper.md (eca5a43)
- Changed ruff formatter to use line length 150 (1079489)
- Merge remote-tracking branch 'upstream/main' into main (d0739d8)
- Merge branch 'cli' of https://github.com/x-tabdeveloping/scandinavian-embedding-benchmark into cli_updates (7ef411f)
- tests: Tests pass as intended (d19c650)
- Altered tests for CLI (e4f5457)
- Tried fixing type errors (ignore problems that are not actually problems) (6123138)
- Changed -h docs to reflect changes in behaviour (b8b8169)
- Model printing nicer with less space, fixed multiple arguments, and implemented new output interface (fdd072b)
- Made model and output path optional (fb1c976)
- Added more direct reference and commented out task (566a009)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (af6f926)
- Added Norwegian bitext mining task (49b1655)
- Moved more code into main CLI, pretty printing now takes DataFrame (ae1cb90)
- Added pretty printing benchmark results in table to CLI (eba47c1)
- Merge pull request #62 from KennethEnevoldsen/update-tutorial-with-cli
  Make sure that tutorials are tested on prs (ebf1640)
- fix: Fixed errors derived from merge conflicts (5f086cc)
- Merge pull request #60 from KennethEnevoldsen/add-summarization
  Add summarization (065b0e6)
- feat: Allow tasks to be passed to benchmark instead of just strings (57a9b19)
- fix: remove commented out code (04187b3)
- fix: remove DKHate from tests (3192fff)
- fix: Renamed Scala -> ScaLA to ensure cache hit on non osx system (d9f7b05)
- fix: Added intfloat/e5-mistral-7b results to cache (93b0086)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-summarization (5072584)
- Merge pull request #57 from KennethEnevoldsen/move_cli_to_radicli
  fix: Added new and more comprehensive CLI (2b0a47a)
- tests: Fix ordering of test input (971c9ac)
- removed files (1e78ada)
- tests: Ensure that dummy tasks are not added to registry (affabc8)
- Fixed type annotation to 3.9 (cbb73d6)
- Merge branch 'move_cli_to_radicli' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into move_cli_to_radicli (76143d9)
- Updated cookiecutter reference from Martins to Kenneths swift template
  This old one is no longer updated. (034b428)
- Updated cookiecutter reference (71b7ead)
- docs: added execute flag (ea0e9ca)
- fix: Added new and more comprehensive CLI
  Including documentation and tests (14ca469)
- fix: SebModel -> EmbeddingModel (d2f9efa)
- fix: Allow embedding size to be None when using CLI (c621a8b)
- Merge pull request #58 from KennethEnevoldsen/update-cruft
  Update cruft (870e442)
- ci: Updated some names in the workflow (b7c3012)
- docs: Added "government" domain to LCC (232dfee)
- docs: Added documentation of dataset coverage to datasets (e4f9468)
- docs: Added avg rank to benchmark table (15a821e)
- fix: Added Swedn dataset (ac6e744)
- fix: refactor utils script out into its subcomponents (f36be90)
- fix: Allow optional embedding size in ModelMeta
  This makes it possible to create models on the fly using the CLI (b0b4793)
- fix: Updated CLI to now use models which are a part of the benchmark before wrapping it in sentence transformers (2b60c28)
- fix: Added embedding size of models (2937099)
- fix: Added mistral current scores (748d8a9)
- fix: Added prettier prints when running benchmark (012bcd9)
- fix: Added option to ignore cache (8f36080)
- fix: removed typer dependency (e519917)
- fix: removed duplicate on update benchmark (ef6270c)
- fix: Added cache dir to all entry points (3fb4280)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into add-summarization (8ed0a0d)
- Merge pull request #52 from KennethEnevoldsen/add-dataset
  Add embedding size to benchmark (d40a633)
- Merge pull request #50 from KennethEnevoldsen/run-using-cache
  Add public cache to benchmark (67e571c)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/Scandinavian-Embedding-Benchmark (6d9d7c1)
- ignore files (a1904ed)
- Added make command for table in docs (0cb3522)
- Merge pull request #51 from KennethEnevoldsen/run_mistral_on_ucloud
  Added command for running on ucloud (396f79b)
- clean: remove test file from cache (89bb78f)
- clean: removed test models from cache (10413e1)
- clean: remove tests from cache (af1f52d)
- Added test for checking if benchmark is up to date (8fa2545)
- Moved cache dir to package (94d6468)
- Added command for running on ucloud (c48b32e)
- Merge pull request #45 from KennethEnevoldsen/updated_norwegian_parl
  Updated desc. for norwegian parl. (985dd5d)
- Updated desc. for norwegian parl. (c7f1e74)
- Merge pull request #43 from KennethEnevoldsen/add-mistral
  Added mistral dependencies (a4decba)
- Added mistral dependencies (1d39008)
- Merge pull request #40 from x-tabdeveloping/main
  Added E5 Mistral (c317ad8)
- ci: ensure full install in test ci (6144791)
- ci: updated docs ci (4e3b89f)
- ci: fix python version (dec0de6)
- ci: remove cache from docs (b53fa67)
- ci: lock python version (f00d8e8)
- ci: Update pip and invalidate cache for doc ci (7c507a4)
- docs: fixed size of tables (f889199)
- fix: ran ruff formatter (1d2341c)
- fix: change to relative import for tests (3ec0673)
- fix: Update from cruft template (5e055da)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (07c5edc)
- remove invoke from repo (90d7b0b)
- Merge pull request #39 from KennethEnevoldsen/update-cruft
  Update cruft and fix cruft template (c2bf0cf)
- ignore type error (eb0b647)
- add missing updated to the makefile (ec857ab)
- Update cruft (8016bd5)
- Merge pull request #38 from HLasse/patch-3
  Update citation.cff (0b099be)
- Update citation.cff (b7fad4e)
- Merge pull request #36 from KennethEnevoldsen/KennethEnevoldsen-patch-1
  Update README.md (8f3f9f9)
- added citation cff (e76f244)
- Update README.md (0730366)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (9176011)
- build: Updated dependencies for mteb (f300ac5)
- ci: Updated makefile (892d72d)
- fix: added type ignore to optional imports (febfffb)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (7d792e7)
- benchmark: Fixed ordering of columns (ba64303)
- benchmark: Added open-source column (621377b)
- benchmark: Added globe for multilingual models (1a53419)
- docs: fix missing line (06dd1ed)
- fix: ran precommit (cf42195)
- Merge pull request #27 from timpal0l/patch-1
  Update hf_models.py (a72ac2f)
- Update hf_models.py
  Added sentence-transformers/paraphrase-multilingual-mpnet-base-v2 (c4d5886)
- Merge pull request #25 from x-tabdeveloping/main
  Language selection in CLI (4cb3f7e)
- Merge branch 'KennethEnevoldsen:main' into main (9054c0a)
- Corrected pre-commit errors (33c71ab)
- CLI now accepts a list of languages as its input; if none are passed, the benchmark will be run on all languages (975418e)
- Merge pull request #24 from x-tabdeveloping/main
  CLI is error tolerant now. (0bcfa98)
- Fixed issue with mean calculation in DaLAJ. (b5d2f6b)
- CLI made error tolerant; prints NA for unobtainable benchmark results (69be78a)
- Merge pull request #23 from KennethEnevoldsen/marton_changes
  Fixes for #22 (0905a51)
- updated from cruft (7891730)
- fix: update ci based on cruft template (32ea08d)
- Merge branch 'main' of https://github.com/x-tabdeveloping/scandinavian-embedding-benchmark into marton_changes (3bf2688)
- Added tabulate to dependencies (ca94dde)
- Added main CLI for running benchmark. (fa92d35)
- Merge pull request #21 from KennethEnevoldsen/KennethEnevoldsen/issue-Allow-custom-embeddings-based-on-the-task
  Fix e5 models (af8cb17)
- Merge branch 'KennethEnevoldsen/issue-Allow-custom-embeddings-based-on-the-task' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark into KennethEnevoldsen/issue-Allow-custom-embeddings-based-on-the-task (76cf49e)
- Allow custom embeddings based on the task Fixes #18 (135e546)
- ci: Remove dependabot (24ca03f)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (9e9e624)
- fix: ruff (51a0002)
- fix: run on ucloud (aebaa4e)
- fix: Add missing dependency (ae9571e)
- fix: ran pyright (3f95827)
- fix: Type hints (ccef58d)
- fix: ruff (3c2d922)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/Scandinavian-Embedding-Benchmark (2e7bb7e)
- Merge pull request #19 from KennethEnevoldsen/KennethEnevoldsen/issue-Add-OpenAI-embeddings
  Add OpenAI embeddings (0ef4b34)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (8809741)
- build: only import fairseq models if fairseq2 is installed (07dc22e)
- build: move fairseq and sonar to optional dependencies (0969f57)
- build: change to pypi version of fairseq2 (58a62c9)
- build: add sonar and fairseq2 dependencies (03d3971)
- docs: human translated != machine translated (0d8f86a)
- docs: Updated docs (423fd4e)
- docs: removed requirements for social card (fb37b26)
- fix: Overwriting task creator (5da1aed)
- fix: Updated such that sonar model will get registered, but not run (28d151f)
- fix: Updated pyright (6fe494f)
- fix: Updated task name (f81266a)
- fix: make sonar return numpy.ndarray (b1db630)
- fix: unique names for sonar models (d7f353d)
- fix: add model imports to init (267fb0b)
- fix: change folder structure, add sonar model per language (94a293c)
- fix: add languages to sonar model (f1d81a1)
- fix: Added beir requirement for retrieval tasks (2a614ba)
- fix: Updated metadata for SweFAQ (cc2d6c3)
- fix: Updated dataset name (48db87e)
- style: lint (d9790ad)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (9f0b422)
- Merge pull request #14 from HLasse/sonar
  feat: add sonar model (082c157)
- Merge pull request #15 from KennethEnevoldsen/dependabot/pip/pyright-1.1.324
  deps:(deps-dev): bump pyright from 1.1.323 to 1.1.324 (7cfd512)
- deps:(deps-dev): bump pyright from 1.1.323 to 1.1.324
  Bumps pyright from 1.1.323 to 1.1.324.
  updated-dependencies: dependency-name: pyright dependency-type: direct:production update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <support@github.com> (7774c60)
- Merge pull request #13 from KennethEnevoldsen/dependabot/pip/pyright-1.1.323
  deps:(deps-dev): bump pyright from 1.1.322 to 1.1.323 (48142d3)
- deps:(deps-dev): bump pyright from 1.1.322 to 1.1.323
  Bumps pyright from 1.1.322 to 1.1.323.
  updated-dependencies: dependency-name: pyright dependency-type: direct:production update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <support@github.com> (99e112d)
- Merge pull request #12 from KennethEnevoldsen/dependabot/pip/pyright-1.1.322
  deps:(deps-dev): bump pyright from 1.1.320 to 1.1.322 (8086b5a)
- deps:(deps-dev): bump pyright from 1.1.320 to 1.1.322
  Bumps pyright from 1.1.320 to 1.1.322.
  updated-dependencies: dependency-name: pyright dependency-type: direct:production update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <support@github.com> (2bd7736)
- Merge pull request #11 from KennethEnevoldsen/dependabot/pip/pyright-1.1.320
  deps:(deps-dev): bump pyright from 1.1.318 to 1.1.320 (be26f91)
- deps:(deps-dev): bump pyright from 1.1.318 to 1.1.320
  Bumps pyright from 1.1.318 to 1.1.320.
  updated-dependencies: dependency-name: pyright dependency-type: direct:production update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <support@github.com> (2032c84)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (a50ee2c)
- ci: fixed version pointer (4af85d1)
- fix: pyproject.toml version reference (5968bd7)
- fix:empty commit for ci (db43d77)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (68ed93d)
- docs: Updated links in badge (8e3ab16)
- fix: ci: removed outdated variables from pyproject.toml (b43edd8)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (84244d2)
- ci: Added permissions (51de97a)
- fix: rerun ci (94362a9)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (7301671)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (66bfdf0)
- fix: Added documentation to all outward facing functions (ac96fc1)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (1464b68)
- ci: updated to latest version of release (bc6394f)
- ci: fix release action (419dc23)
- ci: Fix inv script to avoid issues with pyright (63de866)
- ci: fix url (669b258)
- ci: Updated docs build to mkdocs (9cae854)
- ci: Removed windows from CI (4d037c2)
- ci: Added macOS and windows OS (e4084eb)
- docs: Added missing dependencies for docs (ba4fc76)
- docs: retrying to get the social card to work (6403f7d)
- docs: Minor adjustment (4fc6881)
- docs: Added social card (1b692f6)
- docs: Updated references when sharing table (7e45a7a)
- docs: Updated table with new models (7083239)
- docs: minor updates to index page (8a883c4)
- docs: testing iframes (d415cb8)
- docs: Updated page tile (ce6ed9a)
- docs: Updated tables (dcceccb)
- docs: fixed heights (6750010)
- docs: Updated tables (f62e506)
- docs: added notice regarding tests (b356e62)
- docs: Updated description (67f0200)
- docs: Updated tables (aa9d7d0)
- docs: Minor changes to headings (5fda1ff)
- docs: added logo (56d26f8)
- docs: Added logo (79d1a17)
- docs: Updated readme (fe57c81)
- docs: Updated the documentation (4875b62)
- docs: Added link and removed nav. on main page (1f284db)
- docs: Updated docs workflow to use mkdocs (2b7a6b4)
- feat: add multilingual sentence transformer (6e4ae7a)
- feat: Bumped version
  Still awaiting PR: embeddings-benchmark/mteb#128 (5240256)
- fix: removed hotfix for MTEB (c2e08b3)
- fix: Added language codes for English models (8615d3d)
- fix: Added ignore for static type check (caca3d7)
- fix: temporary hotfix while waiting for mteb merge
  waiting for embeddings-benchmark/mteb#128 (932b847)
- fix: Updated string handling for model references (39a1060)
- fix: Correctly sorted values in tables (e69b4bb)
- fix: Added full benchmark (e3d179c)
- fix: Updated type hint (963b099)
- fix: Updated language codes to exclude "no" (c2c14ed)
- fix: filtered logging and added progress bar (ada3dae)
- fix: Added time to task errors as well (4c79425)
- fix: Updated task versioning (2bc628f)
- fix: Added all models (8b69593)
- fix: Added cache and error handling (e400cfa)
- fix: Test loads and fail (d66b1a6)
- fix: removed cli for now (ab03b7b)
- style: Only enforce pyright on src folder (0692145)
- tests: Remove downloaded tasks from test (29d5e50)
- tests: Set custom cache dir for tests (188cd46)
- tests: Fix task reference (ed6dc97)
- tests: Skip tests with large downloads (d2d9c64)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (21ffc06)
- Merge pull request #5 from HLasse/patch-2
  feat: add multilingual sentence transformer (15b8045)
- Merge pull request #4 from HLasse/patch-1
  docs: minor updates to index page (c3d7b69)
- Merge branch 'main' of https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark (a1b8182)
- v0.0.0 (6703d6c)
- Merge pull request #2 from KennethEnevoldsen/dependabot/pip/pyright-1.1.318
  deps:(deps-dev): bump pyright from 1.1.305 to 1.1.318 (d5f448e)
- Merge pull request #1 from KennethEnevoldsen/dependabot/pip/invoke-2.2.0
  deps:(deps-dev): bump invoke from 2.1.1 to 2.2.0 (c463449)
- deps:(deps-dev): bump pyright from 1.1.305 to 1.1.318
  Bumps pyright from 1.1.305 to 1.1.318.
  updated-dependencies: dependency-name: pyright dependency-type: direct:production update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <support@github.com> (3b28588)
- deps:(deps-dev): bump invoke from 2.1.1 to 2.2.0
  Bumps invoke from 2.1.1 to 2.2.0.
  updated-dependencies: dependency-name: invoke dependency-type: direct:production update-type: version-update:semver-minor ...
  Signed-off-by: dependabot[bot] <support@github.com> (c114c6b)