Seems like the implementation in benchmarks.py uses the STSBenchmarkMultilingualSTS task, but the version on the current MTEB leaderboard does not use this task. This makes the scores between v1 and v2 incompatible.
I see two solutions:
1. removing it from benchmarks.py
2. declaring the old leaderboard incorrect and changing nothing (this will lead to a few cases where top models in v1 do not appear at the top in v2, as they don't have the score for this task)
@rafalposwiata I will leave it to you to decide between the two. I would probably prefer option 1.
The STSBenchmarkMultilingualSTS task was not previously included in PL-MTEB, so it can be removed.
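For concreteness, a minimal sketch of what the removal might look like in benchmarks.py, assuming the PL-MTEB benchmark is built with mteb's Benchmark dataclass and the get_tasks helper (the surrounding task names are illustrative, not the full PL-MTEB task list):

```python
from mteb import Benchmark, get_tasks

# Illustrative sketch only: the real definition lists all PL-MTEB tasks.
PL_MTEB = Benchmark(
    name="PL-MTEB",
    tasks=get_tasks(
        languages=["pol"],
        tasks=[
            "SICK-R-PL",  # illustrative neighbours in the task list
            "CDSC-R",
            # "STSBenchmarkMultilingualSTS",  # dropped: never part of the original PL-MTEB
        ],
    ),
)
```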
I think the results for clustering tasks need to be verified, as v2 versions of the tasks have appeared. There may be an incompatibility here. I will check it out.
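As a quick way to spot the v2 variants, one could list the Polish clustering tasks registered in mteb and compare the names by eye; a rough sketch, assuming get_tasks accepts the task_types and languages filters as below:

```python
import mteb

# List Polish clustering tasks so v1 names and any ".v2" variants are visible side by side.
tasks = mteb.get_tasks(task_types=["Clustering"], languages=["pol"])
for task in sorted(tasks, key=lambda t: t.metadata.name):
    print(task.metadata.name)
```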
Related to #1867