Zero-shot default #1953

Closed
Muennighoff opened this issue Feb 4, 2025 · 7 comments
Labels
enhancement (New feature or request), leaderboard (issues related to the leaderboard)

Comments

@Muennighoff
Contributor

I worry that defaulting the leaderboard's zero-shot filter to Allow Unknown will incentivize people not to share or talk about their training data if it includes any benchmark data. Maybe we could just default to Allow all but make the cross a bit clearer? Can we somehow make it red, like ❌?

@x-tabdeveloping
Collaborator

As it is now, we use a UTF-8 emoji, and there is no cross that renders red in Gradio :(

I have also tried a markdown field with emoji names between colons (:cross:) but it doesn't work in the table.

I agree with the incentivization part, but I would prefer not giving up on the default filter. I have heard some people have even stronger opinions about it. It's an easy fix though if we collectively decide that something else is a better move.

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Feb 4, 2025

We could do ❓: Not known and ⚠️: Not zero-shot

@x-tabdeveloping
Collaborator

I'd prefer the cross, but we can change it if that's the vibe

@Muennighoff
Contributor Author

Noting that the Open LLM leaderboard also allows contaminated models by default (https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/). Maybe we do a community vote on this?

@Muennighoff
Contributor Author

The leaderboard also looks much more alive with Allow all, as it shows way more models 🤔

@KennethEnevoldsen
Contributor

> Maybe we do a community vote on this?

I believe we already had this discussion on Slack

> Noting that the Open LLM leaderboard also allows contaminated models by default

Can't seem to find any contaminated models on their leaderboard, only:

Image

We have also had specific complaints from model developers about known non-zero-shot models being on the leaderboard (reducing trust in the scores).

E.g. the NV-Embed-v2 scores aren't comparable to e5. voyage-3-exp is a fairly clear example of this (it is not recommended for use but sits at the top). I would say the ranking should (to the extent possible) reflect something that we would recommend.

I think Allow Unknown is a reasonable compromise between encouraging good practices, fair comparison and building trust, while not punishing APIs.

However, given this, the ⚠️ might be too harsh, and using the ❓ instead would be more appropriate.

Let me know what you think @Muennighoff
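
To make the two defaults being compared concrete, here is a minimal sketch of how such a filter could behave. It is not the leaderboard's actual code: the function names, the mode strings, the `zero_shot` field and the example models are all hypothetical, and the empty marker for zero-shot models is an assumption. Only the ❓ / ⚠️ labels come from the proposal above.

```python
from typing import Optional

# Hypothetical mode names, modelled on the two options discussed in this thread.
ALLOW_ALL = "allow_all"
ALLOW_UNKNOWN = "allow_unknown"


def is_shown(zero_shot: Optional[bool], mode: str) -> bool:
    """Decide whether a model is listed under the given filter mode.

    zero_shot is True (known zero-shot), False (known to have trained on
    benchmark data) or None (training data not disclosed).
    """
    if mode == ALLOW_ALL:
        return True
    if mode == ALLOW_UNKNOWN:
        # Hide only models that are known NOT to be zero-shot.
        return zero_shot is not False
    raise ValueError(f"unknown filter mode: {mode}")


def marker(zero_shot: Optional[bool]) -> str:
    """Return the symbol for a model's status, following the proposal above."""
    if zero_shot is None:
        return "❓"  # Not known
    if zero_shot is False:
        return "⚠️"  # Not zero-shot
    return ""  # assumption: zero-shot models get no marker


def visible_names(models: list, mode: str) -> list:
    """Return the names of the models that pass the filter, in input order."""
    return [m["name"] for m in models if is_shown(m["zero_shot"], mode)]


# Made-up example entries, not real leaderboard data.
EXAMPLE_MODELS = [
    {"name": "model-a", "zero_shot": True},
    {"name": "model-b", "zero_shot": None},
    {"name": "model-c", "zero_shot": False},
]
```

Under this sketch, Allow Unknown hides only `model-c`, so a model that discloses benchmark data in its training set is removed from the default view while one that discloses nothing stays. That is the incentive concern raised at the top of the thread.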

@KennethEnevoldsen
Contributor

I have converted another thread on this to a discussion: #2119. To keep it in one place I will close this one, but it seems like this is def. worth discussing.


3 participants