Zero-shot default #1953
As it is now, we use a UTF-8 emoji, and there is no cross that renders red in Gradio :( I have also tried a markdown field with emoji names between colons.

I agree with the incentivization part, but I would prefer not to give up on the default filter. I have heard some people have even stronger opinions about it. It's an easy fix, though, if we collectively decide that something else is a better move.
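One workaround worth sketching (not something currently in the leaderboard code, and only viable if the component rendering the column accepts inline HTML) is to build the cell as an HTML string and color the plain-text cross there. The `cross_cell` helper below is hypothetical:

```python
def cross_cell(label: str) -> str:
    """Hypothetical helper: wrap the plain-text cross U+2717 in inline
    HTML so it renders red. The bare character inherits the default
    text color, unlike the emoji-style U+274C, whose glyph color we
    cannot control."""
    return f'<span style="color:red;">\u2717</span> {label}'

print(cross_cell("Not zero-shot"))
# → <span style="color:red;">✗</span> Not zero-shot
```

Whether this actually shows up red depends on the Gradio component: markdown-rendering cells may sanitize inline styles, so this is a sketch to test rather than a known fix.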
We could do ❓: Not known and
I'd prefer the cross, but we can change it if that's the vibe.
Noting that the Open LLM leaderboard also allows contaminated models by default (https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/). Maybe we do a community vote on this?
The leaderboard also looks much more alive with
I believe we already had this discussion on Slack
Can't seem to find any contaminated models on their leaderboard, only: [screenshot]

We have also had specific complaints from model developers about known zero-shot models being on the leaderboard (reducing trust in the scores). E.g. the NV-Embed-v2 scores aren't comparable to e5. The voyage-3-exp is quite a clear result of this (not recommended for use, but it is at the top). I would say the ranking should (to the extent possible) reflect something that we would recommend. I think …

However, from this, it might be too harsh with the …

Let me know what you think @Muennighoff
I have converted another thread on this to a discussion: #2119. To keep it in one place, I will close this one, but it seems like this is definitely worth discussing.
I worry that the `Zero-shot` default of `Allow Unknown` in the leaderboard will incentivize people not to share/talk about their training data if it includes any benchmark data. I think maybe just doing `Allow all` but making the cross a bit clearer? Can we somehow make it red like ❌