Replies: 10 comments 28 replies
-
Wow that was awesome and so fast! Thanks team!
-
Just moved this to a discussion. Another discussion on this was in #1953. We have had this up before, but I don't think it is settled, so it is fine that we raise it again. @Muennighoff has also raised this concern, and as far as I know, @x-tabdeveloping is on the side that we should be more restrictive (not allow unknown by default). @orion, I believe, is quite close to where we are now, or potentially a bit closer to @x-tabdeveloping (do correct me if I am wrong). Just to outline the pros and cons for each point and argue for why the default is as it currently is:
current solution
-
Well, to be fair, I can see why model makers like @jxmorris12 would think that the default option is not good, and we've talked about this before, but essentially, the way it's currently set up, our system penalizes honesty, which really shouldn't be the case. In fact, not knowing what the training data is is the worst scenario for the user, since they have no idea where the good/bad scores come from. I'm still of the opinion that the hard zero-shot filter should be the default. That way we:
Especially since we've now made a bunch more new benchmarks, we should really avoid a bunch of people submitting models that will show up on the leaderboard but have an unknown training-data status. I also don't think allowing all is a good option, because then, to be at the top of the leaderboard, model makers will have to keep training on the benchmark datasets, and the zero-shot flag will be some small thing some people might care about, but it won't incentivise the behaviour which we ultimately want (model makers training on other things, so that our benchmark can measure how well their models generalize).
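For concreteness, the three behaviours being debated (a hard zero-shot filter, the current "Allow unknown" default, and "Allow all") roughly amount to a predicate over a per-model zero-shot annotation. The sketch below is purely illustrative and not the leaderboard's actual implementation; the `ZeroShotStatus` enum and `filter_models` helper are hypothetical names.

```python
from enum import Enum


class ZeroShotStatus(Enum):
    """Hypothetical three-way annotation discussed in this thread."""
    ZERO_SHOT = "zero_shot"   # no benchmark task appears in the training data
    IN_DOMAIN = "in_domain"   # at least one benchmark task is known to be in the training data
    UNKNOWN = "unknown"       # training data is not documented


def filter_models(models: dict[str, ZeroShotStatus], mode: str) -> list[str]:
    """Return the model names that would remain visible under a given filter mode.

    mode is one of:
      "zero_shot_only" -- the hard filter argued for above
      "allow_unknown"  -- the current default: hides only confirmed in-domain models
      "allow_all"      -- shows everything, relying on the column/emoji to inform users
    """
    if mode == "zero_shot_only":
        keep = {ZeroShotStatus.ZERO_SHOT}
    elif mode == "allow_unknown":
        keep = {ZeroShotStatus.ZERO_SHOT, ZeroShotStatus.UNKNOWN}
    elif mode == "allow_all":
        keep = set(ZeroShotStatus)
    else:
        raise ValueError(f"unknown filter mode: {mode}")
    return [name for name, status in models.items() if status in keep]
```

The whole disagreement is essentially about which of these three `keep` sets should be the one users see first.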
-
Moved this comment from #2076 to here. I agree with @bwanglzu here. The default of "Allow Unknown" has some pretty negative effects:
In my opinion, the emoji immediately draws attention, so if we update the default to "Allow all", then people will still be informed to be wary of certain models. See for example here, where the Zero-shot column draws the eye pretty quickly:

Beyond that, the "Zero-shot" filter sits among all kinds of model options that rarely need updating, so it's very easy to miss. I've had to explain to at least 4 separate people why they can't find their favourite model on the leaderboard anymore.

In short: I think that users will be more than sufficiently informed about the zero-shot-ness of a model even if we use "Allow all" as the default, and they will be more informed than if we keep using "Allow unknown". Having said that, I do recognize the purpose of the "Allow unknown" default: discouraging overfitting on MTEB.
-
I think we could also add per-task identification for models to indicate whether they are ZeroShot or not.
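Such a per-task flag could in principle be derived from the same training-dataset annotations that power the model-level label: whether a model is zero-shot on a given task is just a set-membership check. A minimal sketch, assuming the annotations are available as plain Python collections; `per_task_zero_shot` and the task names in the example are hypothetical and not mteb's actual API.

```python
def per_task_zero_shot(
    training_datasets: set[str] | None,
    benchmark_tasks: list[str],
) -> dict[str, bool | None]:
    """Map each benchmark task to True (zero-shot), False (seen during training),
    or None (unknown, because no training-data annotation exists)."""
    if training_datasets is None:
        return {task: None for task in benchmark_tasks}
    return {task: task not in training_datasets for task in benchmark_tasks}


# Example: a model annotated as having trained on MS MARCO and NQ.
flags = per_task_zero_shot({"MSMARCO", "NQ"}, ["MSMARCO", "SciFact", "ArguAna"])
# -> {"MSMARCO": False, "SciFact": True, "ArguAna": True}
```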
-
Maybe we should change
-
The default needs to be changed as soon as possible, because it is misleading leaderboard users and creates a perverse incentive against open science. I have spoken about the "zero-shot" topic before in the PR that introduced zero-shot evaluation, and I will include a copy of those comments below for visibility. I have some additional thoughts to share which pertain not just to introducing zero-shot, but also to the current mess of the three-label situation.

The Arctic Embed models are perhaps the most laughable illustration of the problems with the new three-class annotation and default filter choice

Because we published reports on Arctic Embed 1.0 and 2.0, these models have been marked as confirmed to be partially in-domain on the original MTEB retrieval benchmark and are now hidden by default on the new benchmark. Since we only issued a higher-level blogpost with our 1.5 model (a post which, for brevity, did not explicitly clarify that it used largely the same data as our 1.0 model), this model has been labeled as "unknown data" and shows up in the default view. To a reader of the benchmark, it appears as though there is a significant difference between the data used in 1.0 and 1.5, despite this not being the case. This has a direct and negative impact on the utility of the benchmark page to readers who are trying to inform themselves about the embedding model space.

Differentiating between "unknown" and "certified in-domain" appears generally harmful to both model developers and leaderboard readers, no matter what the default choice is

The new MTEB design incorrectly implies there is an important difference between the "unknown data" and "known in-domain data" models that have submitted evaluations to MTEB. In truth, the difference between these models is primarily just a measure of data availability and of whether a work provides rigorous and transparent documentation to "earn" the right of being marked with an ❌. For example, in addition to the Arctic Embed 1.5 case, it currently appears that the Stella models are also tagged as "unknown data", despite the Stella & Jasper report discussing distillation from in-domain models as the training methodology behind both Stella and Jasper. However, only the Jasper models are marked as "known in-domain", because the paper never officially clarifies that an in-domain model was used as the teacher for Stella. As long as the differentiation persists, MTEB is creating a perverse incentive against open science. The only way this could be flipped around is if some punitive treatment were given to "unknown data" submissions that is not given to "known in-domain" submissions, but doing so seems like it would again confuse readers of the leaderboard more than it would help them.

Comments from Jan 7:
-
I don't think I'm the person who should make the last call on this, but to me it seems that the general sentiment indicates that keeping the current default is the worst thing we could do. As far as I can tell, everyone in the above discussion values honesty more than being zero-shot, and I'm inclined to agree. I think there are a number of steps we can take to get to a reasonable compromise:
Lmk what you think. @tomaarsen @KennethEnevoldsen @orionw @Muennighoff
-
So after some discussion with @x-tabdeveloping, we arrived at this as the conclusion (though the coloring and formatting are not 100% set):
This would come with a default of "show all" and have the following options:
We think that this will:
The only issue we currently see is that some models we would not recommend (e.g. zero-shot < 0.80) will appear at the top. We could consider filtering these out. The worst offenders are on MTEB(eng, v1): (note that the voyage exp and nvidia-embed models were missing some annotations, which I have fixed after this screenshot was taken.) And there does seem to be a performance benefit from overfitting:
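The "zero-shot < 0.80" cutoff mentioned above suggests summarising each model with the fraction of benchmark tasks it is zero-shot on. The screenshots referenced in this comment did not carry over, so as a rough sketch only (hypothetical helper names, building on the per-task flags sketched earlier in the thread), such a fraction and cutoff could look like this:

```python
def zero_shot_fraction(per_task_flags: dict[str, bool | None]) -> float | None:
    """Fraction of benchmark tasks the model is zero-shot on.

    Returns None when no training-data annotation exists (all flags are None),
    mirroring the "unknown" label instead of guessing a number."""
    known = [flag for flag in per_task_flags.values() if flag is not None]
    if not known:
        return None
    return sum(known) / len(known)


def hide_from_default_view(fraction: float | None, threshold: float = 0.80) -> bool:
    """One possible reading of "filter out the worst offenders": hide a model only
    when it is *known* to be zero-shot on less than 80% of the benchmark tasks.
    How unannotated (None) models are treated is a separate policy choice; here
    they are kept visible."""
    return fraction is not None and fraction < threshold
```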
-
The new system is now live on the leaderboard.
-
The default value for the ZeroShot filter seems to be confusing people. Maybe we should change it to "Allow all"? #2114

Originally posted by @jxmorris12 in #2076 (comment)
Also issue #1953