Gauntlet v0.1 #674
… into model_gauntlet_v0.1
…ry into human_eval_simple
…y into execution_prediction
There are a lot of yamls in
README.md says "This is version v0, in the coming weeks we will update the mixture to include more benchmarks." We should list it as v0.1.0 and not promise to add things so quickly lol
Functional changes look fine to me. Leaving to others to approve the gauntlet itself/dataset descriptions
LGTM
This reverts commit ab5577b.
This PR introduces v0.1 of the Pre-training Gauntlet. We introduce chain-of-thought QA tasks, as well as 16 new benchmarks, and a new Safety category.
Test 7B models on 8 A100 80GB: gauntlet-v0-1-cfXQE4, without programming. Run time is 10,000 seconds.
Original gauntlet on same hardware: mpt-eval-UBBxvo. Run time is 5176.9 seconds.
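For a rough sense of the cost difference between the two runs, here is a minimal sketch of the arithmetic. It uses only the run times and GPU count quoted above. The helper names `slowdown` and `gpu_hours` are made up for illustration and are not part of the repo.

```python
# Figures quoted in the PR description; helper names are illustrative only.
NUM_GPUS = 8              # 8 x A100 80GB
V0_1_SECONDS = 10_000.0   # gauntlet v0.1, without programming
V0_SECONDS = 5176.9       # original gauntlet on the same hardware


def slowdown(new_seconds: float, old_seconds: float) -> float:
    """Return how many times longer the new run takes than the old one."""
    return new_seconds / old_seconds


def gpu_hours(seconds: float, num_gpus: int) -> float:
    """Convert wall-clock seconds on num_gpus devices into GPU-hours."""
    return seconds * num_gpus / 3600.0


if __name__ == "__main__":
    print(f"slowdown: {slowdown(V0_1_SECONDS, V0_SECONDS):.2f}x")
    print(f"v0.1 GPU-hours: {gpu_hours(V0_1_SECONDS, NUM_GPUS):.1f}")
    print(f"v0 GPU-hours:   {gpu_hours(V0_SECONDS, NUM_GPUS):.1f}")
```

By these numbers, v0.1 without programming takes a bit under twice as long as the original gauntlet on the same hardware.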