Add GPQA Diamond and fix evaluation deps #196
Conversation
export LD_LIBRARY_PATH=$(python -c "import site; print(site.getsitepackages()[0] + '/nvidia/nvjitlink/lib')"):$LD_LIBRARY_PATH
```

This will also install PyTorch `v2.5.1` and it is **very important** to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via `pip install -e .[LIST OF MODES]`. For most contributors, we recommend:

```diff
-pip install -e ".[dev]"
+GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]" --link-mode=copy
```
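As a sanity check before exporting, a minimal sketch of what the inline Python snippet in the `export` line above resolves to (the `nvidia/nvjitlink/lib` directory only exists once the NVIDIA nvjitlink wheel has been pulled in, e.g. via vLLM; `python3` as the interpreter name is an assumption):

```shell
# Resolve the same path the export line computes:
# <site-packages>/nvidia/nvjitlink/lib of the active environment.
NVJITLINK_DIR=$(python3 -c "import site; print(site.getsitepackages()[0] + '/nvidia/nvjitlink/lib')")
echo "$NVJITLINK_DIR"
# Only worth prepending to LD_LIBRARY_PATH if the directory exists:
if [ -d "$NVJITLINK_DIR" ]; then
  echo "nvjitlink libs found"
else
  echo "nvjitlink libs missing (wheel not installed?)"
fi
```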
Needed because `uv` cannot install `lighteval` otherwise due to some LFS file conflict.
Ah, I had this issue too and had reverted back to pip. Glad you fixed it.
```shell
lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
```
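For context, here is a hypothetical way to fill in the variables the command above expects. The specific model, model args, and task name are assumptions for illustration, not values taken from this PR:

```shell
# Hypothetical values; the exact model args depend on your setup.
MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768"
TASK="gpqa:diamond"
# Task spec passed to lighteval: suite|task|num_fewshot|truncate_fewshot
echo "custom|$TASK|0|0"
```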
Not needed for the DeepSeek models (gives ~1 point gain if included)
setup.py (Outdated)
```diff
@@ -53,17 +53,17 @@
 "huggingface-hub[cli]>=0.19.2,<1.0",
 "isort>=5.12.0",
 "liger_kernel==0.5.2",
-"lighteval @ git+https://github.com/huggingface/lighteval.git@0e462692436e1f0575bdb4c6ef63453ad9bde7d4#egg=lighteval[math]",
+"lighteval @ git+https://github.com/huggingface/lighteval.git@3c9b0c9dde6718b23ef5b0f4960355f0d494bdfc#egg=lighteval[math]",
 "math-verify>=0.3.3", # Used for math verification in grpo
```
Bump to the latest commit once the vllm fix for DDP is merged: huggingface/lighteval#541
Done, it's 86f62259f105ae164f655e0b91c92a823a742724
* Add GPQA Diamond
* Add table
* Fix README
* Up
* Fixes
* Ignore logs
* Fix
* Pin deps
* Fix GRPO
* Add Llama 70B tables
* Restore dp
* Pin lighteval
* Use bfloat16
* Tune table
* Add note
Adds GPQA Diamond and various important fixes for evaluation (parsing & incompatibility between the latest `vllm` and `lighteval`). I've also unified the Slurm scripts for evaluation so we don't have multiple ways to eval models.

TODO