Add GPQA Diamond and fix evaluation deps #196

Merged
lewtun merged 18 commits into main from lewtun/add-gpqa-cmd
Feb 6, 2025

Conversation

@lewtun (Member) commented Feb 5, 2025

Adds GPQA Diamond and several important evaluation fixes (answer parsing, plus an incompatibility between the latest vLLM and lighteval). I've also unified the Slurm evaluation scripts so that there is only one way to evaluate models.

TODO

@lewtun mentioned this pull request Feb 5, 2025
@lewtun changed the title from "[WIP] Add GPQA Diamond" to "Add GPQA Diamond and fix evaluation deps" Feb 6, 2025
@lewtun marked this pull request as ready for review February 6, 2025 13:06
export LD_LIBRARY_PATH=$(python -c "import site; print(site.getsitepackages()[0] + '/nvidia/nvjitlink/lib')"):$LD_LIBRARY_PATH
```

This will also install PyTorch `v2.5.1` and it is **very important** to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via `pip install -e .[LIST OF MODES]`. For most contributors, we recommend:

```shell
pip install -e ".[dev]"
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]" --link-mode=copy

Member Author

Needed because uv otherwise cannot install lighteval due to an LFS file conflict.

Collaborator

Ah, I had this issue and had reverted to pip. Glad you fixed it.
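
The README text in this hunk stresses that PyTorch `v2.5.1` must be used because the vLLM binaries are compiled for it. A minimal sketch of how one might verify this after installing is shown below. The helper names (`matches_pinned`, `installed_torch_version`) are hypothetical and not part of open-r1, and ignoring a local build suffix such as `+cu121` is an assumption about how the installed version string may look.

```python
from importlib import metadata
from typing import Optional

# Version named in the README text of this PR.
PINNED_TORCH = "2.5.1"


def matches_pinned(installed: str, pinned: str = PINNED_TORCH) -> bool:
    """Compare versions, ignoring a local build suffix such as '+cu121' (assumption)."""
    return installed.split("+", 1)[0] == pinned


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    elif matches_pinned(found):
        print(f"torch {found} matches the pinned {PINNED_TORCH}")
    else:
        print(f"torch {found} differs from the pinned {PINNED_TORCH}")
```

Running the script inside the project environment prints one of the three messages; it only reads package metadata and does not import torch.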

lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
--custom-tasks src/open_r1/evaluate.py \
--use-chat-template \
--system-prompt="Please reason step by step, and put your final answer within \boxed{}." \

Member Author

Not needed for the DeepSeek models (including it gives a ~1 point gain).
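
Since the system prompt is optional for some models, one way to picture the command in this hunk is as an argument list where `--system-prompt` is only appended on request. The sketch below uses only the arguments visible in the hunk; the hunk ends with a line continuation, so the real script may pass further flags that are not shown here. The function name `build_lighteval_cmd` and the example values `pretrained=org/model` and `my_task` are hypothetical placeholders.

```python
from typing import List, Optional

# System prompt quoted from the hunk above.
BOXED_PROMPT = r"Please reason step by step, and put your final answer within \boxed{}."


def build_lighteval_cmd(
    model_args: str,
    task: str,
    system_prompt: Optional[str] = None,
) -> List[str]:
    """Assemble the lighteval invocation shown in the hunk as an argv list.

    Only the arguments visible in the hunk are included; the original
    command continues past the excerpt, so this list may be incomplete.
    """
    cmd = [
        "lighteval",
        "vllm",
        model_args,
        f"custom|{task}|0|0",
        "--custom-tasks",
        "src/open_r1/evaluate.py",
        "--use-chat-template",
    ]
    if system_prompt is not None:
        # Optional: per the comment above, DeepSeek models do not need it.
        cmd.append(f"--system-prompt={system_prompt}")
    return cmd


if __name__ == "__main__":
    # Placeholder values, not taken from the PR.
    print(build_lighteval_cmd("pretrained=org/model", "my_task", BOXED_PROMPT))
```

Passing the list to `subprocess.run` rather than a shell string avoids quoting problems with the backslash and braces in the prompt.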

@lewtun requested a review from edbeeching February 6, 2025 13:20
setup.py Outdated
@@ -53,17 +53,17 @@
"huggingface-hub[cli]>=0.19.2,<1.0",
"isort>=5.12.0",
"liger_kernel==0.5.2",
"lighteval @ git+https://github.com/huggingface/lighteval.git@0e462692436e1f0575bdb4c6ef63453ad9bde7d4#egg=lighteval[math]",
"math-verify>=0.3.3", # Used for math verification in grpo
"lighteval @ git+https://github.com/huggingface/lighteval.git@3c9b0c9dde6718b23ef5b0f4960355f0d494bdfc#egg=lighteval[math]",

Member Author

Bump to the latest commit once the vLLM fix for DDP is merged: huggingface/lighteval#541

Collaborator

Done, it's 86f62259f105ae164f655e0b91c92a823a742724
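
For illustration, the commit hash from the reply above can be dropped into the same requirement format used in the `setup.py` hunk. This is a sketch only: the helper `lighteval_requirement` is hypothetical, and this excerpt does not show whether the merged `setup.py` contains exactly this line.

```python
# Repository URL and requirement format copied from the setup.py hunk above.
LIGHTEVAL_REPO = "https://github.com/huggingface/lighteval.git"
# Commit hash quoted in the reply above.
LIGHTEVAL_COMMIT = "86f62259f105ae164f655e0b91c92a823a742724"


def lighteval_requirement(commit: str = LIGHTEVAL_COMMIT) -> str:
    """Build a pinned requirement string for lighteval with the math extra."""
    # Require a full 40-character lowercase hex SHA so the pin is unambiguous.
    if len(commit) != 40 or any(c not in "0123456789abcdef" for c in commit):
        raise ValueError(f"expected a full 40-character commit SHA, got {commit!r}")
    return f"lighteval @ git+{LIGHTEVAL_REPO}@{commit}#egg=lighteval[math]"


if __name__ == "__main__":
    print(lighteval_requirement())
```

Pinning to a full commit SHA rather than a branch name keeps evaluation results reproducible while the upstream fix is in flux.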

@lewtun merged commit cec57f3 into main Feb 6, 2025
1 check passed
@lewtun deleted the lewtun/add-gpqa-cmd branch February 6, 2025 14:24
GitMonkey0 pushed a commit to GitMonkey0/open-r1 that referenced this pull request Feb 24, 2025
* Add GPQA Diamond

* Add table

* Fix README

* Up

* Fixes

* Ignore logs

* Fix

* Pin deps

* Fix GRPO

* Add Llama 70B tables

* Restore dp

* Pin lighteval

* Use bfloat16

* Tune table

* Add note
3 participants