Add E2B code interpreter reward function #364
Merged
38 commits
e0decfd Add stuff (lewtun)
a290599 Merge branch 'main' into grpo-code (lewtun)
7f4e8a3 Merge branch 'main' into grpo-code (lewtun)
da19783 Make it kind of work (lewtun)
6ba5302 Add more stuff (lewtun)
f8e200e Merge branch 'main' into grpo-code (lewtun)
78cf722 Add fix for parse (lewtun)
24dc34f Fix (lewtun)
22244fe Refactor (lewtun)
c32d137 Clean up (lewtun)
dab15e0 Fix config (lewtun)
edc502d Fix sys (lewtun)
27af68e Add SFT config (lewtun)
53eaddb Use min rate (lewtun)
385d799 Fix eval (lewtun)
52fc681 Add base model (lewtun)
884387f Add s1k (lewtun)
2d3c797 Disable eval (lewtun)
f85b7b7 Merge branch 'main' into grpo-code (lewtun)
aaa8f6f Fix (lewtun)
20a1ea0 Add import checker (lewtun)
5863303 Fix importer (lewtun)
8d78b8e Fix (lewtun)
932e69e Tune config (lewtun)
258406f Tune (lewtun)
fd9860e Fix (lewtun)
c614dbd Fix save (lewtun)
51815b2 Tuen beta (lewtun)
21c9859 Merge branch 'main' into grpo-code (lewtun)
da08407 Remove configs (lewtun)
5f35a61 Fix vLLM (lewtun)
93254b4 Fix (lewtun)
853e42b Add note (lewtun)
23dfafd Add doc (lewtun)
65c44d8 doc (lewtun)
04381ca Fix (lewtun)
fb6e4ae Tune lr (lewtun)
89ded43 Add command (lewtun)
@@ -0,0 +1,59 @@
# Model arguments
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: open-r1/verifiable-coding-problems-python-10k
dataset_configs:
- default
system_prompt: "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"

# GRPO trainer config
callbacks:
- push_to_hub_revision
beta: 0.01
bf16: true
use_vllm: true
vllm_device: auto
vllm_gpu_memory_utilization: 0.9
do_eval: false
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: Qwen2.5-1.5B-Open-R1-Code-GRPO
hub_strategy: every_save
learning_rate: 5.0e-07
log_completions: true
log_level: info
logging_first_step: true
logging_steps: 1
logging_strategy: steps
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr_rate: 0.1
max_prompt_length: 1024
max_completion_length: 2048
max_steps: 500
num_generations: 14
num_train_epochs: 1
output_dir: data/Qwen2.5-1.5B-Open-R1-Code-GRPO
overwrite_output_dir: true
per_device_train_batch_size: 16
push_to_hub: true
report_to:
- wandb
reward_funcs:
- code
- format
reward_weights:
- 1.0
- 0.1
save_strategy: "steps"
save_steps: 50
save_total_limit: 1
seed: 42
temperature: 1.0
warmup_ratio: 0.03
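To make the `reward_funcs` / `reward_weights` pairing in this config concrete, here is a minimal sketch of how a format check and a weighted sum could fit together. It assumes the trainer combines per-function rewards as a weighted sum; the names `format_reward`, `combine_rewards`, and `FORMAT_PATTERN` are illustrative and are not taken from this PR. The pattern simply mirrors the layout requested in `system_prompt`.

```python
import re

# Hypothetical pattern mirroring the layout requested in system_prompt:
# <think>\n...\n</think>\n<answer>\n...\n</answer>
FORMAT_PATTERN = re.compile(
    r"<think>\n.*?\n</think>\n<answer>\n.*?\n</answer>", re.DOTALL
)

# Weights copied from the config: code reward first, format reward second.
REWARD_WEIGHTS = [1.0, 0.1]


def format_reward(completion: str) -> float:
    """Return 1.0 if the whole completion follows the requested layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.fullmatch(completion) else 0.0


def combine_rewards(rewards: list[float], weights: list[float]) -> float:
    """Weighted sum of per-function rewards (assumed combination rule)."""
    if len(rewards) != len(weights):
        raise ValueError("need exactly one weight per reward function")
    return sum(r * w for r, w in zip(rewards, weights))


completion = "<think>\nplan\n</think>\n<answer>\nprint(1)\n</answer>"
# Pretend the code reward (fraction of passing test cases) was 0.5.
total = combine_rewards([0.5, format_reward(completion)], REWARD_WEIGHTS)
```

With these weights the format term can add at most 0.1 to the total, so passing test cases dominates the signal.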
@@ -1,4 +1,5 @@
+from .import_utils import is_e2b_available
 from .model_utils import get_tokenizer


-__all__ = ["get_tokenizer"]
+__all__ = ["get_tokenizer", "is_e2b_available"]
@@ -0,0 +1,23 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from transformers.utils.import_utils import _is_package_available


# Use same as transformers.utils.import_utils
_e2b_available = _is_package_available("e2b")


def is_e2b_available() -> bool:
    return _e2b_available
It's surprising that you don't have any issue with the extra indentation.
I usually use textwrap.dedent in this case, but it might not be necessary here for some reason.
https://docs.python.org/3/library/textwrap.html#textwrap.dedent
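As a small illustration of the suggestion above, this sketch shows `textwrap.dedent` stripping the common leading whitespace from a script template defined inside an indented function. The template and the `build_script` name are made up for the example and are not the code under review.

```python
import textwrap


def build_script(body: str) -> str:
    """Build a standalone script from a template written with source-level indentation."""
    template = """
        def evaluate():
            return {body}

        print(evaluate())
    """
    # dedent removes the 8 spaces shared by every non-blank line;
    # strip drops the leading and trailing newlines left by the triple quotes.
    return textwrap.dedent(template).strip().format(body=body)


script = build_script("1 + 1")
namespace = {}
exec(script, namespace)  # the script defines evaluate() and prints its result
```

Without the `dedent` call, the first line of the script would start with eight spaces, and Python would reject it with an `IndentationError` when the string is executed as top-level code.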