
Can I run ppo in llama3.1-70B-instruct? #6

Open
cingtiye opened this issue Nov 5, 2024 · 1 comment
cingtiye commented Nov 5, 2024

No description provided.

PeterSH6 (Collaborator) commented Nov 5, 2024

Absolutely!
You can create your own Llama training script, modeled on run_deepseek_full_hh_rlhf.sh, to work with any model supported by Hugging Face and vLLM.
To do this, simply replace actor_rollout_ref.model.path, critic.model.path, and reward_model.model.path with the path to your Llama model checkpoint.
For training 70B models, we recommend the Megatron-LM backend with 64 GPUs (e.g., trainer.n_gpus_per_node=8, trainer.nnodes=8).
For the other options available when constructing a training script, please refer to our Config Explanation document.

Disclaimer: Due to Llama license restrictions, we're unable to provide examples using Llama checkpoints in our repo.
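As a rough sketch only (not an official example, per the license note above), a script adapted from run_deepseek_full_hh_rlhf.sh might override the three model paths and the trainer topology like this. The checkpoint path and data paths are placeholders, and the `verl.trainer.main_ppo` entrypoint is assumed from the existing example scripts; check your local copy of the repo for the exact module and config names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: point this at your local Llama 3.1 70B Instruct checkpoint.
MODEL_PATH=/path/to/llama3.1-70B-instruct

# Entrypoint and override syntax assumed from run_deepseek_full_hh_rlhf.sh;
# all three roles (actor/rollout/ref, critic, reward model) reuse the same
# checkpoint path here, and 8 nodes x 8 GPUs gives the recommended 64 GPUs.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path="$MODEL_PATH" \
    critic.model.path="$MODEL_PATH" \
    reward_model.model.path="$MODEL_PATH" \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=8
```

The remaining options (data paths, batch sizes, backend selection) would be carried over from the DeepSeek example and adjusted per the Config Explanation document.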
