
Can I run ppo in llama3.1-70B-instruct? #6

Open
cingtiye opened this issue Nov 5, 2024 · 1 comment
cingtiye commented Nov 5, 2024

No description provided.

PeterSH6 (Collaborator) commented Nov 5, 2024

Absolutely!
You can create your own Llama training script, modeled on run_deepseek_full_hh_rlhf.sh, to work with any model supported by Hugging Face and vLLM.
To do this, simply replace actor_rollout_ref.model.path, critic.model.path, and reward_model.model.path with the path to your Llama model checkpoint.
For training 70B models, we recommend the Megatron-LM backend with 64 GPUs (e.g., trainer.n_gpus_per_node=8, trainer.nnodes=8).
For the other options available when constructing a training script, please refer to our Config Explanation document.

Disclaimer: Due to Llama license restrictions, we're unable to provide examples using Llama checkpoints in our repo.
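As a rough sketch only (not an official example, per the license note above), a script adapted from run_deepseek_full_hh_rlhf.sh might override the three model paths and the trainer topology like this. The checkpoint path and data paths are placeholders, and the `verl.trainer.main_ppo` entrypoint is assumed from the existing example scripts; check your local copy of the repo for the exact module and config names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: point this at your local Llama 3.1 70B Instruct checkpoint.
MODEL_PATH=/path/to/llama3.1-70B-instruct

# Entrypoint and override syntax assumed from run_deepseek_full_hh_rlhf.sh;
# all three roles (actor/rollout/ref, critic, reward model) reuse the same
# checkpoint path here, and 8 nodes x 8 GPUs gives the recommended 64 GPUs.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path="$MODEL_PATH" \
    critic.model.path="$MODEL_PATH" \
    reward_model.model.path="$MODEL_PATH" \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=8
```

The remaining options (data paths, batch sizes, backend selection) would be carried over from the DeepSeek example and adjusted per the Config Explanation document.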
