Absolutely!
You can create your own Llama training scripts, similar to run_deepseek_full_hh_rlhf.sh, for any model supported by Hugging Face and vLLM.
To do this, simply point actor_rollout_ref.model.path, critic.model.path, and reward_model.model.path to your Llama model checkpoint.
For training 70B models, we recommend using the Megatron-LM backend with 64 GPUs (e.g., trainer.n_gpus_per_node=8, trainer.nnodes=8).
For the other configs needed to construct the training script, please refer to our Config Explanation document.
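For illustration, here is a minimal sketch of such an override, assuming the script launches verl's PPO entrypoint the way run_deepseek_full_hh_rlhf.sh does; the checkpoint path below is a placeholder, not a real Llama checkpoint:

```shell
# Hypothetical adaptation: swap the model paths in the launch command.
# /path/to/llama-checkpoint is a placeholder you must replace.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=/path/to/llama-checkpoint \
    critic.model.path=/path/to/llama-checkpoint \
    reward_model.model.path=/path/to/llama-checkpoint \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=8
```

All other options from the original script can be kept as-is; only the three model paths (and, for 70B, the GPU topology) need to change.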
Disclaimer: Due to the Llama license restrictions, we're unable to provide examples using Llama checkpoints in our repo.