OpenRLHF

A Ray-based High-performance RLHF framework!

Open-source ChatGPT / Comprehensive / Lightweight / Easy-to-use

The code is open-source, feel free to use it, contributions are welcome! Note: The license of the model depends on the provider of the model.

💫OpenRLHF
💫Features
💫Performance
📄Running Example
⛏️Pull Request
💐References & Acknowledgements
🌟Sponsor Us
🌈Starchart
🏆Contributors

OpenRLHF Project

OpenRLHF aims to develop a High-performance RLHF training framework based on Ray and DeepSpeed. OpenRLHF is the Simplest high-performance RLHF library that supports 34B models RLHF training with Single DGXA100 (script)).

The key idea of OpenRLHF is to distribute the Actor Model, Reward Model, Reference Model, and the Critic Model onto separate GPUs using Ray, while placing the Adam Optimizer on the CPU. This enables full-scale fine-tuning of 7B models across multiple 24GB RTX 4090 GPUs (or 34B models with multiple A100 80G), with high training efficiency thanks to the ability to use a large generate batch size with Adam Offload and Ray. Our PPO performance with the 13B llama2 models is 4 times that of DeepSpeedChat.

Features

A fast LLaMA2 SFT/PPO Training Framework based on DeepSpeed.
Multi-nodes training scripts for Slurm.
Support DPO (direct-preference-optimization).
Distributed PPO based on Ray for 34B+ models and 7B models on RTX4090.
Support Decision Transformer (DT) Alignment (https://arxiv.org/abs/2308.12050).
Support top chinese models.
Support Wandb log (--wandb).
Support conda env/nvidia docker.
Support FlashAttention2 (--flash_attn).
Pre-trained 7B/13B SFT/RM/DPO/PPO/DT checkpoint
PPO vs SFT examples

TODO

Support checkpoint.
Support Rejection Sampling.
Support Multiple Reward models.
Support QLora.

Support Matrix

	PPO-max & Best Hyperparameters	Ray	34B Full Tuning with 4 A100	7B Full Tuning with 1 A100 (80G)	7B Full Tuning with 4 RTX4090
OpenRLHF	✔	✔	✔	✔	✔
DeepSpeedChat	✖️	✖️	✖️	✖️	✖️
ColossalAIChat	✖️	✖️	✖️	✖️	✖️
TRL	✖️	✖️	✖️	✖️	✖️

Performance

	7B llama2 RLHF	13B llama2 RLHF (50k samples)
OpenRLHF	-	22 hours with 8 A100
DeepSpeedChat	-	48 hours with 16 A100

Ray/DeepSpeed Config:

4 A100 80G for Actor, 2 A100 80G for Critic, 1 A100 80G for RM, and 1 A100 80G for InitPolicy
ZeRO2 with Adam Offload
Max Sequence Length: 2048

Throughput:

7B llama2: 0.105 samples/gpu/secs
- micro_batch_size = 16/8 (rollout/train), generation_length = 100~300
13B llama2: 0.04 samples/gpu/secs
- micro_batch_size = 8/4 (rollout/train), generation_length = 200~400
34B codellama: 0.007 samples/gpu/secs
- micro_batch_size = 2/1 (rollout/train), generation_length = 300~800

samples/gpu/secs = Number of PPO Samples / Number of A100 GPUS / Seconds

Running Example

You can build openrlhf from nvidia-docker(recomended) or from conda envs.

Clone the repository: 
git clone https://github.com/openllmai/OpenRLHF.git

# Download the pre-trained SFT/RM checkpoints (Optional)
git lfs install
git clone https://huggingface.co/chuyi777/openrlhf_checkpoint

Single-node training with nvidia-docker

cd examples/scripts

# install nvidia-docker (Optional)
./nvidia_docker_install.sh

# launch nvidia container
./docker_run.sh

# cd in container
cd /openrlhf/examples/scripts

# build OpenRLHF (i.e, pip install)
./build_openrlhf.sh

# huggingface login 
~/.local/bin/huggingface-cli login

# train SFT model
./train_sft_llama.sh

# train RM model
./train_rm_llama.sh

# train PPO model
./train_ppo_llama.sh

# train DPO model
./train_dpo_llama.sh

# train Decision Transformer model
./train_decision_transformer_llama.sh

PPO training with Ray

for 13B/34B models on A100/H100.. or 7B models on RTX4090

cd examples/scripts

# launch nvidia container
./docker_run.sh

# cd in container
cd /openrlhf/examples/scripts

# build OpenRLHF (i.e, pip install)
./build_openrlhf.sh

# huggingface login 
~/.local/bin/huggingface-cli login

# launch ray in container
nohup ray start --head --node-ip-address 0.0.0.0 --num-gpus 8 --block &> ray.log &

# if you want to launch ray on more nodes, use
# ray start --address {MASTER-NODE-ADDRESS}:6379  --num-gpus 8 --block

# train ray PPO model, requires 8 gpus in default config
./train_ppo_llama_ray.sh

Multi-nodes training on Slurm

cd examples/scripts

# huggingface login on Slurm 
pip install transformers
huggingface-cli login

# Moidfy the Slurm Account/Nodes ... in `train_llama_slurm.sh`

# For SFT, RM, and PPO training stage:
# Modify the variable `training_script` in `train_llama_slurm.sh` to
readonly training_script="train_sft_llama.sh"
readonly training_script="train_rm_llama.sh"
readonly training_script="train_ppo_llama.sh"

# set `GPUS_PER_NODE` in `train_llama_slurm.sh`
readonly GPUS_PER_NODE=8

# run multi-nodes training script
# train_llama_slurm.sh will load the training args from `training_script`
sbatch ./train_llama_slurm.sh

build openrlhf from conda envs

If you really don't want to use nvidia-docker, we also provide tutorials for building openrlhf from a conda environment. (We prefer nvidia-docker to avoid errors caused by the environment.)

# we need conda
conda create -n openrlhf python=3.10
# so, we need install some package manualy: when installing torch, you may need to match the corresponding cuda version.
pip install packaging ninja
pip install torch --index-url https://download.pytorch.org/whl/cu118
# check ninjia
ninja --version
echo $? # output: 0
# install flash-attn: may take some time.
# For network error: you can download sepecified version from https://github.com/Dao-AILab/flash-attention/releases.
pip install flash-attn==2.1.1 --no-build-isolation
./build_openrlhf.sh
# enjoy it!

Inference and Evaluation

After completing the training, you can evaluate your model by using the inference script:

# interactive_chat
./interactive_chat_llama.sh { model_path }

# batch generate
python examples/batch_inference.py {args}

Pull Request

If you want to contribute code please format the code using the following command,

pip install pre-commit
pre-commit install
git add .
git commit -m "xxx"

References & Acknowledgements

We would like to express our gratitude to the following projects and organizations for their contributions to the field of AI and NLP:

Hugging Face Transformers ↗
OpenAI GPT ↗
LLaMA2 ↗
DeepSpeed ↗
Ray ↗

Join Us

How to Join?

Email us at xianyuai@openllmai.top(official email) or janhu9527@gmail.com/jjgxw@outlook.com(PIC). Please include the following details:
- Your name
- Your GitHub username
- Your areas of interest
- Your skills and experience related to NLP and/or AI
You can also join us through the official GitHub OpenRLHF ↗ project page. Just create an issue about your interest to contribute and we will get back to you.

What can you do?

Join the team and participate in the development of the OpenRLHF project.
Contribute to the project by submitting pull requests.
Help improve documentation, fix bugs, or create new features.
Share the project and help us grow the community.

Sponsor Us

Your sponsorship can help us maintain and improve OpenRLHF. If you find this project useful, please consider sponsoring us. You can sponsor us on Open Collective ↗.

Starchart

Contributors

A big thank you to all our contributors! If you want to contribute, feel free to make a pull request or create an issue.

Citation

@misc{openllmai23,
   author = {OpenLLMAI},
   title = {OpenRLHF},
   year={2023},
   howpublished = {\url{https://github.com/OpenLLMAI/OpenRLHF}}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

OpenRLHF

OpenRLHF

A Ray-based High-performance RLHF framework!

OpenRLHF Project

Features

Performance

Running Example

Pull Request

References & Acknowledgements

Join Us

Sponsor Us

Starchart

Contributors

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

OpenRLHF

OpenRLHF

A Ray-based High-performance RLHF framework!

OpenRLHF Project

Features

Performance

Running Example

Pull Request

References & Acknowledgements

Join Us

Sponsor Us

Starchart

Contributors

Citation