We provide a simple example of PPO finetuning of an LLM that is asked to generate specific token sequences. As in RLHF, each generated token is a separate action.
We use LoRA through the PEFT library for lightweight finetuning, and we leverage Lamorel's custom module functions and updaters to add a value head on top of the LLM and finetune the weights with the PPO loss.
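In the example itself, the value head is plugged in through Lamorel's custom module functions. Independently of Lamorel's exact interface, the sketch below illustrates the underlying idea in plain PyTorch/Transformers/PEFT: wrap the LLM with LoRA adapters and read a scalar value estimate from the last hidden state, while the LM logits provide the per-token policy. The model name (`gpt2`), LoRA hyperparameters, and head sizes are illustrative placeholders, not the example's actual settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative base model and LoRA settings (not the example's actual config)
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
hidden_size = model.config.hidden_size
lora_config = LoraConfig(
    task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05, target_modules=["c_attn"]
)
model = get_peft_model(model, lora_config)  # only LoRA adapters are trainable

# Scalar value head on top of the LLM's last hidden state (PPO critic)
value_head = torch.nn.Sequential(
    torch.nn.Linear(hidden_size, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 1),
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Generate the sequence: a b c", return_tensors="pt")
outputs = model(**inputs)
# Policy logits for the next token: one action per generated token, as in RLHF
next_token_logits = outputs.logits[:, -1, :]
# State-value estimate for the current context
value = value_head(outputs.hidden_states[-1][:, -1, :])
```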
1. Install the required packages: `pip install -r requirements.txt`
To launch the example using a single GPU on a local machine:
- Spawn both processes (the RL script collecting data and the LLM server):
```bash
python -m lamorel_launcher.launch \
  --config-path PROJECT_PATH/examples/PPO_finetuning/ \
  --config-name local_gpu_config \
  rl_script_args.path=PROJECT_PATH/examples/PPO_finetuning/main.py \
  rl_script_args.output_dir=YOUR_OUTPUT_DIR
```
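For reference, the script pointed to by `rl_script_args.path` is itself a Hydra entry point: the launcher spawns it alongside the LLM process, and the script talks to the LLM through a `lamorel.Caller`. The sketch below shows this general pattern following Lamorel's documented usage, not the exact contents of `main.py`; the decorator arguments and the `output_dir` access are placeholders matching the overrides in the command above.

```python
import hydra
from lamorel import Caller, lamorel_init

lamorel_init()  # set up Lamorel's distributed communication before anything else

@hydra.main(config_path="config", config_name="config")  # placeholder values for this sketch
def main(config_args):
    # Overrides such as rl_script_args.output_dir from the launch command land here
    output_dir = config_args.rl_script_args.output_dir
    # Connect to the LLM process spawned by lamorel_launcher
    lm_server = Caller(config_args.lamorel_args)
    # ... collect rollouts, score/generate with lm_server, run PPO updates ...
    lm_server.close()

if __name__ == "__main__":
    main()
```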