We provide a lightweight implementation of the PPO finetuning performed in "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning", using LoRA (through the PEFT library) to keep the number of trained parameters small.
We leverage Lamorel's custom modules and updaters to add a value head on top of the LLM and to finetune it with the PPO loss. Finally, Lamorel's initializer is used to add the LoRA adapters to the LLM; when multiple LLM instances are deployed, Lamorel keeps these adapters synchronized automatically.
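As a rough illustration of these two ingredients, here is a minimal, self-contained sketch of a value head and of wrapping a model with LoRA adapters through PEFT. The class name, layer sizes, and base model below are illustrative placeholders rather than the exact ones used in `main.py`; only the PEFT calls (`LoraConfig`, `get_peft_model`) are the library's real API.

```python
import torch
from torch import nn
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

class ValueHead(nn.Module):
    """Illustrative value head: maps the LLM's last hidden state to a scalar
    state-value estimate. In the example, a module like this is registered as a
    Lamorel custom module function so it is computed alongside the LLM outputs."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 1024),
            nn.Sigmoid(),
            nn.Linear(1024, 1024),
            nn.Sigmoid(),
            nn.Linear(1024, 1),
        )

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # (batch, hidden_size) -> (batch,) value estimates
        return self.mlp(last_hidden_state).squeeze(-1)

# Illustrative LoRA setup: only the low-rank adapter weights (plus the value
# head above) are trained, which is what keeps the finetuning lightweight.
# In the example, this wrapping happens inside a Lamorel model initializer.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,            # rank of the low-rank update matrices
    lora_alpha=32,   # scaling applied to the LoRA update
    lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```

Training only these small modules also makes the multi-instance case cheap: when several LLM instances are deployed, only the adapter and value-head weights need to be kept in sync across them.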
- Install the BabyAI-Text environment
- Install the required packages: `pip install -r requirements.txt`
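The launch command below spawns two kinds of processes: an RL script that collects interactions in BabyAI-Text and one or more LLM processes served by Lamorel. As a point of reference, here is a stripped-down sketch of the RL-script skeleton, following the `Caller` pattern from Lamorel's documentation; the real `main.py` additionally registers the value head module, the PPO updater, and the LoRA initializer discussed above.

```python
import hydra
from lamorel import Caller, lamorel_init

lamorel_init()  # must run first so Lamorel can set up its distributed processes

# config_path/config_name are placeholders here: when started through
# lamorel_launcher, the configuration comes from the launch command below.
@hydra.main(config_path="config", config_name="config")
def main(config_args):
    # Handle to the LLM process(es) spawned by the launcher
    lm_server = Caller(config_args.lamorel_args)

    # ... collect BabyAI-Text trajectories, score candidate actions with the
    # LLM, and call the custom PPO updater (omitted in this sketch) ...

    lm_server.close()  # shut the LLM process(es) down cleanly


if __name__ == "__main__":
    main()
```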
To launch the example using a single GPU on a local machine:
- Spawn both processes (the RL script collecting data and the LLM server):
python -m lamorel_launcher.launch \
--config-path PROJECT_PATH/examples/PPO_finetuning/ \
--config-name local_gpu_config \
rl_script_args.path=PROJECT_PATH/examples/PPO_finetuning/main.py \
rl_script_args.output_dir=YOUR_OUTPUT_DIR \