PPOTrainer + LoRA and Continued Training #2707
Labels: ⏳ needs more info · ⚡ PEFT · 🏋 PPO
Hi all,

Currently I'm training a model with `PPOTrainer` and LoRA.
When I save the model, it writes both an `adapter_model.safetensors` and a `pytorch_model.bin` to the same directory. What is the difference between the two? It seems that when I load the model via `from_pretrained`, it uses the `adapter_model`.
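One way to check what each file actually holds is to inspect its keys. Below is a minimal sketch: the plain dicts stand in for the state dicts you would get from `safetensors.torch.load_file(...)` and `torch.load(...)`, and the key names are illustrative examples of typical PEFT/TRL prefixes, not copied from a real checkpoint.

```python
# Sketch: tell the two checkpoint files apart by their parameter keys.
# The dicts below are stand-ins for loaded state_dicts; the key names
# are assumptions mirroring common PEFT / TRL naming, for illustration only.

adapter_sd = {
    "base_model.model.layers.0.self_attn.q_proj.lora_A.weight": ...,
    "base_model.model.layers.0.self_attn.q_proj.lora_B.weight": ...,
}
pytorch_sd = {
    "v_head.summary.weight": ...,
    "v_head.summary.bias": ...,
}

def describe(state_dict):
    """Report whether a state_dict contains LoRA weights and/or v_head weights."""
    return {
        "lora": any(".lora_" in k for k in state_dict),
        "v_head": any(k.startswith("v_head.") for k in state_dict),
    }

print(describe(adapter_sd))  # {'lora': True, 'v_head': False}
print(describe(pytorch_sd))  # {'lora': False, 'v_head': True}
```

If the `pytorch_model.bin` shows only `v_head.*` keys and no `.lora_` keys, the adapters are not merged into it; the base weights and adapters live elsewhere.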
Does the `pytorch_model.bin` also have the LoRA adapters merged in?

Additionally, I want to continue PPO training from a checkpoint. I load the checkpoint, directly load the parameters of the `v_head` and `pretrained_model`, and restore the optimizer state as well. One way I've tried to load the model weights was loading the `state_dict` of the `adapter_model`. However, it's missing keys for the `v_head`, since it's just a LoRA adapter. How can I verify that training from the checkpoint is resuming properly with LoRA?

I am using these versions:
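The "missing keys" behavior can be reproduced without any model at all: PyTorch's `load_state_dict(..., strict=False)` reports exactly the set difference between the model's keys and the checkpoint's keys. The sketch below mimics that bookkeeping with plain sets; the key names are illustrative assumptions, not taken from a real model.

```python
# Sketch: why loading only the adapter state_dict reports missing keys.
# This mimics the missing/unexpected bookkeeping that
# load_state_dict(..., strict=False) performs, using plain key sets.
# Key names are illustrative assumptions, not from an actual checkpoint.

model_keys = {
    "pretrained_model.layers.0.q_proj.weight",
    "pretrained_model.layers.0.q_proj.lora_A.weight",
    "pretrained_model.layers.0.q_proj.lora_B.weight",
    "v_head.summary.weight",
}
adapter_keys = {
    "pretrained_model.layers.0.q_proj.lora_A.weight",
    "pretrained_model.layers.0.q_proj.lora_B.weight",
}

missing = sorted(model_keys - adapter_keys)      # keys the adapter cannot fill
unexpected = sorted(adapter_keys - model_keys)   # keys the model doesn't expect

print(missing)     # base weights and v_head show up as "missing" -- expected
print(unexpected)  # []
```

So for a LoRA-only checkpoint, missing keys covering the base weights and `v_head` are expected; the `v_head` has to be restored from its own saved state.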
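One generic way to verify that a resume actually restored the weights is to fingerprint the parameters before saving and after loading and compare the digests. A minimal sketch, using plain dicts of numbers in place of real tensors (for real tensors you would hash `t.detach().cpu().numpy().tobytes()` instead of JSON):

```python
# Sketch: sanity-check a resume by fingerprinting parameters before save
# and after load. Plain dicts stand in for state_dicts here; the parameter
# names and values are illustrative assumptions.
import hashlib
import json

def fingerprint(state_dict):
    """Stable digest over all parameters, order-independent via sort_keys."""
    blob = json.dumps(state_dict, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

before = {"v_head.summary.weight": [0.1, -0.2], "lora_A.weight": [0.3]}
restored = {"v_head.summary.weight": [0.1, -0.2], "lora_A.weight": [0.3]}
drifted = {"v_head.summary.weight": [0.0, 0.0], "lora_A.weight": [0.3]}

print(fingerprint(before) == fingerprint(restored))  # True -- resume restored weights
print(fingerprint(before) == fingerprint(drifted))   # False -- something was not loaded
```

Comparing fingerprints of the `v_head` and the LoRA parameters separately would show whether each piece (adapter file vs. `pytorch_model.bin`) was actually loaded.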