Fix bug when loading local peft model #342
Conversation
Hi @Opdoop
Thanks for the PR, but I don't know if this is the right fix.
This block: https://github.com/lvwerra/trl/blob/c85cdbdbd0d956bbc6f5a8e04b0036648480e7a1/trl/models/modeling_base.py#L153 should check for the presence of the adapter model and load it properly using the adapter config.
Can you share a minimal script to reproduce your bug? I suspect you might not be loading/saving your model correctly.
Hi @younesbelkada, thanks for your reply. There is a chance I am using trl in the wrong way. In my case, the lora weights and the base model weights are saved in different locations, which causes the following line to break. A minimal script is below,
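A minimal sketch, assuming a hypothetical local adapter directory ./my-local-lora-model that contains only the peft adapter files:

from trl import AutoModelForCausalLMWithValueHead

# Hypothetical local directory holding only the lora adapter
# (adapter_config.json + adapter_model.bin), while the base model
# weights live elsewhere (e.g. on the Hub).
model = AutoModelForCausalLMWithValueHead.from_pretrained("./my-local-lora-model")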
where I pass the relative path of my local fine-tuned model. The if block at line 153 runs fine: it correctly loads the lora weights and finds the corresponding base model weights, so the subsequent loading code works correctly. Then, however, execution enters the if branch for resume_training, where a second hf_hub_download call fails and terminates the run with an uncaught HFValidationError. Thus, I think the change in this PR addresses it. Looking forward to your response.
Thanks for explaining, this is much clearer!
I managed to reproduce the issue; it appears that it happens only if you give a path to a local peft model that has not been saved with AutoModelForCausalLMWithValueHead.
As you can see here: https://github.com/lvwerra/trl/blob/dec9993129f9b77cb81c46f2662119a824b4f6cc/tests/test_peft_models.py#L128 there is a test that checks that loading a model saved using peft works correctly. However, this assumes you have saved the model with AutoModelForCausalLMWithValueHead.save_pretrained(), which saves the v_head inside the pytorch_model.bin file.
I think the right fix is to add a try/except check around lines 218 to 220 that tests whether the ***.index.json file exists or not, and assigns the result to a new boolean is_resuming_training. If is_resuming_training is True, execute lines 222-229; otherwise skip them and assume we are not continuing training.
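A rough sketch of that logic (names are just a sketch, assuming we are inside from_pretrained where pretrained_model_name_or_path is in scope, and that pytorch_model.bin.index.json is the sharded-checkpoint index file):

from huggingface_hub import hf_hub_download

# Probe for the sharded-checkpoint index file and only resume training
# if it is actually there.
is_resuming_training = True
try:
    index_file = hf_hub_download(
        pretrained_model_name_or_path, "pytorch_model.bin.index.json"
    )
except Exception:
    # No index file (or an invalid repo id, e.g. a local path): assume we
    # are not resuming training.
    is_resuming_training = False

if is_resuming_training:
    # original lines 222-229: load the sharded state dict / v_head weights
    ...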
It would also be great if you could add a test similar to https://github.com/lvwerra/trl/blob/dec9993129f9b77cb81c46f2662119a824b4f6cc/tests/test_peft_models.py#L128, but we can do it in a follow-up PR!
Thanks again for spotting this
Hi @younesbelkada, thanks for your quick response! I'm trying to initialize the lora weights from a fine-tuned lora adapter before PPO training. I think it may help PPO training. (I'm not sure, what do you think?)
Thanks for iterating!
Fix loading bug when loading a lora model but not resuming training: 1. Implement the fix logic described in huggingface#342 (review). 2. Set peft lora weights to trainable.
Hi @younesbelkada. I implemented the try/except block. I think it is ready for review again.
The documentation is not available anymore as the PR was closed or merged.
Thanks a lot for this! I left one open question - I think we should address the change about is_trainable in another PR - after that we should be good to merge.
trl/models/modeling_base.py (Outdated)

@@ -164,7 +163,9 @@ class and the arguments that are specific to trl models. The kwargs
                     peft_config.base_model_name_or_path, *model_args, **pretrained_kwargs
                 )

-                pretrained_model = PeftModel.from_pretrained(pretrained_model, pretrained_model_name_or_path)
+                pretrained_model = PeftModel.from_pretrained(
+                    pretrained_model, pretrained_model_name_or_path, is_trainable=True
+                )
I think is_trainable is only relevant when the peft config is different from LoraConfig, no?
Also, what if a user loads the model just for inference? I think we can leave this change for a future PR and add is_trainable as a kwarg on from_pretrained of PreTrainedWrapper.
Oh, my mistake. I think you're right, I had only considered my use case. 😅
I'll open another PR for adding is_trainable as a kwarg.
Awesome thanks so much!
Leave is_trainable to a future PR.
Awesome work! Thanks a lot for fixing this bug 💪 This looks great to me!
Thanks for the fix - looks good to me, maybe we can add a test so it doesn't happen again in the future.
@Opdoop do you feel comfortable adding new tests?
The new test would be similar to this test: https://github.com/lvwerra/trl/blob/e0172fc8ecb11dfac0410feb1bf2a5c80ec9418b/tests/test_peft_models.py#L128; otherwise I can do it by pushing to your branch.
I'm willing to try to add a test, but I didn't quite get it. Do you mean we pass a …
The workflow for that test would be:

causal_lm_model = AutoModelForCausalLM.from_pretrained(self.causal_lm_model_id)
pretrained_model = get_peft_model(causal_lm_model, self.lora_config)

with ... as tmp_dir:
    pretrained_model.save_pretrained(tmp_dir)

and from there load the …
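A fuller sketch of such a test, using gpt2 as a stand-in base model and a plain LoraConfig in place of the class fixtures self.causal_lm_model_id and self.lora_config:

import tempfile

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

from trl import AutoModelForCausalLMWithValueHead


def test_load_model_saved_with_plain_peft_interface():
    # Save an adapter with the plain peft interface, i.e. without the trl
    # wrapper, so no v_head ends up in the checkpoint.
    causal_lm_model = AutoModelForCausalLM.from_pretrained("gpt2")
    lora_config = LoraConfig(task_type="CAUSAL_LM")
    peft_model = get_peft_model(causal_lm_model, lora_config)

    with tempfile.TemporaryDirectory() as tmp_dir:
        peft_model.save_pretrained(tmp_dir)

        # Loading the local adapter directory through the trl wrapper should
        # not raise, and a fresh v_head should be created.
        model = AutoModelForCausalLMWithValueHead.from_pretrained(tmp_dir)
        assert model.v_head is not None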
Thanks for the detailed explanation. I'll try it. 👍
Check that a model saved with the peft class interface can be loaded properly.
❤ Thank you for your careful and patient guidance, I learned a lot from it.
Fix the bug in #341.
When the local model weights and the lora weights are saved in different locations, the code will call hf_hub_download and throw an HFValidationError: Repo id must be in the form 'repo_name' error twice. The first time, the error is caught, but the second error terminates the run.

After loading the local peft model, we can update pretrained_model_name_or_path to pretrained_model to avoid entering the resume_training branch.
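An illustrative sketch of the failure mode (hypothetical local path, not code from this PR): hf_hub_download only accepts Hub repo ids, so passing it a local filesystem path fails validation.

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import HFValidationError

try:
    # A relative local path is not a valid repo id, so validation fails
    # before any download is attempted.
    hf_hub_download("./outputs/my-lora-adapter", "pytorch_model.bin.index.json")
except HFValidationError as err:
    print(err)  # e.g. "Repo id must be in the form 'repo_name' or 'namespace/repo_name' ..."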