Model forgets finetuning after saving / loading #503
Comments
I tried one more thing. The "forgetting" also happens if after finetuning I save the model with
It's weird.
Hi @ingo-m, I just had a similar issue while saving and loading a PeftModel. Here's a short snippet showing how you can load a checkpoint of a PeftModel after training:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.resize_token_embeddings(len(tokenizer))  # Optional

# Restore the fine-tuned weights from the Trainer checkpoint
model.load_state_dict(torch.load("..../checkpoint-300/pytorch_model.bin"))
```
@MatthiasEg thanks. So checkpoints are not affected by this "forgetting" issue, and can be used as a workaround?
Checkpoints or saving the model - there's no difference.
From the small snippet you provided, you are not creating a PEFT model (from the base model) before loading all the weights, meaning all PEFT layers (and possibly other adjustments to the model) are missing. Hence, it's impossible to reload the previous state the model had before saving.
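To illustrate the point, here is a minimal sketch (assuming a BLOOM base model, illustrative LoRA settings, and a placeholder checkpoint path, none of which are quoted from this thread): the PEFT wrapper has to be recreated around the base model before `load_state_dict`, otherwise the adapter weights in the checkpoint have no matching modules to load into.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model and LoRA settings; use the same ones as during fine-tuning.
base_model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

# Recreate the PEFT wrapper first, so the lora_* keys in the checkpoint
# have matching modules to load into.
model = get_peft_model(base_model, peft_config)

# Placeholder checkpoint path; strict=False tolerates checkpoints that
# only contain the adapter weights.
state_dict = torch.load("checkpoint-300/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict, strict=False)
```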
Not sure whether I understand this. After finetuning, I load the model like this:
See https://colab.research.google.com/drive/1mGpLQk8VMFfh_jcMGPaTfygGOdqlUUTs?usp=sharing
My understanding is that the
Honestly, I don't know what the correct approach to save and load a PEFT fine-tuned model is (maybe someone could elaborate on that). But I know that with the snippet I provided above, I can successfully load my PEFT fine-tuned model from disk, without losing the knowledge it gained during training. Did you try using `prepare_model_for_kbit_training` and `get_peft_model` from the peft library, as well as loading the state dict afterwards with `load_state_dict`? This approach at least worked for me.
@MatthiasEg thanks, I can confirm that saving & loading with the torch
Notebook with complete example (works as expected): https://colab.research.google.com/drive/1JPevLSsOq6DWr2tKw7reEDcgwmYhbEfp?usp=sharing

I still don't understand why my original example, using the PEFT methods
Notebook with complete example (doesn't work as expected, forgets finetuning): https://colab.research.google.com/drive/1mGpLQk8VMFfh_jcMGPaTfygGOdqlUUTs?usp=sharing

The conclusion for me is to avoid the PEFT methods
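For reference, a rough sketch of the torch-native save/load flow being described here (the exact calls from the notebook are not quoted in this thread; the model id, LoRA settings, and file path below are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def build_peft_model():
    # Rebuild the same base model and LoRA configuration used for fine-tuning
    # (illustrative values, not the ones from the notebook).
    base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
    return get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# After fine-tuning: save the full state dict with plain torch.
model = build_peft_model()
# ... fine-tuning would happen here ...
torch.save(model.state_dict(), "finetuned_state_dict.pt")

# After restarting: rebuild an identical PEFT model, then restore the weights.
model = build_peft_model()
model.load_state_dict(torch.load("finetuned_state_dict.pt"))
```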
Actually let me re-open this in the hope that someone can shed some light on it. Using the
Hi @ingo-m (referencing the PEFT source, line 396 at commit 016722a)
Would you be able to share a small reproducible snippet that is quick to run? 🙏
cc @pacman100 also just FYI
@younesbelkada thanks for looking into this. This is a colab notebook with a standalone example (where the model forgets the finetuning); it should run as-is on a free Colab instance: https://colab.research.google.com/drive/1mGpLQk8VMFfh_jcMGPaTfygGOdqlUUTs?usp=sharing
It's quite a few lines of code, but that's just boilerplate (required for saving & loading the model from disk). When you open the notebook you should be able to see the output from when I ran it.
I coincidentally discovered why my original example didn't work as expected (model "forgets" finetuning after saving & loading). In my example I used this
Full example notebook: https://colab.research.google.com/drive/1mGpLQk8VMFfh_jcMGPaTfygGOdqlUUTs?usp=sharing
The problematic part is
I'm not sure whether this can be considered a bug. On the one hand,
Hi @ingo-m
I wonder if this issue #602 is related. The workarounds found in the linked issue were disabling
@sjrl Interesting, looks like it's the same problem. In the original example I had
@BenjaminBossan Yes, I can confirm that the issue is fixed when installing PEFT from the current main branch (
Complete example: https://colab.research.google.com/drive/1XMe3iVaSBPjgo72RU9zPaQaZaANNWHHf?usp=sharing
+1. This was such an annoying issue for way too long, but I can confirm that this works now.

```python
ADAPTER_DIR = "./peftmodel_save_pretrained"

# After training: save the adapter
model = get_peft_model(AutoModelForX.from_pretrained({...}), LoraConfig({...}))
model.train()
model.save_pretrained(ADAPTER_DIR)

# To reload: wrap a fresh base model with the saved adapter
model = PeftModel.from_pretrained(AutoModelForX.from_pretrained({...}), ADAPTER_DIR, is_trainable=True)
model.train()
```
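Side note (my understanding of the PEFT API, not quoted from this thread): `is_trainable=True` keeps the reloaded adapter weights trainable so fine-tuning can continue; if the adapter is only needed for inference, the default `is_trainable=False` loads it frozen.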
After finetuning an `AutoModelForCausalLM` model with PEFT, the model forgets what it learned after saving / loading.

I created a minimal example here: https://colab.research.google.com/drive/1mGpLQk8VMFfh_jcMGPaTfygGOdqlUUTs?usp=sharing
The colab notebook can run on a (free) T4 instance.

In the minimal example, I'm using PEFT on a `bigscience/bloom-560m` base model. As a toy example, during finetuning, the model learns the alphabet in reverse order.

Before training and after each epoch, I let the model generate a prediction from this prompt: "z y x w v u t s r"
After one epoch, the model has already memorized the target sequence (the alphabet in reverse order).
Then I save the model to gdrive, restart the runtime, and load the model. The model has forgotten what it learned during finetuning:
I observe this “forgetting” locally and on google colab. I also tried pushing the PEFT model to huggingface instead of saving locally, it also results in forgetting.
Moreover, it doesn’t matter which model dtype I use (`torch.bfloat16`, `torch.float16`, `torch.float32`), or whether `load_in_8bit` is `True` or `False`.

In the minimal example linked above, I’m using a custom torch training loop, but I get the same result with `transformers.Trainer`.

Am I overlooking something obvious? How is this possible?