merge_and_unload issue? #868
Comments
I am hitting the same issue.
I think the issue is that the layers that are on the "meta" device are not properly handled in this case. Their weights are not loaded, and the LoRA weights are also just on the meta device. The result you see probably only contains the weights that were actually loaded, missing all the meta device weights. Honestly, I'm not completely sure how best to handle this situation. Maybe @pacman100 or @younesbelkada have a good idea.
@BenjaminBossan, yes.
Sorry, I'm really not sure what a solution would look like here. The reason the weights are on the meta device is that they needed to be offloaded for lack of memory, so it's not as simple as just loading everything. Hopefully one of the others can shed some light on this.
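For anyone who wants to confirm that this is what is happening, a small diagnostic along these lines lists the parameters that were left on the meta device (an illustrative sketch; `model` stands for the already loaded PeftModel):

```python
def report_meta_params(model):
    """List parameters that still live on the meta device, i.e. have no data."""
    meta = [name for name, p in model.named_parameters() if p.device.type == "meta"]
    print(f"{len(meta)} parameters on the meta device")
    for name in meta[:20]:
        print("  ", name)
    return meta
```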
This issue may occur when the GPU is occupied by other processes. With device_map="auto", the model may not be fully loaded onto the GPU, so try to ensure the device is free or load the model onto the CPU instead.
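For example, a merge done entirely on CPU might look like this (a sketch only; the adapter and output paths are placeholders, and the base checkpoint is the one discussed later in this thread):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Keep every module on CPU so nothing ends up offloaded to the meta device.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)
model = PeftModel.from_pretrained(base, "path/to/adapter")  # placeholder adapter dir
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged")                    # placeholder output dir
```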
I've encountered this issue too. The model was trained and merged on the same GPU (3090, 24 GB); VRAM is sufficient for LoRA training and inference, but after merging I got this error.
Setting
I think that merge_and_unload is not supported for models that are offloaded to disk/CPU, as accelerate puts the offloaded weights on the meta device. This is a bug we should look into.
How can I merge a p-tuning model?
Hi @xiaobai52HZ, I think that sadly merging p-tuning models is currently not supported. cc @pacman100 @BenjaminBossan
Indeed.
Can we add a boolean argument to the device-map inference to disable all offloading?
Offloading is not performed by PEFT. If you want this feature, please consider opening an issue in accelerate.
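That said, one way to avoid accelerate's offloading from the transformers side is to pin the whole model to a single GPU instead of using device_map="auto" (a sketch; it assumes the model fits on one device, otherwise loading fails with an out-of-memory error rather than silently offloading):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map={"": 0},  # place every module on GPU 0; no CPU/disk offload
)
```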
This fixed the issue for me as well.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Closing this issue! Feel free to re-open if you think your concerns are not being addressed.
Still facing the problem:

trainer.train()
# Save the adapter
trainer.save_model(saver_dir + '/adapter')
# Retrieve the base model
model = trainer.model.base_model
# Load the adapter
model = PeftModel.from_pretrained(model, saver_dir + '/adapter', torch_dtype=torch.float16, device_map="auto")
# Merge the base model and the adapter
model = model.merge_and_unload()
# Save the merged model
model.save_pretrained(saver_dir)

It seems that the base model is saved without the LoRA adapter. The resulting config:
{
"_name_or_path": "meta-llama/Llama-2-7b-hf",
"architectures": [
"LlamaModel"
],
"attention_bias": false,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 2,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.34.0",
"use_cache": true,
"vocab_size": 32001
}
This is expected, @indiejoseph.
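One pattern that sidesteps reusing trainer.model.base_model is to reload the base checkpoint fresh and merge the saved adapter onto that (a sketch using the same saver_dir as in the snippet above; not an officially documented recipe):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model from the hub instead of reusing the trainer's wrapped base_model.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, saver_dir + "/adapter")
merged = model.merge_and_unload()
merged.save_pretrained(saver_dir + "/merged")
```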
System Info
I am using the latest dev versions of transformers, accelerate, and peft (installed via !pip install -q -U git+) on Google Colab.
This worked a few days ago, but now when I attempt to merge an adapter back into the base model and then save to the Hub, the result is much smaller than the base model and cannot be loaded (it raises "Cannot copy out of meta tensor; no data!" when I attempt to do so).
The function I am using to merge the PeftModel back into the base model is:
(Code begins here)
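The snippet itself is not reproduced above; a minimal sketch of that kind of merge-and-push helper, with placeholder model and repo names, would be:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

def merge_and_push(base_id, adapter_id, hub_repo):
    """Merge a LoRA adapter into its base model and push the result to the Hub."""
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()
    merged.push_to_hub(hub_repo)
    AutoTokenizer.from_pretrained(base_id).push_to_hub(hub_repo)

# Example call with placeholder names:
# merge_and_push("meta-llama/Llama-2-7b-hf", "my-user/my-qlora-adapter", "my-user/merged-model")
```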
The base model is 'meta-llama/Llama-2-7b-hf', which has two bin files, one at 9.98 GB and one at 3.5 GB. Previously when I ran this code, the merged model was the same size. Now it only produces a single file of 1.07 GB.
This may be an error with the library, although I am not seeing any bug reports to indicate this. It may also be an error with the training code in the first place, my upload code, the HF library or anything else.
If anyone has any solutions, please let me know. Otherwise, if this is a bug, I guess this is my first bug report.
Who can help?
No response
Information
Tasks
Reproduction
This is the Colab I have been using. The data and LoRAs are private, but the two I am playing with are Llama 2 7B QLoRA adapters fine-tuned on 1) a chat dataset and 2) a big chunk of raw text split into paragraphs (a piece of fanfiction).
https://colab.research.google.com/drive/15g2NU2wJ9fOvY3PJCCN5dVDYV8KSXbeS?usp=sharing
Expected behavior
The merged model should be created and be the same size as the base model, and I should be able to load it using AutoModelForCausalLM.from_pretrained as I was able to a few days ago.
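A quick sanity check once the merged checkpoint can be loaded again might look like this (placeholder repo id; the assert guards against the meta-tensor failure mode described above):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "my-user/merged-model", torch_dtype=torch.float16  # placeholder repo id
)
assert not any(p.device.type == "meta" for p in model.parameters()), "found meta tensors"
```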