
merge_and_unload issue? #868

Closed
2 of 4 tasks
Remmet577 opened this issue Aug 27, 2023 · 18 comments · Fixed by #1190
Labels: bug (Something isn't working)

Comments

Remmet577 commented Aug 27, 2023

System Info

I am using the latest dev versions of transformers, accelerate and peft in Google Colab, installed via !pip install -q -U git+.

This worked a few days ago, but now when I merge an adapter back into the base model and push the result to the Hub, it is much smaller than the base model and cannot be loaded (loading it raises "Cannot copy out of meta tensor; no data!").

The function I am using to merge the PeftModel back into the base model is:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

def merge_adapter(lora_id, merged_id):
    config = PeftConfig.from_pretrained(lora_id)

    # Load the base model; dtype is defined elsewhere in the notebook (e.g. torch.float16).
    model_id = config.base_model_name_or_path
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=dtype,
        device_map="auto",
        offload_folder="offload",
    )

    # Attach the LoRA adapter to the base model.
    adapter = PeftModel.from_pretrained(
        model,
        lora_id,
        torch_dtype=dtype,
        device_map="auto",
        offload_folder="offload",
    )

    # Merge the adapter weights into the base model and drop the LoRA layers.
    model = adapter.merge_and_unload(progressbar=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)  # , use_fast=False)

    model.save_pretrained(
        merged_id,
        push_to_hub=True,
        repo_id=merged_id,
        private=True,
        # max_shard_size="4GB",
    )

    tokenizer.save_pretrained(
        merged_id,
        push_to_hub=True,
        repo_id=merged_id,
        private=True,
    )

The base model is 'meta-llama/Llama-2-7b-hf', which has two bin files, one at 9.98 GB and one at 3.5 GB. Previously when I ran this code, the merged model would be the same size. Now it only produces a single file at 1.07 GB.

This may be a bug in the library, although I don't see any existing reports of it. It could also be a problem with my training code in the first place, my upload code, the HF libraries, or something else entirely.

If anyone has any solutions, please let me know. Otherwise, if this is a bug, I guess this is my first bug report.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

This is the Colab I have been using. The data and LoRAs are private, but the two adapters I am playing with are Llama 2 7B QLoRAs fine-tuned on 1) a chat dataset and 2) a large block of raw text split into paragraphs (a piece of fanfiction).

https://colab.research.google.com/drive/15g2NU2wJ9fOvY3PJCCN5dVDYV8KSXbeS?usp=sharing

Expected behavior

The merged model should be created and be the same size as the base model, and I should be able to load it using AutoModelForCausalLM.from_pretrained as I was able to a few days ago.

@zhurui-xiaozhuzaizai

I am hitting the same issue: when I merge a LoRA into a Llama model and then save it, the saved model is smaller than before.

@BenjaminBossan
Member

I think the issue is that the layers that are on "meta" device are not properly handled in this case. Their weights are not loaded and the lora weights are also just on meta device. The result you see probably only contains the weights that were actually loaded, missing all the meta device weights.
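
To check whether that is what is happening, something along these lines (a quick diagnostic using the adapter object from the snippet above, not an official PEFT API) should list any parameters that are still on the meta device after loading:

meta_params = [name for name, param in adapter.named_parameters() if param.device.type == "meta"]
if meta_params:
    print(f"{len(meta_params)} parameters are on the meta device, e.g.: {meta_params[:5]}")
    print("merge_and_unload() would likely produce an incomplete model in this state.")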

Honestly, I'm not completely sure how best to handle this situation. Maybe @pacman100 or @younesbelkada have a good idea.

@zhurui-xiaozhuzaizai

@BenjaminBossan, yes. When I checked, I found that some weights load on the CPU and some on the GPU, and the CPU weights are the ones on the meta device.
How should I move the weights off the meta device and onto the CPU or GPU?

@BenjaminBossan
Member

Sorry, I'm really not sure what a solution would look like here. The reason why the weights are on meta device is that they needed to be offloaded for lack of memory, so it's not as simple as just loading everything. Hopefully one of the others can shine some light on this.

@xhwang22

This issue may occur when the GPU is occupied by other processes: device_map="auto" may then be unable to load the model fully onto the GPUs, so try to ensure the GPU is free or set device_map="cpu" instead.
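
For example, a CPU-only variant of the merge step (a sketch, assuming model_id, lora_id, merged_id and dtype are defined as in the function above, and that there is enough CPU RAM) would look like this:

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,  # if half-precision ops error out on CPU, load in torch.float32 instead
    device_map="cpu",   # keep every weight in RAM so nothing is offloaded to the meta device
)
adapter = PeftModel.from_pretrained(model, lora_id)
merged = adapter.merge_and_unload()
merged.save_pretrained(merged_id)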

@indiejoseph

I've encountered this issue too. The model was trained and merged on the same GPU (3090, 24 GB); VRAM is sufficient for LoRA training and inference, but after merging I hit this issue.

lukasld commented Sep 24, 2023

This issue may occur when the GPU is occupied by other processes: device_map="auto" may then be unable to load the model fully onto the GPUs, so try to ensure the GPU is free or set device_map="cpu" instead.

Switching device_map from "auto" to "cpu" solved the issue for me.

@younesbelkada
Contributor

I think that merge_and_unload is not supported for models that are offloaded to disk / CPU, as accelerate puts the offloaded weights on the meta device. This is a bug we should look into.
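
As a rough check before merging (an illustration, not an official PEFT API): when the model is loaded with device_map="auto", the placement accelerate chose is recorded on the model, so anything mapped to "cpu" or "disk" was offloaded and may sit on the meta device:

device_map = getattr(model, "hf_device_map", {})
offloaded = {name: dev for name, dev in device_map.items() if dev in ("cpu", "disk")}
if offloaded:
    print("These modules were offloaded and may be on the meta device:", offloaded)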

@xiaobai52HZ

How do I merge a p-tuning model?

@younesbelkada
Contributor

Hi @xiaobai52HZ, I think that, sadly, merging p-tuning models is currently not supported. cc @pacman100 @BenjaminBossan

@BenjaminBossan
Member

Indeed.

chiragjn commented Oct 19, 2023

Can we add a boolean argument when inferring the device map to disable all offloading?
I still want the benefits of using multiple GPUs, but would rather get an error if the weights cannot fit on the GPUs alone.

@BenjaminBossan
Member

Can we add a boolean argument when inferring the device map to disable all offloading?

Offloading is not performed by PEFT. If you want to have this feature, please consider opening an issue in accelerate.
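
In the meantime, one rough way to approximate this with accelerate (just a sketch, not a built-in flag; the max_memory value and no_split_module_classes below are illustrative for the Llama-2-7B case in this thread) is to build the device map from GPU memory only and refuse to proceed if anything spills over:

from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"
config = AutoConfig.from_pretrained(model_id)
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "22GiB"},                       # list only your GPUs, no "cpu" entry
    no_split_module_classes=["LlamaDecoderLayer"],
)
if any(dev in ("cpu", "disk") for dev in device_map.values()):
    raise RuntimeError("Model does not fit on the GPUs alone; refusing to offload.")

model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map, torch_dtype="auto")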

@krzysiekpodk

This issue may occur when the GPU is occupied by other processes: device_map="auto" may then be unable to load the model fully onto the GPUs, so try to ensure the GPU is free or set device_map="cpu" instead.

This fixed the issue for me as well


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@younesbelkada
Contributor

Closing this issue! Feel free to re-open if you think your concerns are not being addressed.

@SuperBruceJia

Still facing the problem:

import torch
from peft import PeftModel

# `trainer` and `saver_dir` are defined earlier in the training script.
trainer.train()

# Save the adapter
trainer.save_model(saver_dir + '/adapter')

# Retrieve the base model from the trainer
model = trainer.model.base_model

# Load the adapter
model = PeftModel.from_pretrained(model, saver_dir + '/adapter', torch_dtype=torch.float16, device_map="auto")

# Merge the base model and the adapter
model = model.merge_and_unload()

# Save the overall model
model.save_pretrained(saver_dir)

It seems that the base model is saved without the LoRA adapter merged in. The saved config.json is:

{
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaModel"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.34.0",
  "use_cache": true,
  "vocab_size": 32001
}

@younesbelkada
Contributor

This is expected @indiejoseph
If you call merge_and_unload(), it merges the LoRA adapters into the base model, unloads the LoRA layers, and returns the base transformers model.
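
A minimal sketch of the usual merge-and-save pattern (saver_dir as in the snippet above; the base model id is taken from the config shown there) is to reload the full causal-LM base model, attach the saved adapter, merge, and save the result:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, saver_dir + "/adapter")
model = model.merge_and_unload()  # returns the plain transformers model with the LoRA weights folded in
model.save_pretrained(saver_dir)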
