
Negative weights on add_weighted_adapter #1907

Closed
2 of 4 tasks
freddy5566 opened this issue Jul 5, 2024 · 5 comments

Comments

@freddy5566

freddy5566 commented Jul 5, 2024

System Info

python=3.8
peft=0.11.1

Who can help?

@BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I would like to perform task-vector arithmetic on LoRA adapters in the following fashion: task = pre_trained + LoRA_1 - LoRA_2.
Here is the code that I used:

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# base model
base_model = LlamaForCausalLM.from_pretrained(
    pre_trained_model_name_or_path,
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)

first_model_name = model_name_or_pathes[0].split("/")[-1]
self.model = PeftModel.from_pretrained(base_model, model_name_or_pathes[0], adapter_name=first_model_name)

names = [first_model_name]
for lora_path in model_name_or_pathes[1:]:
    name = lora_path.split("/")[-1]
    names.append(name)

    self.model.load_adapter(lora_path, adapter_name=name)

adapter_name = "-".join(names)
self.model.add_weighted_adapter(
    adapters=names,
    weights=[1, -1],
    adapter_name=adapter_name,
    combination_type=combine_method,
    density=density,
)
self.model.set_adapter(adapter_name)       
self.model.eval()

But I got this error message: ValueError: math domain error

I believe it is caused by these lines:

valid_weights.append(math.sqrt(weight * target.scaling[adapter]))
lora_A_deltas.append(current_adapter_lora_A.data)
lora_B_deltas.append(current_adapter_lora_B.data)
valid_weights = torch.tensor(valid_weights).to(lora_A_deltas[0].device)
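
For reference, the failure is easy to reproduce in isolation: with a negative weight, the argument passed to math.sqrt becomes negative, which raises exactly this error (the scaling value below is just a stand-in):

import math

weight = -1      # negative weight passed to add_weighted_adapter
scaling = 2.0    # stand-in for target.scaling[adapter], i.e. lora_alpha / r

math.sqrt(weight * scaling)  # raises ValueError: math domain error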

Expected behavior

After reading the code, I have the following questions:

  1. Why should we apply the scaling to the weight?
  2. Why should we take the square root of the scaled weights?
  3. I also noticed that, in this case, the behavior is a little different from using merge_and_unload when there is only one LoRA adapter. It seems that merge_and_unload does not multiply by math.sqrt(1 * target.scaling[adapter]); it only multiplies by the scaling alpha / rank (see the sketch after this list):
    The merged weight = original weight + BA * scaling
    base_layer.weight.data = base_layer.weight.data + delta_weight
  4. What is the correct way to perform task forgetting under this setting?
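
As a rough sketch of the merge arithmetic from point 3 (shapes and values below are illustrative, not from my actual setup):

import torch

d_out, d_in, rank = 16, 16, 4
lora_alpha = 8
scaling = lora_alpha / rank  # alpha / rank

base_weight = torch.randn(d_out, d_in)
lora_A = torch.randn(rank, d_in)
lora_B = torch.randn(d_out, rank)

# merge_and_unload folds the delta into the base weight with the scaling
# applied once, without any square root:
delta_weight = (lora_B @ lora_A) * scaling
merged_weight = base_weight + delta_weight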

Thanks!

@BenjaminBossan
Member

For more context on why the weights are scaled like this, please check this discussion: #1155. This should address questions 1-3.
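
As a quick illustration of that logic: the weight is applied to both lora_A and lora_B, so taking the square root means the combined delta weight B @ A ends up scaled by weight * scaling exactly once (arbitrary example values below):

import torch

rank, d = 8, 32
weight, scaling = 0.5, 2.0  # arbitrary values; scaling stands in for lora_alpha / r

lora_A = torch.randn(rank, d)
lora_B = torch.randn(d, rank)

# sqrt(weight * scaling) applied to both factors ...
factor = (weight * scaling) ** 0.5
combined = (factor * lora_B) @ (factor * lora_A)

# ... equals weight * scaling applied once to the delta weight B @ A.
expected = weight * scaling * (lora_B @ lora_A)
assert torch.allclose(combined, expected, atol=1e-5)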

4. What is the correct way to perform task forgetting under this setting?

We have not experimented with task forgetting, and I'm not sure whether merging with a negative weight is a viable solution. If you could explain further what the base model is, what you want it to forget, and what the learned adapters were trained on, I might be able to make some suggestions.

@freddy5566
Author

Hello @BenjaminBossan,

Thank you for your quick response. I think I understand the motivation behind the current implementation.

However, regarding question 4, I'm afraid I cannot disclose any details. But the general idea is, for example: I pre-trained LLaMA with LoRA on a multi-task dataset (task a and task b). I also have a single-task dataset (task a), and now I want to perform task forgetting via: task_b = pre_trained + LoRA_1 (tasks a and b) - LoRA_2 (task a)

I was wondering whether you have any suggestions for this kind of functionality.

Thanks!

@BenjaminBossan
Member

BenjaminBossan commented Jul 8, 2024

Thanks for explaining the general idea. If your two LoRA adapters target the same layers and have the same rank, what you could try is to load the state dict of LoRA_1 and subtract the state dict of LoRA_2 manually, then load the new LoRA weights onto the model. This should roughly have the effect you wanted to achieve with the negative weights.
Whether this actually leads to forgetting task a while leaving task b intact, I'm not sure. I would actually not expect it to work, as it assumes that the two tasks are completely orthogonal.
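
A minimal sketch of what I mean, assuming both adapters are already loaded on the PeftModel (called model below) as in your snippet; the adapter names "lora_1" and "lora_2" are placeholders:

from peft import get_peft_model_state_dict, set_peft_model_state_dict

# State dicts of the two adapters; the keys match when the adapters share
# the same target modules and rank.
sd_1 = get_peft_model_state_dict(model, adapter_name="lora_1")
sd_2 = get_peft_model_state_dict(model, adapter_name="lora_2")

# Subtract the LoRA_2 weights from the LoRA_1 weights key by key.
sd_diff = {key: sd_1[key] - sd_2[key] for key in sd_1}

# Load the difference back as the weights of the first adapter and activate it.
set_peft_model_state_dict(model, sd_diff, adapter_name="lora_1")
model.set_adapter("lora_1")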

@freddy5566
Author

Thanks! It turns out this works to some degree, but it might not be an ideal way to perform such a task.
Anyway, thanks for your quick instructions.

@BenjaminBossan
Member

Great, then I'll wish you luck with your further experiments. If you figure out a way to make forgetting tasks learned by LoRA work well, feel free to share it; maybe we can integrate it into PEFT.

For now, I'll close the issue; please re-open if something new comes up.
