
Negative weights on add_weighted_adapter #1907

Closed
2 of 4 tasks
freddy5566 opened this issue Jul 5, 2024 · 5 comments

Comments

@freddy5566

freddy5566 commented Jul 5, 2024

System Info

python=3.8
peft=0.11.1

Who can help?

@BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I would like to perform task-vector arithmetic on LoRA adapters in the following fashion: task = pre_trained + LoRA_1 - LoRA_2.
Here is the code that I used:

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# base model
base_model = LlamaForCausalLM.from_pretrained(
    pre_trained_model_name_or_path,
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)

first_model_name = model_name_or_pathes[0].split("/")[-1]
self.model = PeftModel.from_pretrained(base_model, model_name_or_pathes[0], adapter_name=first_model_name)

names = [first_model_name]
for lora_path in model_name_or_pathes[1:]:
    name = lora_path.split("/")[-1]
    names.append(name)

    self.model.load_adapter(lora_path, adapter_name=name)

adapter_name = "-".join(names)
self.model.add_weighted_adapter(
    adapters=names,
    weights=[1, -1],
    adapter_name=adapter_name,
    combination_type=combine_method,
    density=density,
)
self.model.set_adapter(adapter_name)       
self.model.eval()

But I got this error message: ValueError: math domain error

I believe it is caused by these lines:

valid_weights.append(math.sqrt(weight * target.scaling[adapter]))
lora_A_deltas.append(current_adapter_lora_A.data)
lora_B_deltas.append(current_adapter_lora_B.data)
valid_weights = torch.tensor(valid_weights).to(lora_A_deltas[0].device)
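
For reference, the failure is easy to reproduce in isolation: with a negative weight, the argument passed to math.sqrt becomes negative, which raises exactly this error (the scaling value below is just a stand-in):

import math

weight = -1      # negative weight passed to add_weighted_adapter
scaling = 2.0    # stand-in for target.scaling[adapter], i.e. lora_alpha / r

math.sqrt(weight * scaling)  # raises ValueError: math domain error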

Expected behavior

After reading the code, I have the following questions:

  1. Why should we apply the scaling to the weight?
  2. Why should we take the square root of the scaled weights?
  3. I also noticed that, in this case, the behavior is a little different from using merge_and_unload when there is only one LoRA adapter. It seems that merge_and_unload does not multiply by math.sqrt(1 * target.scaling[adapter]); it only multiplies by the scaling alpha / rank (see the sketch after this list):
    The merged weight = original weight + BA * scaling
    base_layer.weight.data = base_layer.weight.data + delta_weight
  4. What is the correct way to perform task forgetting under this setting?
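
As a rough sketch of the merge arithmetic from point 3 (shapes and values below are illustrative, not from my actual setup):

import torch

d_out, d_in, rank = 16, 16, 4
lora_alpha = 8
scaling = lora_alpha / rank  # alpha / rank

base_weight = torch.randn(d_out, d_in)
lora_A = torch.randn(rank, d_in)
lora_B = torch.randn(d_out, rank)

# merge_and_unload folds the delta into the base weight with the scaling
# applied once, without any square root:
delta_weight = (lora_B @ lora_A) * scaling
merged_weight = base_weight + delta_weight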

Thanks!

@BenjaminBossan
Member

For more context on why the weights are scaled like this, please check this discussion: #1155. This should address questions 1-3.
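
As a quick illustration of that logic: the weight is applied to both lora_A and lora_B, so taking the square root means the combined delta weight B @ A ends up scaled by weight * scaling exactly once (arbitrary example values below):

import torch

rank, d = 8, 32
weight, scaling = 0.5, 2.0  # arbitrary values; scaling stands in for lora_alpha / r

lora_A = torch.randn(rank, d)
lora_B = torch.randn(d, rank)

# sqrt(weight * scaling) applied to both factors ...
factor = (weight * scaling) ** 0.5
combined = (factor * lora_B) @ (factor * lora_A)

# ... equals weight * scaling applied once to the delta weight B @ A.
expected = weight * scaling * (lora_B @ lora_A)
assert torch.allclose(combined, expected, atol=1e-5)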

4. What is the correct way to perform task forgetting under this setting?

We have not experimented with task forgetting, and I'm not sure whether merging with a negative weight is a viable solution. If you could explain further what the base model is, what you want it to forget, and what the learned adapters were trained on, I might be able to make some suggestions.

@freddy5566
Author

Hello @BenjaminBossan,

Thank you for your quick response. I think I understand the motivation behind the current implementation.

However, regarding question 4, I'm afraid I cannot disclose any details. But the general idea is, for example: I pre-trained LLaMA with LoRA on a multi-task dataset (task a and task b). I also have a single-task dataset (task a), and now I want to perform task forgetting via: task_b = pre_trained + LoRA_1 (tasks a and b) - LoRA_2 (task a)

I was wondering whether you have any suggestions for this kind of functionality.

Thanks!

@BenjaminBossan
Member

BenjaminBossan commented Jul 8, 2024

Thanks for explaining the general idea. If your two LoRA adapters target the same layers and have the same rank, what you could try is to load the state dict of LoRA_1 and subtract the state dict of LoRA_2 manually, then load the new LoRA weights onto the model. This should roughly have the effect you wanted to achieve with the negative weights.
Whether this actually leads to forgetting task a while leaving task b intact, I'm not sure. I would actually not expect it to work, as it assumes that the two tasks are completely orthogonal.
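
A minimal sketch of what I mean, assuming both adapters are already loaded on the PeftModel (called model below) as in your snippet; the adapter names "lora_1" and "lora_2" are placeholders:

from peft import get_peft_model_state_dict, set_peft_model_state_dict

# State dicts of the two adapters; the keys match when the adapters share
# the same target modules and rank.
sd_1 = get_peft_model_state_dict(model, adapter_name="lora_1")
sd_2 = get_peft_model_state_dict(model, adapter_name="lora_2")

# Subtract the LoRA_2 weights from the LoRA_1 weights key by key.
sd_diff = {key: sd_1[key] - sd_2[key] for key in sd_1}

# Load the difference back as the weights of the first adapter and activate it.
set_peft_model_state_dict(model, sd_diff, adapter_name="lora_1")
model.set_adapter("lora_1")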

@freddy5566
Author

Thanks! It turns out this works to some degree, but it might not be an ideal way to perform such a task.
Anyway, thanks for your quick instructions.

@BenjaminBossan
Member

Great, then I'll wish you luck with your further experiments. If you figure out a way to make forgetting tasks learned by LoRA work well, feel free to share it; maybe we can integrate it into PEFT.

For now, I'll close the issue; please re-open if something new comes up.
