
Update OFT to fix merge bugs #1996

Open. Zeju1997 wants to merge 40 commits into base: main
Conversation

@Zeju1997 (Contributor) commented Aug 8, 2024

Changes for BOFT:

  • fixing small errors in the comments of BOFT
  • making the Conv2D operation consistent with Linear

Changes for OFT:

  • [Important!] fixing the incorrect merging of OFT; this is not an implementation error but rather an incorrect understanding of how orthogonal fine-tuning works, which resulted in a wrong way of merging adapters (see the sketch after this list)
  • updating the config.py/layer.py/model.py files to be consistent with other PEFT methods
  • adding the additional parameters oft_block_size, oft_dropout, and bias
  • fixing a module dropout implementation error (in orthogonal fine-tuning, dropout does not work the same way as in LoRA)
  • fixing errors in the merge and unmerge functions of OFT
  • making the Conv2D operation consistent with Linear
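To illustrate the merging point, here is a minimal, self-contained sketch (not the PEFT implementation; shapes and the QR-based construction of the orthogonal matrix are only illustrative) of why an OFT adapter has to be merged multiplicatively rather than added as a delta:

import torch

torch.manual_seed(0)

out_features, in_features = 8, 16
W = torch.randn(out_features, in_features)   # frozen base weight
x = torch.randn(4, in_features)              # a batch of inputs

# Some orthogonal matrix R; in OFT it would be block-diagonal and parametrized
# (e.g. via a Cayley transform), here we just take the Q factor of a QR decomposition.
R, _ = torch.linalg.qr(torch.randn(out_features, out_features))

# Forward pass with the adapter active: rotate the outputs of the frozen layer.
y_adapter = (x @ W.t()) @ R.t()

# Correct merge: fold the rotation into the weight multiplicatively.
W_merged = R @ W
assert torch.allclose(y_adapter, x @ W_merged.t(), atol=1e-5)

# A LoRA-style additive merge (W + delta_W) is the wrong mental model here:
# the adapter is a rotation of W, not a weight offset that exists independently of W.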

@Zeju1997 changed the title from "Update oft to fix merge bugs" to "Update OFT to fix merge bugs" on Aug 8, 2024
@BenjaminBossan (Member) left a comment

Thanks a lot for fixing the issues you mentioned with OFT. I don't understand all the details but in general, this is to bring OFT closer to the BOFT implementation, is that right? Is that also why you decided to no longer use LycorisTuner as a base?

Did you check whether after these changes, the same code using OFT would still produce the exact same results? E.g. let's say a user has trained an OFT model checkpoint in PEFT 0.12. Now we merge this PR and release it in PEFT 0.13. If the user upgrades to 0.13 and loads the OFT checkpoint, would they still get the same results? This is really important because otherwise it would be a source of bugs.

If we cannot ensure the same results after this PR because, for instance, something was not implemented correctly previously, at the very least we have to add some mechanism to error or warn users when they try to load an old checkpoint with new PEFT versions.

I have added some smaller comments too, please check them out. On top of those, please run make style and ensure 120 character line limits (e.g. long strings).

Also, pinging @okotaku and @lukaskuhn-lku since they worked on the original OFT PR.

Comment on lines 318 to 322
elif boft_block_size != 0 and boft_block_num != 0:
    raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
    raise ValueError(
        f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
        "to be 0, because boft_block_size x boft_block_num != in_features."
    )
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Do we need these checks? We already have checks here:

if self.boft_block_size == 0 and self.boft_block_num == 0:
    raise ValueError("You must specify either boft_block_size or boft_block_num.")
if not (self.boft_block_size != 0) ^ (self.boft_block_num != 0):
    raise ValueError(
        f"You can only specify either boft_block_size ({self.boft_block_size}) or boft_block_num ({self.boft_block_num}), "
        "but not both simultaneously, because boft_block_size x boft_block_num != in_features."
    )

Contributor Author:

Yes, you are right, I forgot there is already a check in config.py. Updated.

Comment on lines 750 to 754
elif boft_block_size != 0 and boft_block_num != 0:
    raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
    raise ValueError("Unknown error!")
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Same comment.

Contributor Author:

Updated.

class OFTLayer(nn.Module, LycorisLayer):
class MultiplicativeDropoutLayer(nn.Module):
    """
    Implements the multiplicative dropout layer for BOFT.
Member:

Suggested change:
- Implements the multiplicative dropout layer for BOFT.
+ Implements the multiplicative dropout layer for OFT.

Contributor Author:

updated.

if D == 1:
    return x

# Create a mask with 1s for matrices to be replaced with identity and 0s otherwise
Member:

Sentence doesn't quite make sense, could you please check again?

Contributor Author:

updated.

@Zeju1997 (Contributor, Author) commented Aug 9, 2024

@BenjaminBossan Hi, thank you so much. Because the main change (besides changing the format) is in the merge/unmerge, it should be possible to update it without affecting previously trained model checkpoints. I am testing that now, but I am currently facing a problem:

import transformers
import accelerate
import peft
from datasets import load_dataset
from transformers import AutoImageProcessor
from peft import LoraConfig, BOFTConfig, OFTConfig, get_peft_model
import numpy as np
import evaluate
import torch
import os

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from peft import PeftConfig, PeftModel, AutoPeftModel

os.environ["HF_API_TOKEN"] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

if __name__ == '__main__':
    dataset = load_dataset("food101", split="train[:5000]")

    output_dir = "output"

    model_checkpoint = "google/vit-base-patch16-224-in21k"

    labels = dataset.features["label"].names
    label2id, id2label = dict(), dict()
    for i, label in enumerate(labels):
        label2id[label] = i
        id2label[i] = label

    image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)


    normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    train_transforms = Compose(
        [
            RandomResizedCrop(image_processor.size["height"]),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

    val_transforms = Compose(
        [
            Resize(image_processor.size["height"]),
            CenterCrop(image_processor.size["height"]),
            ToTensor(),
            normalize,
        ]
    )


    def preprocess_train(example_batch):
        """Apply train_transforms across a batch."""
        example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    def preprocess_val(example_batch):
        """Apply val_transforms across a batch."""
        example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    splits = dataset.train_test_split(test_size=0.1)
    train_ds = splits["train"]
    val_ds = splits["test"]

    train_ds.set_transform(preprocess_train)
    val_ds.set_transform(preprocess_val)


    def print_trainable_parameters(model):
        trainable_params = 0
        all_param = 0
        for _, param in model.named_parameters():
            all_param += param.numel()
            if param.requires_grad:
                trainable_params += param.numel()
        print(
            f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
        )

    model = AutoModelForImageClassification.from_pretrained(
        model_checkpoint,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )


    print_trainable_parameters(model)

    config = LoraConfig(
        r=32,
        lora_alpha=16,
        target_modules=["query", "value"],
        lora_dropout=0.1,
        bias="none",
        # modules_to_save=["classifier"],
    )

    lora_model = get_peft_model(model, config)
    print_trainable_parameters(lora_model)

    model_name = model_checkpoint.split("/")[-1]
    batch_size = 128

    args = TrainingArguments(
        f"{model_name}-finetuned-lora-food101",
        remove_unused_columns=False,
        evaluation_strategy="epoch",
        save_strategy="no",
        learning_rate=5e-3,
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=batch_size,
        fp16=True,
        num_train_epochs=5,
        logging_steps=10,
        metric_for_best_model="accuracy",
        label_names=["labels"],
        eval_strategy="steps",
        eval_steps=100000000000,
    )

    metric = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        """Computes accuracy on a batch of predictions"""
        predictions = np.argmax(eval_pred.predictions, axis=1)
        return metric.compute(predictions=predictions, references=eval_pred.label_ids)

    def collate_fn(examples):
        pixel_values = torch.stack([example["pixel_values"] for example in examples])
        labels = torch.tensor([example["label"] for example in examples])
        return {"pixel_values": pixel_values, "labels": labels}

    trainer = Trainer(
        lora_model,
        args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        tokenizer=image_processor,
        compute_metrics=compute_metrics,
        data_collator=collate_fn,
    )
    eval_results = trainer.evaluate(val_ds)
    print('Before training', eval_results)

    repo_name = f"sgp-bench/peft-finetuned-lora-food101"

    train_results = trainer.train()
    
    token = os.getenv("HF_API_TOKEN")
    lora_model.push_to_hub(repo_name, token=token)

    eval_results = trainer.evaluate(val_ds)
    print('After training', eval_results)

    config = PeftConfig.from_pretrained(repo_name)
    model = AutoModelForImageClassification.from_pretrained(
        config.base_model_name_or_path,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )
    # Load the LoRA model
    # inference_model = AutoPeftModel.from_pretrained(repo_name).to("cuda")
    trainer.model = PeftModel.from_pretrained(model, repo_name).to("cuda")
    eval_results = trainer.evaluate(val_ds)
    print('Load adapter weight and inference', eval_results)

I am currently debugging with LoRA and the official guide from the docs.
I noticed that the adapter weights can be loaded, but the eval_accuracy is basically 0 (compared to an eval_accuracy of over 90% after training), indicating that the weights are not loaded correctly?
If I do merge_and_unload(), the eval_accuracy is the same as after training.
Is there something wrong with the way I am loading the adapter weights?

@BenjaminBossan (Member):

I noticed that the adapter weights can be loaded, but the eval_accuracy is basically 0 (compared to an eval_accuracy of over 90% after training), indicating that the weights are not loaded correctly?

I could reproduce your issue of low accuracy (I did not train, just loaded your checkpoint). I think the issue is this line, right?

# modules_to_save=["classifier"]

Because this line is commented out, the classifier layer is randomly initialized, so the accuracy is low.
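For illustration, a minimal sketch of the config with the classifier head kept as a trainable, saved module (this just uncomments the line above; the other values are copied from your script):

from peft import LoraConfig

config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # train, save and later reload the task-specific head
)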

If I do merge_and_unload(), the eval_accuracy is the same as after training.

Could you show me which steps you take to achieve this?

@Zeju1997 (Contributor, Author) commented Aug 9, 2024

Ah, OK, you are right: in the old checkpoint created with merge_and_unload I did have the classifier saved.

It should be working now; I have tested that it can load an old checkpoint trained with the previous version of PEFT OFT.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks a lot for the updates. Could you please run make style so that CI can pass successfully?

I did some further tests regarding the backwards compatibility of the changes in this PR. If my testing is correct, this PR works without a problem for BOFT, but it will cause an issue with OFT, namely that old OFT checkpoints will not work anymore after this PR. They will load fine but the result will be different, which is bad, as users will not expect this. We need to either ensure that the old checkpoint works as expected, or we have to add a check: if we detect an old checkpoint, we raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

Here is how I tested the compatibility:

# try loading old oft and boft checkpoints
# First checkout and pull all branches to be tested. Then call like this:
# git checkout <branch 1>  # e.g. main
# python script.py oft     # either oft or boft
# git checkout <branch 2>  # e.g. PR branch
# python script.py oft     # same method as above
# call `rm -r /tmp/peft/` to try again with different branches

import os
import sys

import torch
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, OFTConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
inputs = torch.arange(5).view(-1, 1)

# oft
def main(method):
    if method.lower() == "oft":
        config_cls = OFTConfig
        path = "/tmp/peft/oft"
    else:
        config_cls = BOFTConfig
        path = "/tmp/peft/boft"

    model = AutoModelForCausalLM.from_pretrained(model_id)
    if not os.path.exists(path + "/adapter_model.safetensors"):
        print(f"{method} adapter does not exist yet, creating it")
        peft_config = config_cls(target_modules=["q_proj", "v_proj"], init_weights=False)
        model = get_peft_model(model, peft_config)
        model.save_pretrained(path)

        model.eval()
        with torch.inference_mode():
            outputs = model(inputs).logits
        torch.save(outputs, path + "/output.pt")
        print(f"{method} adapter and output created and saved in {path}")
    else:
        print(f"{method} adapter and output exist, loading them from {path}")
        model = PeftModel.from_pretrained(model, path)
        print(f"{method} adapter loaded successfully")
        model.eval()
        with torch.inference_mode():
            outputs_new = model(inputs).logits

        outputs_old = torch.load(path + "/output.pt", weights_only=True)
        assert torch.allclose(outputs_old, outputs_new, atol=1e-4, rtol=1e-4)
        print(f"{method} outputs match")


if __name__ == "__main__":
    method = sys.argv[1] if len(sys.argv) > 1 else "boft"
    assert method.lower() in ["boft", "oft"]
    main(method)

Does that make sense to you? I summarized the results from my tests in this table:

branch 1 | branch 2 | method | status
main     | PR       | OFT    | fail: outputs different
PR       | main     | OFT    | TypeError: OFTConfig.__init__() got an unexpected keyword argument 'bias'
main     | PR       | BOFT   | success
PR       | main     | BOFT   | success

In general, we really want to retain backwards compatibility but in my book it's acceptable if the previous implementation was not correct and is now fixed. Also note that a new OFT checkpoint created after this PR cannot be loaded with older PEFT versions -- there is no forward compatibility -- but this is okay.

f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
"to be 0, because boft_block_size x boft_block_num != in_features."
)
raise ValueError("Unknown error!")
Member:

So this code should not be reachable. Still, let's use a more helpful error message. My proposal:

"Something went wrong, please report this error: https://github.com/huggingface/peft/issues".

# OFT info
self.oft_r = nn.ParameterDict({})
self.oft_s = nn.ParameterDict({})
Member:

Here we introduce a new parameter, oft_s. Existing checkpoints don't have this parameter, which means that when a user loads an old checkpoint, it is randomly initialized and the checkpoint behaves differently. See my longer comment on compatibility.

Contributor Author:

Actually, it depends on init_weights: if init_weights=True (which is the default behavior), oft_s is basically an identity and should not change anything. If init_weights=False, it is randomly initialized.
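As a small illustration (assuming oft_s acts as a per-output scaling of the weight; the shapes are made up for the example):

import torch

out_features, in_features = 8, 16
W = torch.randn(out_features, in_features)   # frozen base weight

oft_s = torch.ones(out_features, 1)          # init_weights=True: identity scaling
assert torch.equal(oft_s * W, W)             # the effective weight is unchanged

oft_s = torch.randn(out_features, 1)         # init_weights=False: random scaling changes W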

Member:

True, but when users load checkpoints, they have trained the model, so oft_s won't be an identity transform.

Comment on lines 179 to 182
elif r != 0 and oft_block_size != 0:
    raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
    raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

These lines should not be reachable because of the checks we have in the config class, but IMO it's fine to leave them here.

Contributor Author:

Updated.

Comment on lines 506 to 509
elif r != 0 and oft_block_size != 0:
    raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
    raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

Same comment as earlier: Should be unreachable but it's fine if we leave the code.

Contributor Author:

Updated.

("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"]}),
("Vanilla MLP 5 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"], "modules_to_save": ["lin1"]}),
("Vanilla MLP 1 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": "lin0"}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
Member:

Suggested change:
- ("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
+ ("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"]}),

Contributor Author:

Updated.

@Zeju1997 (Contributor, Author):

We need to either ensure that the old checkpoint works as expected, or we have to add a check: if we detect an old checkpoint, we raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

Yes, this is expected; the reason is that loading was not correct in the previous version. I think the best approach is the one you described: add a check and, if we detect an old checkpoint, raise an error telling the user that they either need to re-train their OFT checkpoint or use an older PEFT version. Where do you suggest adding the check?

@BenjaminBossan (Member) left a comment

Yes, this is expected; the reason is that loading was not correct in the previous version. I think the best approach is the one you described: add a check and, if we detect an old checkpoint, raise an error telling the user that they either need to re-train their OFT checkpoint or use an older PEFT version. Where do you suggest adding the check?

Good question. I checked what would be the best place to change this, keeping in mind that in the future we could have more of these cases.

My suggestion is the following. In this line, we load all the arguments for the config:

kwargs = {**class_kwargs, **loaded_attributes}

Here, we could add a check on these arguments that could be used to raise an error if an old OFT config is detected. So the new code would be:

        loaded_attributes = cls.from_json_file(config_file)
        kwargs = {**class_kwargs, **loaded_attributes}
        kwargs = cls.check_kwargs(**kwargs)  # <= new line
        return cls.from_peft_type(**kwargs)

By default, this method should be defined on PeftConfigMixin to just return the kwargs without any validation or change. Then on OFTConfig, we can override this method to check if it's from an old OFT checkpoint and raise an error.

To check if the config is from an old checkpoint, we could simply check for the presence of one of the new arguments added in this PR. WDYT?

Maybe long term it would be better to also store the PEFT version in the adapter_config.json so that this can be checked instead. But it's not quite easy, as existing checkpoints don't have this info yet. So let's not add that in the current PR.
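A minimal sketch of what that could look like, assuming oft_block_size is one of the newly added config fields (class layout and error wording are illustrative, not the final implementation):

class PeftConfigMixin:
    @classmethod
    def check_kwargs(cls, **kwargs):
        """By default, pass the loaded config arguments through unchanged."""
        return kwargs


class OFTConfig(PeftConfigMixin):
    @classmethod
    def check_kwargs(cls, **kwargs):
        # Old OFT checkpoints were saved before `oft_block_size` existed, so its
        # absence indicates a config written by an older PEFT version.
        if "oft_block_size" not in kwargs:
            raise ValueError(
                "OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
                "with the latest version of OFT. Please retrain your adapter weights with newer PEFT "
                "versions. Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
            )
        return kwargs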


@Zeju1997 (Contributor, Author):

@BenjaminBossan Sorry for the late reply. I added check_kwargs, do you think it is OK like that? As for oft_s: if the user used an older version, it does not exist in the checkpoint, so even if the model was trained, oft_s will still be an identity transform and should therefore make no difference.

@BenjaminBossan (Member) left a comment

Thanks for making these latest changes.

Concerning the compatibility check, I have a few comments, please check.

Also, I think it would be good to have a unit test for this check. I created a checkpoint with the old OFT implementation here: https://huggingface.co/peft-internal-testing/OFT-tiny-random-OPTForCausalLM. You can use it to load and then check that the warning has been raised.
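A rough sketch of such a test, assuming the check ends up raising a ValueError when the old config is loaded (the exception type, match string, and the tiny base model id are assumptions):

import pytest
from transformers import AutoModelForCausalLM
from peft import PeftModel


def test_old_oft_checkpoint_is_rejected():
    # Adapter created with the old OFT implementation (see link above).
    checkpoint = "peft-internal-testing/OFT-tiny-random-OPTForCausalLM"
    base = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM")
    # Loading should fail because the stored config lacks the new OFT fields.
    with pytest.raises(ValueError, match="OFT has been updated"):
        PeftModel.from_pretrained(base, checkpoint)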

"""
if "oft_block_size" in kwargs:
    warnings.warn(
        'OFT has been updated since 0.12.1.dev0. Your trained adapter weights may not be compatible with the latest version of OFT. Please retrain your adapter weights.')
Member:

  1. Note that dev0 is not a real version, so let's not refer to that. The next release version will be 0.13.0, so let's use that version.
  2. "may not be compatible": We are pretty sure it is incompatible when trained, right? Let's phrase it as "is incompatible".
  3. Let's also mention that users can downgrade PEFT to version 0.12.0 and then the adapter will still work.

kwargs (additional keyword arguments, *optional*):
Additional keyword arguments passed along to the child class initialization.
"""
if "oft_block_size" in kwargs:
Member:

Hmm, so oft_block_size is the new parameter, right? So if we're loading an old OFT model, it should be missing. Therefore, should the check not be if "oft_block_size" not in kwargs?

@Zeju1997 (Contributor, Author):

@BenjaminBossan I noticed some other problems in the previous implementation of OFT: the size of the OFT adapters is also defined incorrectly (OFT should rotate the neurons, not the feature dimensions; the current way can have some fine-tuning effect, but it is not optimal). Therefore, I added the raise ValueError in OFTConfig, because it is now not possible to load the old adapter weights. I would suggest the user either use the old version of PEFT or retrain the adapter weights entirely. Best.

@BenjaminBossan (Member) left a comment

Thanks for making further corrections to OFT. I have a few smaller comments and also a question about the parameter finding algorithm, please take a look.

Also, please ensure to run make style on your code changes.

"""
if "oft_block_size" not in kwargs:
    raise ValueError(
        'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
Member:

Suggested change:
- 'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
+ 'OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'

Also, let's ensure the 120 char line limit.

if "oft_block_size" not in kwargs:
    raise ValueError(
        'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
        'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
Member:

Suggested change:
- 'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
+ 'Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights.'

Not specific to merging, right?

Comment on lines 173 to 175
warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
oft_block_size = self.adjust_oft_parameters(self.in_features, oft_block_size)
warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Instead of two warnings, let's make this one warning by prepending the first message to the second.
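For example, something along these lines (a self-contained sketch; the adjust helper here is just a stand-in that picks the closest divisor, not the real adjust_oft_parameters):

import warnings

def adjust_oft_parameters(in_features, oft_block_size):
    # Hypothetical helper: pick the divisor of in_features closest to the requested block size.
    divisors = [d for d in range(1, in_features + 1) if in_features % d == 0]
    return min(divisors, key=lambda d: abs(d - oft_block_size))

in_features, oft_block_size = 768, 100
if in_features % oft_block_size != 0 or oft_block_size > in_features:
    old_oft_block_size = oft_block_size
    oft_block_size = adjust_oft_parameters(in_features, oft_block_size)
    warnings.warn(
        f"Invalid `oft_block_size` ({old_oft_block_size})! Adjusted `oft_block_size` to ({oft_block_size})."
    )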

if self.in_features % r != 0 or r > self.in_features:
    warnings.warn(f"Invalid `r` ({r})!")
    r = self.adjust_oft_parameters(self.in_features, r)
    warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same about having a single warning.

if conv_filter_dim % oft_block_size != 0 or oft_block_size > conv_filter_dim:
    warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
    oft_block_size = self.adjust_oft_parameters(conv_filter_dim, oft_block_size)
    warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Same comment about having a single warning.

if conv_filter_dim % r != 0 or r > conv_filter_dim:
    warnings.warn(f"Invalid `r` ({r})!")
    r = self.adjust_oft_parameters(conv_filter_dim, r)
    warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same comment about having a single warning.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thanks, I have updated the mixed PEFT test.

@BenjaminBossan (Member) left a comment

Excellent, thanks. Just a small issue with one of the tests, the rest looks good.

torch.manual_seed(4)
config4 = OFTConfig(r=2, task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"], init_weights=False)
peft_model.add_adapter("adapter4", config4)
peft_model.set_adapter(["adapter0", "adapter1", "adapter2", "adapter3", "adapter4"])
Member:

Let's keep this line in, just without "adapter4", i.e.: peft_model.set_adapter(["adapter0", "adapter1", "adapter2", "adapter3"]).

Contributor Author:

Sure, updated.

Contributor Author:

@BenjaminBossan Could you check again, please? Best.

Comment on lines 751 to 753
torch.manual_seed(4)
config4 = OFTConfig(r=2, task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"], init_weights=False)
peft_model.add_adapter("adapter4", config4)
Member:

These lines should be removed. We only want to keep the peft_model.set_adapter(["adapter0", "adapter1", "adapter2", "adapter3"]) part. Also don't forget to remove the OFTConfig import, as it should not be needed anymore.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Sure, updated.

@BenjaminBossan (Member) left a comment

I took a look at the failing tests. We need to change a few things to make them pass:

  1. The config tests actually don't make a lot of sense as they are. For now, simply remove OFT here.
  2. We need to ensure for a few tests that OFT weights are not initialized as identity transform, same as we already do with BOFT. So each time in test_decoder_models.py, test_encoder_decoder_models.py, and test_feature_extraction_models.py, when there is the line "boft_kwargs": {"init_weights": [False]}, also add "oft_kwargs": {"init_weights": [False]}.
  3. OFT no longer supports Conv1D layers. Either this support needs to be added back in or GPT2 tests need to be skipped, same as for BOFT (see here).
  4. This line needs to add OFT.
  5. Also add OFT here, same as BOFT.

I think that covers all, but maybe I missed something. The complete diff I got locally is:

modified   tests/test_config.py
@@ -34,7 +34,6 @@ from peft import (
     LoKrConfig,
     LoraConfig,
     MultitaskPromptTuningConfig,
-    OFTConfig,
     PeftConfig,
     PeftType,
     PolyConfig,
@@ -61,7 +60,6 @@ ALL_CONFIG_CLASSES = (
     LoKrConfig,
     LoraConfig,
     MultitaskPromptTuningConfig,
-    OFTConfig,
     PolyConfig,
     PrefixTuningConfig,
     PromptEncoderConfig,
modified   tests/test_decoder_models.py
@@ -22,6 +22,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import (
     AdaLoraConfig,
     BOFTConfig,
+    OFTConfig,
     HRAConfig,
     LoraConfig,
     PrefixTuningConfig,
@@ -59,7 +60,7 @@ def skip_boft_or_hra_and_gpt2(test_list):
     return [
         test
         for test in test_list
-        if not (("GPT2LMHeadModel" in test[1]) and ((test[2] == BOFTConfig) or (test[2] == HRAConfig)))
+        if not (("GPT2LMHeadModel" in test[1]) and ((test[2] == BOFTConfig) or (test[2] == OFTConfig) or (test[2] == HRAConfig)))
     ]
 
 
@@ -205,6 +206,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
@@ -222,6 +224,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "lora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
@@ -336,6 +339,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
@@ -373,6 +377,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
modified   tests/test_encoder_decoder_models.py
@@ -173,6 +173,7 @@ class PeftEncoderDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "SEQ_2_SEQ_LM",
@@ -207,6 +208,7 @@ class PeftEncoderDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "SEQ_2_SEQ_LM",
modified   tests/test_feature_extraction_models.py
@@ -111,6 +111,7 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "FEATURE_EXTRACTION",
@@ -164,6 +165,7 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "FEATURE_EXTRACTION",
@@ -180,6 +182,7 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
                 "lora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "FEATURE_EXTRACTION",
             },
modified   tests/testing_common.py
@@ -652,7 +652,7 @@ class PeftCommonTester:
         if issubclass(config_cls, PromptLearningConfig):
             return pytest.skip(f"Test not applicable for {config_cls}")
 
-        if issubclass(config_cls, BOFTConfig):
+        if issubclass(config_cls, (OFTConfig, BOFTConfig)):
             return pytest.skip(f"Test not applicable for {config_cls}")
 
         if ("gpt2" in model_id.lower()) and (config_cls != LoraConfig):
@@ -1290,7 +1290,7 @@ class PeftCommonTester:
         model = get_peft_model(model, config)
         model = model.to(self.torch_device)
 
-        if config.peft_type not in ("LORA", "ADALORA", "IA3", "BOFT", "VERA", "FOURIERFT", "HRA", "VBLORA"):
+        if config.peft_type not in ("LORA", "ADALORA", "IA3", "BOFT", "OFT", "VERA", "FOURIERFT", "HRA", "VBLORA"):
             with pytest.raises(AttributeError):
                 model = model.unload()
         else:

There are a few more failing tests that are unrelated to this PR. We need #2106 merged first before these tests can pass.

"""
if "oft_block_size" not in kwargs:
    raise ValueError(
        "OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
Member:

Let's break this long line to be 120 chars at most. Also, there is a space missing at the end of the sentence.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thank you so much for the detailed suggestions. I have updated the test files. Best.

@BenjaminBossan (Member):

Thanks for the quick action. You still missed a few changes in test_decoder_models.py and test_encoder_decoder_models.py, and this line is missing a space at the end. After this, we still have to wait for #2106 to be merged first.

@Zeju1997 (Contributor, Author):

Thanks for the quick action. You still missed a few changes in test_decoder_models.py and test_encoder_decoder_models.py, and this line is missing a space at the end. After this, we still have to wait for #2106 to be merged first.

@BenjaminBossan Sorry, I don't understand: what do you mean by a whitespace at the end?

@Zeju1997 (Contributor, Author):

@BenjaminBossan So locally, I ran pytest tests/ -k test_encoder_decoder_models and pytest tests/ -k test_decoder_models and did not get any errors.

I do get some errors with pytest tests/ -k test_feature_extraction_models:
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_27_test_hf_internal_testing_tiny_random_DebertaModel_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_31_test_hf_internal_testing_tiny_random_DebertaV2Model_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

@BenjaminBossan (Member):

I do get some errors with pytest tests/ -k test_feature_extraction_models

Yeah, I can reproduce that, strange. Maybe it's related to the special layer types that Deberta uses. I think we can skip this specific Deberta test, similar to how we skip AdaLoRA + Roberta:

peft/tests/testing_common.py

Lines 1105 to 1107 in c29810b

if (config_cls == AdaLoraConfig) and ("roberta" in model_id.lower()):
    # TODO: no gradients on the "dense" layer, other layers work, not sure why
    self.skipTest("AdaLora with RoBERTa does not work correctly")
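An analogous skip for the failing OFT + DeBERTa case could look roughly like this (the exact condition and message are illustrative):

if (config_cls == OFTConfig) and ("deberta" in model_id.lower()):
    # in-place operation on a leaf variable during gradient checkpointing
    self.skipTest("OFT with DeBERTa does not work correctly with gradient checkpointing")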

Please also merge with/rebase on the latest main from PEFT.

@Zeju1997 (Contributor, Author):

@BenjaminBossan I fixed the error in test_feature_extraction_models and fetched the latest main. Best.

@BenjaminBossan (Member):

Hmm, it seems like test_deeply_nested is flaky now. Let me investigate a bit next week.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thank you so much! I tested test_deeply_nested locally and there was no error.

@BenjaminBossan (Member):

@Zeju1997 I can confirm that test_deeply_nested works locally, both on GPU and CPU. Therefore, I'm not sure what's causing the error on CI. Checking the error message, the tensors look identical, so it's most likely a precision problem. Could you please increase the tolerance in this test to 1e-4?

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thanks! I have updated.

@BenjaminBossan (Member):

Thanks, but unfortunately, this has not solved the issue. I tested on another Linux PC with Intel CPU and still can't reproduce the issue. At this point, I don't know what else can be done, it's most likely some problem with the hardware of the GitHub CI runners. I would propose to skip this test on Linux for now:

...
import platform
...

# in test
        if platform.system() == "Linux":
            self.skipTest("This test fails but only on GitHub CI with Linux systems.")

If you add this, you can also revert the tolerance changes, since they didn't help.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Updated.
