
Update OFT to fix merge bugs #1996

Open. Zeju1997 wants to merge 40 commits into base: main
Conversation

@Zeju1997 (Contributor) commented Aug 8, 2024

Changes for BOFT:

  • fixing small errors in the comments of BOFT
  • making the Conv2D operation consistent with Linear

Changes for OFT:

  • [Important!] fixing the incorrect merging of OFT; this is not an implementation error but rather an incorrect understanding of how orthogonal fine-tuning works, which resulted in a wrong way of merging adapters (see the sketch after this list)
  • updating the config.py/layer.py/model.py files to be consistent with other PEFT methods
  • adding the additional parameters oft_block_size, oft_dropout, and bias
  • fixing a module dropout implementation error (in orthogonal fine-tuning, dropout does not work the same way as in LoRA)
  • fixing errors in the merge and unmerge functions of OFT
  • making the Conv2D operation consistent with Linear
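To illustrate the merging point, here is a minimal, self-contained sketch (not the PEFT implementation; shapes and the QR-based construction of the orthogonal matrix are only illustrative) of why an OFT adapter has to be merged multiplicatively rather than added as a delta:

import torch

torch.manual_seed(0)

out_features, in_features = 8, 16
W = torch.randn(out_features, in_features)   # frozen base weight
x = torch.randn(4, in_features)              # a batch of inputs

# Some orthogonal matrix R; in OFT it would be block-diagonal and parametrized
# (e.g. via a Cayley transform), here we just take the Q factor of a QR decomposition.
R, _ = torch.linalg.qr(torch.randn(out_features, out_features))

# Forward pass with the adapter active: rotate the outputs of the frozen layer.
y_adapter = (x @ W.t()) @ R.t()

# Correct merge: fold the rotation into the weight multiplicatively.
W_merged = R @ W
assert torch.allclose(y_adapter, x @ W_merged.t(), atol=1e-5)

# A LoRA-style additive merge (W + delta_W) is the wrong mental model here:
# the adapter is a rotation of W, not a weight offset that exists independently of W.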

@Zeju1997 changed the title from "Update oft to fix merge bugs" to "Update OFT to fix merge bugs" on Aug 8, 2024
@BenjaminBossan (Member) left a comment

Thanks a lot for fixing the issues you mentioned with OFT. I don't understand all the details but in general, this is to bring OFT closer to the BOFT implementation, is that right? Is that also why you decided to no longer use LycorisTuner as a base?

Did you check whether after these changes, the same code using OFT would still produce the exact same results? E.g. let's say a user has trained an OFT model checkpoint in PEFT 0.12. Now we merge this PR and release it in PEFT 0.13. If the user upgrades to 0.13 and loads the OFT checkpoint, would they still get the same results? This is really important because otherwise it would be a source of bugs.

If we cannot ensure the same results after this PR because, for instance, something was not implemented correctly previously, at the very least we have to add some mechanism to error or warn users when they try to load an old checkpoint with new PEFT versions.

I have added some smaller comments too, please check them out. On top of those, please run make style and ensure 120 character line limits (e.g. long strings).

Also, pinging @okotaku and @lukaskuhn-lku since they worked on the original OFT PR.

Comment on lines 318 to 322
elif boft_block_size != 0 and boft_block_num != 0:
    raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
    raise ValueError(
        f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
        "to be 0, because boft_block_size x boft_block_num != in_features."
    )
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Do we need these checks? We already have checks here:

if self.boft_block_size == 0 and self.boft_block_num == 0:
    raise ValueError("You must specify either boft_block_size or boft_block_num.")
if not (self.boft_block_size != 0) ^ (self.boft_block_num != 0):
    raise ValueError(
        f"You can only specify either boft_block_size ({self.boft_block_size}) or boft_block_num ({self.boft_block_num}), "
        "but not both simultaneously, because boft_block_size x boft_block_num != in_features."
    )

Contributor Author:

Yes, you are right, I forgot there is already a check in config.py. Updated.

Comment on lines 750 to 754
elif boft_block_size != 0 and boft_block_num != 0:
    raise ValueError(f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously.")

else:
    raise ValueError("Unknown error!")
raise ValueError(f"Either `boft_block_size` or `boft_block_num` must be non-zero. Currently, boft_block_size = {boft_block_size} and boft_block_num = {boft_block_num}.")
Member:

Same comment.

Contributor Author:

Updated.

class OFTLayer(nn.Module, LycorisLayer):
class MultiplicativeDropoutLayer(nn.Module):
    """
    Implements the multiplicative dropout layer for BOFT.
Member:

Suggested change:
- Implements the multiplicative dropout layer for BOFT.
+ Implements the multiplicative dropout layer for OFT.

Contributor Author:

updated.

if D == 1:
    return x

# Create a mask with 1s for matrices to be replaced with identity and 0s otherwise
Member:

Sentence doesn't quite make sense, could you please check again?

Contributor Author:

updated.

@Zeju1997 (Contributor, Author) commented Aug 9, 2024

@BenjaminBossan Hi, thank you so much. Because the main change (besides changing the format) is in the merge/unmerge, it should be possible to update it without affecting previously trained model checkpoints. I am testing that now, but I am currently facing a problem:

import transformers
import accelerate
import peft
from datasets import load_dataset
from transformers import AutoImageProcessor
from peft import LoraConfig, BOFTConfig, OFTConfig, get_peft_model
import numpy as np
import evaluate
import torch
import os

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from peft import PeftConfig, PeftModel, AutoPeftModel

os.environ["HF_API_TOKEN"] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

if __name__ == '__main__':
    dataset = load_dataset("food101", split="train[:5000]")

    output_dir = "output"

    model_checkpoint = "google/vit-base-patch16-224-in21k"

    labels = dataset.features["label"].names
    label2id, id2label = dict(), dict()
    for i, label in enumerate(labels):
        label2id[label] = i
        id2label[i] = label

    image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)


    normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    train_transforms = Compose(
        [
            RandomResizedCrop(image_processor.size["height"]),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

    val_transforms = Compose(
        [
            Resize(image_processor.size["height"]),
            CenterCrop(image_processor.size["height"]),
            ToTensor(),
            normalize,
        ]
    )


    def preprocess_train(example_batch):
        """Apply train_transforms across a batch."""
        example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    def preprocess_val(example_batch):
        """Apply val_transforms across a batch."""
        example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
        return example_batch


    splits = dataset.train_test_split(test_size=0.1)
    train_ds = splits["train"]
    val_ds = splits["test"]

    train_ds.set_transform(preprocess_train)
    val_ds.set_transform(preprocess_val)


    def print_trainable_parameters(model):
        trainable_params = 0
        all_param = 0
        for _, param in model.named_parameters():
            all_param += param.numel()
            if param.requires_grad:
                trainable_params += param.numel()
        print(
            f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
        )

    model = AutoModelForImageClassification.from_pretrained(
        model_checkpoint,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )


    print_trainable_parameters(model)

    config = LoraConfig(
        r=32,
        lora_alpha=16,
        target_modules=["query", "value"],
        lora_dropout=0.1,
        bias="none",
        # modules_to_save=["classifier"],
    )

    lora_model = get_peft_model(model, config)
    print_trainable_parameters(lora_model)

    model_name = model_checkpoint.split("/")[-1]
    batch_size = 128

    args = TrainingArguments(
        f"{model_name}-finetuned-lora-food101",
        remove_unused_columns=False,
        evaluation_strategy="epoch",
        save_strategy="no",
        learning_rate=5e-3,
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=batch_size,
        fp16=True,
        num_train_epochs=5,
        logging_steps=10,
        metric_for_best_model="accuracy",
        label_names=["labels"],
        eval_strategy="steps",
        eval_steps=100000000000,
    )

    metric = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        """Computes accuracy on a batch of predictions"""
        predictions = np.argmax(eval_pred.predictions, axis=1)
        return metric.compute(predictions=predictions, references=eval_pred.label_ids)

    def collate_fn(examples):
        pixel_values = torch.stack([example["pixel_values"] for example in examples])
        labels = torch.tensor([example["label"] for example in examples])
        return {"pixel_values": pixel_values, "labels": labels}

    trainer = Trainer(
        lora_model,
        args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        tokenizer=image_processor,
        compute_metrics=compute_metrics,
        data_collator=collate_fn,
    )
    eval_results = trainer.evaluate(val_ds)
    print('Before training', eval_results)

    repo_name = f"sgp-bench/peft-finetuned-lora-food101"

    train_results = trainer.train()
    
    token = os.getenv("HF_API_TOKEN")
    lora_model.push_to_hub(repo_name, token=token)

    eval_results = trainer.evaluate(val_ds)
    print('After training', eval_results)

    config = PeftConfig.from_pretrained(repo_name)
    model = AutoModelForImageClassification.from_pretrained(
        config.base_model_name_or_path,
        label2id=label2id,
        id2label=id2label,
        ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
    )
    # Load the LoRA model
    # inference_model = AutoPeftModel.from_pretrained(repo_name).to("cuda")
    trainer.model = PeftModel.from_pretrained(model, repo_name).to("cuda")
    eval_results = trainer.evaluate(val_ds)
    print('Load adapter weight and inference', eval_results)

I am currently debugging with LoRA and the official guide from the docs.
I noticed that the adapter weights can be loaded, but the eval_accuracy is basically 0 (compared to an eval_accuracy of over 90% after training), indicating that the weights are not loaded correctly?
If I do merge_and_unload(), the eval_accuracy is the same as after training.
Is there something wrong with the way I am loading the adapter weights?

@BenjaminBossan (Member):

I noticed that the adapter weights can be loaded, but the eval_accuracy is basically 0 (compared to an eval_accuracy of over 90% after training), indicating that the weights are not loaded correctly?

I could reproduce your issue of low accuracy (I did not train, just loaded your checkpoint). I think the issue is this line, right?

# modules_to_save=["classifier"]

Because this line is commented out, the classifier layer is randomly initialized, so the accuracy is low.
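For illustration, a minimal sketch of the config with the classifier head kept as a trainable, saved module (this just uncomments the line above; the other values are copied from your script):

from peft import LoraConfig

config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # train, save and later reload the task-specific head
)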

If I do merge_and_unload(), the eval_accuracy is the same as after training.

Could you show me which steps you take to achieve this?

@Zeju1997 (Contributor, Author) commented Aug 9, 2024

Ah, OK, you are right: in the old checkpoint created with merge_and_unload I did have the classifier saved.

It should be working now; I have tested that it can load an old checkpoint trained with the previous version of PEFT OFT.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks a lot for the updates. Could you please run make style so that CI can pass successfully?

I did some further tests regarding the backwards compatibility of the changes in this PR. If my testing is correct, this PR works without a problem for BOFT, but it will cause an issue with OFT, namely that old OFT checkpoints will not work anymore after this PR. They will load fine but the result will be different, which is bad, as users will not expect this. We need to either ensure that the old checkpoint works as expected, or we have to add a check: if we detect an old checkpoint, we raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

Here is how I tested the compatibility:

# try loading old oft and boft checkpoints
# First checkout and pull all branches to be tested. Then call like this:
# git checkout <branch 1>  # e.g. main
# python script.py oft     # either oft or boft
# git checkout <branch 2>  # e.g. PR branch
# python script.py oft     # same method as above
# call `rm -r /tmp/peft/` to try again with different branches

import os
import sys

import torch
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, OFTConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
inputs = torch.arange(5).view(-1, 1)

# oft
def main(method):
    if method.lower() == "oft":
        config_cls = OFTConfig
        path = "/tmp/peft/oft"
    else:
        config_cls = BOFTConfig
        path = "/tmp/peft/boft"

    model = AutoModelForCausalLM.from_pretrained(model_id)
    if not os.path.exists(path + "/adapter_model.safetensors"):
        print(f"{method} adapter does not exist yet, creating it")
        peft_config = config_cls(target_modules=["q_proj", "v_proj"], init_weights=False)
        model = get_peft_model(model, peft_config)
        model.save_pretrained(path)

        model.eval()
        with torch.inference_mode():
            outputs = model(inputs).logits
        torch.save(outputs, path + "/output.pt")
        print(f"{method} adapter and output created and saved in {path}")
    else:
        print(f"{method} adapter and output exist, loading them from {path}")
        model = PeftModel.from_pretrained(model, path)
        print(f"{method} adapter loaded successfully")
        model.eval()
        with torch.inference_mode():
            outputs_new = model(inputs).logits

        outputs_old = torch.load(path + "/output.pt", weights_only=True)
        assert torch.allclose(outputs_old, outputs_new, atol=1e-4, rtol=1e-4)
        print(f"{method} outputs match")


if __name__ == "__main__":
    method = sys.argv[1] if len(sys.argv) > 1 else "boft"
    assert method.lower() in ["boft", "oft"]
    main(method)

Does that make sense to you? I summarized the results from my tests in this table:

branch 1 | branch 2 | method | status
main     | PR       | OFT    | fail: outputs different
PR       | main     | OFT    | TypeError: OFTConfig.__init__() got an unexpected keyword argument 'bias'
main     | PR       | BOFT   | success
PR       | main     | BOFT   | success

In general, we really want to retain backwards compatibility but in my book it's acceptable if the previous implementation was not correct and is now fixed. Also note that a new OFT checkpoint created after this PR cannot be loaded with older PEFT versions -- there is no forward compatibility -- but this is okay.

f"You can only specify either boft_block_size ({boft_block_size}) or boft_block_num ({boft_block_num}), but not both simultaneously or setting both"
"to be 0, because boft_block_size x boft_block_num != in_features."
)
raise ValueError("Unknown error!")
Member:

So this code should not be reachable. Still, let's use a more helpful error message. My proposal:

"Something went wrong, please report this error: https://github.com/huggingface/peft/issues".

# OFT info
self.oft_r = nn.ParameterDict({})
self.oft_s = nn.ParameterDict({})
Member:

Here we introduce a new parameter, oft_s. Existing checkpoints don't have this parameter, which means that when a user loads an old checkpoint, it is randomly initialized and the checkpoint behaves differently. See my longer comment on compatibility.

Contributor Author:

Actually, it depends on init_weights: if init_weights=True (which is the default behavior), oft_s is basically an identity and should not change anything. If init_weights=False, it is randomly initialized.
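As a small illustration (assuming oft_s acts as a per-output scaling of the weight; the shapes are made up for the example):

import torch

out_features, in_features = 8, 16
W = torch.randn(out_features, in_features)   # frozen base weight

oft_s = torch.ones(out_features, 1)          # init_weights=True: identity scaling
assert torch.equal(oft_s * W, W)             # the effective weight is unchanged

oft_s = torch.randn(out_features, 1)         # init_weights=False: random scaling changes W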

Member:

True, but when users load checkpoints, they have trained the model, so oft_s won't be an identity transform.

Comment on lines 179 to 182
elif r != 0 and oft_block_size != 0:
    raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
    raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

These lines should not be reachable because of the checks we have in the config class, but IMO it's fine to leave them here.

Contributor Author:

Updated.

Comment on lines 506 to 509
elif r != 0 and oft_block_size != 0:
    raise ValueError(f"You can only specify either r ({r}) or oft_block_size ({oft_block_size}), but not both simultaneously.")
else:
    raise ValueError(f"Either `r` or `oft_block_size` must be non-zero. Currently, r = {r} and oft_block_size = {oft_block_size}.")
Member:

Same comment as earlier: Should be unreachable but it's fine if we leave the code.

Contributor Author:

Updated.

("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"]}),
("Vanilla MLP 5 OFT", "MLP", OFTConfig, {"target_modules": ["lin0"], "modules_to_save": ["lin1"]}),
("Vanilla MLP 1 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": "lin0"}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
Member:

Suggested change:
- ("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "r": 2}),
+ ("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"]}),

Contributor Author:

Updated.

@Zeju1997 (Contributor, Author):

We need to either ensure that the old checkpoint works as expected, or we have to add a check: if we detect an old checkpoint, we raise an error and tell the user that they either need to re-train their OFT checkpoint or use an older PEFT version.

Yes, this is expected; the reason is that loading was not correct in the previous version. I think the best approach is the one you described: add a check and, if we detect an old checkpoint, raise an error telling the user that they either need to re-train their OFT checkpoint or use an older PEFT version. Where do you suggest adding the check?

@BenjaminBossan (Member) left a comment

Yes, this is expected; the reason is that loading was not correct in the previous version. I think the best approach is the one you described: add a check and, if we detect an old checkpoint, raise an error telling the user that they either need to re-train their OFT checkpoint or use an older PEFT version. Where do you suggest adding the check?

Good question. I checked what would be the best place to change this, keeping in mind that in the future we could have more of these cases.

My suggestion is the following. In this line, we load all the arguments for the config:

kwargs = {**class_kwargs, **loaded_attributes}

Here, we could add a check on these arguments that could be used to raise an error if an old OFT config is detected. So the new code would be:

        loaded_attributes = cls.from_json_file(config_file)
        kwargs = {**class_kwargs, **loaded_attributes}
        kwargs = cls.check_kwargs(**kwargs)  # <= new line
        return cls.from_peft_type(**kwargs)

By default, this method should be defined on PeftConfigMixin to just return the kwargs without any validation or change. Then on OFTConfig, we can override this method to check if it's from an old OFT checkpoint and raise an error.

To check if the config is from an old checkpoint, we could simply check for the presence of one of the new arguments added in this PR. WDYT?

Maybe long term it would be better to also store the PEFT version in the adapter_config.json so that this can be checked instead. But it's not quite easy, as existing checkpoints don't have this info yet. So let's not add that in the current PR.
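A minimal sketch of what that could look like, assuming oft_block_size is one of the newly added config fields (class layout and error wording are illustrative, not the final implementation):

class PeftConfigMixin:
    @classmethod
    def check_kwargs(cls, **kwargs):
        """By default, pass the loaded config arguments through unchanged."""
        return kwargs


class OFTConfig(PeftConfigMixin):
    @classmethod
    def check_kwargs(cls, **kwargs):
        # Old OFT checkpoints were saved before `oft_block_size` existed, so its
        # absence indicates a config written by an older PEFT version.
        if "oft_block_size" not in kwargs:
            raise ValueError(
                "OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible "
                "with the latest version of OFT. Please retrain your adapter weights with newer PEFT "
                "versions. Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights."
            )
        return kwargs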


@Zeju1997 (Contributor, Author):

@BenjaminBossan Sorry for the late reply. I added check_kwargs, do you think it is OK like that? As for oft_s: if the user used an older version, it does not exist in the checkpoint, so even if the model was trained, oft_s will still be an identity transform and should therefore make no difference.

@BenjaminBossan (Member) left a comment

Thanks for making these latest changes.

Concerning the compatibility check, I have a few comments, please check.

Also, I think it would be good to have a unit test for this check. I created a checkpoint with the old OFT implementation here: https://huggingface.co/peft-internal-testing/OFT-tiny-random-OPTForCausalLM. You can use it to load and then check that the warning has been raised.
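A rough sketch of such a test, assuming the check ends up raising a ValueError when the old config is loaded (the exception type, match string, and the tiny base model id are assumptions):

import pytest
from transformers import AutoModelForCausalLM
from peft import PeftModel


def test_old_oft_checkpoint_is_rejected():
    # Adapter created with the old OFT implementation (see link above).
    checkpoint = "peft-internal-testing/OFT-tiny-random-OPTForCausalLM"
    base = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM")
    # Loading should fail because the stored config lacks the new OFT fields.
    with pytest.raises(ValueError, match="OFT has been updated"):
        PeftModel.from_pretrained(base, checkpoint)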

"""
if "oft_block_size" in kwargs:
    warnings.warn(
        'OFT has been updated since 0.12.1.dev0. Your trained adapter weights may not be compatible with the latest version of OFT. Please retrain your adapter weights.')
Member:

  1. Note that dev0 is not a real version, so let's not refer to that. The next release version will be 0.13.0, so let's use that version.
  2. "may not be compatible": We are pretty sure it is incompatible when trained, right? Let's phrase it as "is incompatible".
  3. Let's also mention that users can downgrade PEFT to version 0.12.0 and then the adapter will still work.

kwargs (additional keyword arguments, *optional*):
Additional keyword arguments passed along to the child class initialization.
"""
if "oft_block_size" in kwargs:
Member:

Hmm, so oft_block_size is the new parameter, right? So if we're loading an old OFT model, it should be missing. Therefore, should the check not be if "oft_block_size" not in kwargs?

@Zeju1997 (Contributor, Author):

@BenjaminBossan I noticed some other problems in the previous implementation of OFT: the size of the OFT adapters is also defined incorrectly (OFT should rotate the neurons, not the feature dimensions; the current way can have some fine-tuning effect, but it is not optimal). Therefore, I added the raise ValueError in OFTConfig, because it is now not possible to load the old adapter weights. I would suggest the user either use the old version of PEFT or retrain the adapter weights entirely. Best.

@BenjaminBossan (Member) left a comment

Thanks for making further corrections to OFT. I have a few smaller comments and also a question about the parameter finding algorithm, please take a look.

Also, please ensure to run make style on your code changes.

"""
if "oft_block_size" not in kwargs:
    raise ValueError(
        'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
Member:

Suggested change:
- 'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
+ 'OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'

Also, let's ensure the 120 char line limit.

if "oft_block_size" not in kwargs:
    raise ValueError(
        'OFT has been updated since PEFT 0.13.0. Your trained adapter weights is incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions.'
        'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
Member:

Suggested change:
- 'Downgrade PEFT to version 0.12.0 to merge the old adapter weights.'
+ 'Alternatively, downgrade PEFT to version 0.12.0 to use the old adapter weights.'

Not specific to merging, right?

Comment on lines 173 to 175
warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
oft_block_size = self.adjust_oft_parameters(self.in_features, oft_block_size)
warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Instead of two warnings, let's make this one warning by prepending the first message to the second.
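For example, something along these lines (a self-contained sketch; the adjust helper here is just a stand-in that picks the closest divisor, not the real adjust_oft_parameters):

import warnings

def adjust_oft_parameters(in_features, oft_block_size):
    # Hypothetical helper: pick the divisor of in_features closest to the requested block size.
    divisors = [d for d in range(1, in_features + 1) if in_features % d == 0]
    return min(divisors, key=lambda d: abs(d - oft_block_size))

in_features, oft_block_size = 768, 100
if in_features % oft_block_size != 0 or oft_block_size > in_features:
    old_oft_block_size = oft_block_size
    oft_block_size = adjust_oft_parameters(in_features, oft_block_size)
    warnings.warn(
        f"Invalid `oft_block_size` ({old_oft_block_size})! Adjusted `oft_block_size` to ({oft_block_size})."
    )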

if self.in_features % r != 0 or r > self.in_features:
    warnings.warn(f"Invalid `r` ({r})!")
    r = self.adjust_oft_parameters(self.in_features, r)
    warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same about having a single warning.

if conv_filter_dim % oft_block_size != 0 or oft_block_size > conv_filter_dim:
    warnings.warn(f"Invalid `oft_block_size` ({oft_block_size})!")
    oft_block_size = self.adjust_oft_parameters(conv_filter_dim, oft_block_size)
    warnings.warn(f"Adjusted `oft_block_size` to ({oft_block_size}).")
Member:

Same comment about having a single warning.

if conv_filter_dim % r != 0 or r > conv_filter_dim:
    warnings.warn(f"Invalid `r` ({r})!")
    r = self.adjust_oft_parameters(conv_filter_dim, r)
    warnings.warn(f"Adjusted `r` to ({r}).")
Member:

Same comment about having a single warning.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thanks, I have updated the mixed PEFT test.

@BenjaminBossan (Member) left a comment

Excellent, thanks. Just a small issue with one of the tests, the rest looks good.

torch.manual_seed(4)
config4 = OFTConfig(r=2, task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"], init_weights=False)
peft_model.add_adapter("adapter4", config4)
peft_model.set_adapter(["adapter0", "adapter1", "adapter2", "adapter3", "adapter4"])
Member:

Let's keep this line in, just without "adapter4", i.e.: peft_model.set_adapter(["adapter0", "adapter1", "adapter2", "adapter3"]).

Contributor Author:

Sure, updated.

Contributor Author:

@BenjaminBossan Could you check again, please? Best.

Comment on lines 751 to 753
torch.manual_seed(4)
config4 = OFTConfig(r=2, task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"], init_weights=False)
peft_model.add_adapter("adapter4", config4)
Member:

These lines should be removed. We only want to keep the peft_model.set_adapter(["adapter0", "adapter1", "adapter2", "adapter3"]) part. Also don't forget to remove the OFTConfig import, as it should not be needed anymore.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Sure, updated.

@BenjaminBossan (Member) left a comment

I took a look at the failing tests. We need to change a few things to make them pass:

  1. The config tests actually don't make a lot of sense as they are. For now, simply remove OFT here.
  2. We need to ensure for a few tests that OFT weights are not initialized as identity transform, same as we already do with BOFT. So each time in test_decoder_models.py, test_encoder_decoder_models.py, and test_feature_extraction_models.py, when there is the line "boft_kwargs": {"init_weights": [False]}, also add "oft_kwargs": {"init_weights": [False]}.
  3. OFT no longer supports Conv1D layers. Either this support needs to be added back in or GPT2 tests need to be skipped, same as for BOFT (see here).
  4. This line needs to add OFT.
  5. Also add OFT here, same as BOFT.

I think that covers all, but maybe I missed something. The complete diff I got locally is:

modified   tests/test_config.py
@@ -34,7 +34,6 @@ from peft import (
     LoKrConfig,
     LoraConfig,
     MultitaskPromptTuningConfig,
-    OFTConfig,
     PeftConfig,
     PeftType,
     PolyConfig,
@@ -61,7 +60,6 @@ ALL_CONFIG_CLASSES = (
     LoKrConfig,
     LoraConfig,
     MultitaskPromptTuningConfig,
-    OFTConfig,
     PolyConfig,
     PrefixTuningConfig,
     PromptEncoderConfig,
modified   tests/test_decoder_models.py
@@ -22,6 +22,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import (
     AdaLoraConfig,
     BOFTConfig,
+    OFTConfig,
     HRAConfig,
     LoraConfig,
     PrefixTuningConfig,
@@ -59,7 +60,7 @@ def skip_boft_or_hra_and_gpt2(test_list):
     return [
         test
         for test in test_list
-        if not (("GPT2LMHeadModel" in test[1]) and ((test[2] == BOFTConfig) or (test[2] == HRAConfig)))
+        if not (("GPT2LMHeadModel" in test[1]) and ((test[2] == BOFTConfig) or (test[2] == OFTConfig) or (test[2] == HRAConfig)))
     ]
 
 
@@ -205,6 +206,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
@@ -222,6 +224,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "lora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
@@ -336,6 +339,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
@@ -373,6 +377,7 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "fourierft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
modified   tests/test_encoder_decoder_models.py
@@ -173,6 +173,7 @@ class PeftEncoderDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "SEQ_2_SEQ_LM",
@@ -207,6 +208,7 @@ class PeftEncoderDecoderModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "SEQ_2_SEQ_LM",
modified   tests/test_feature_extraction_models.py
@@ -111,6 +111,7 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "FEATURE_EXTRACTION",
@@ -164,6 +165,7 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
                 "adalora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "vera_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "FEATURE_EXTRACTION",
@@ -180,6 +182,7 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
                 "lora_kwargs": {"init_lora_weights": [False]},
                 "ia3_kwargs": {"init_ia3_weights": [False]},
                 "boft_kwargs": {"init_weights": [False]},
+                "oft_kwargs": {"init_weights": [False]},
                 "hra_kwargs": {"init_weights": [False]},
                 "task_type": "FEATURE_EXTRACTION",
             },
modified   tests/testing_common.py
@@ -652,7 +652,7 @@ class PeftCommonTester:
         if issubclass(config_cls, PromptLearningConfig):
             return pytest.skip(f"Test not applicable for {config_cls}")
 
-        if issubclass(config_cls, BOFTConfig):
+        if issubclass(config_cls, (OFTConfig, BOFTConfig)):
             return pytest.skip(f"Test not applicable for {config_cls}")
 
         if ("gpt2" in model_id.lower()) and (config_cls != LoraConfig):
@@ -1290,7 +1290,7 @@ class PeftCommonTester:
         model = get_peft_model(model, config)
         model = model.to(self.torch_device)
 
-        if config.peft_type not in ("LORA", "ADALORA", "IA3", "BOFT", "VERA", "FOURIERFT", "HRA", "VBLORA"):
+        if config.peft_type not in ("LORA", "ADALORA", "IA3", "BOFT", "OFT", "VERA", "FOURIERFT", "HRA", "VBLORA"):
             with pytest.raises(AttributeError):
                 model = model.unload()
         else:

There are a few more failing tests that are unrelated to this PR. We need #2106 merged first before these tests can pass.

"""
if "oft_block_size" not in kwargs:
    raise ValueError(
        "OFT has been updated since PEFT 0.13.0. Your trained adapter weights are incompatible with the latest version of OFT. Please retrain your adapter weights with newer PEFT versions."
Member:

Let's break this long line to be 120 chars at most. Also, there is a space missing at the end of the sentence.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thank you so much for the detailed suggestions. I have updated the test files. Best.

@BenjaminBossan (Member):

Thanks for the quick action. You still missed a few changes in test_decoder_models.py and test_encoder_decoder_models.py, and this line is missing a space at the end. After this, we still have to wait for #2106 to be merged first.

@Zeju1997 (Contributor, Author):

Thanks for the quick action. You still missed a few changes in test_decoder_models.py and test_encoder_decoder_models.py, and this line is missing a space at the end. After this, we still have to wait for #2106 to be merged first.

@BenjaminBossan Sorry, I don't understand: what do you mean by a whitespace at the end?

@Zeju1997 (Contributor, Author):

@BenjaminBossan So locally, I ran pytest tests/ -k test_encoder_decoder_models and pytest tests/ -k test_decoder_models and did not get any errors.

I do get some errors with pytest tests/ -k test_feature_extraction_models:
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_27_test_hf_internal_testing_tiny_random_DebertaModel_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
FAILED tests/test_feature_extraction_models.py::PeftFeatureExtractionModelTester::test_training_gradient_checkpointing_31_test_hf_internal_testing_tiny_random_DebertaV2Model_oft - RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

@BenjaminBossan (Member):

I do get some errors with pytest tests/ -k test_feature_extraction_models

Yeah, I can reproduce that, strange. Maybe it's related to the special layer types that Deberta uses. I think we can skip this specific Deberta test, similar to how we skip AdaLoRA + Roberta:

peft/tests/testing_common.py

Lines 1105 to 1107 in c29810b

if (config_cls == AdaLoraConfig) and ("roberta" in model_id.lower()):
    # TODO: no gradients on the "dense" layer, other layers work, not sure why
    self.skipTest("AdaLora with RoBERTa does not work correctly")
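An analogous skip for the failing OFT + DeBERTa case could look roughly like this (the exact condition and message are illustrative):

if (config_cls == OFTConfig) and ("deberta" in model_id.lower()):
    # in-place operation on a leaf variable during gradient checkpointing
    self.skipTest("OFT with DeBERTa does not work correctly with gradient checkpointing")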

Please also merge with/rebase on the latest main from PEFT.

@Zeju1997 (Contributor, Author):

@BenjaminBossan I fixed the error in test_feature_extraction_models and fetched the latest main. Best.

@BenjaminBossan (Member):

Hmm, it seems like test_deeply_nested is flaky now. Let me investigate a bit next week.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thank you so much! I tested test_deeply_nested locally and there was no error.

@BenjaminBossan (Member):

@Zeju1997 I can confirm that test_deeply_nested works locally, both on GPU and CPU. Therefore, I'm not sure what's causing the error on CI. Checking the error message, the tensors look identical, so it's most likely a precision problem. Could you please increase the tolerance in this test to 1e-4?

@Zeju1997 (Contributor, Author):

@BenjaminBossan Thanks! I have updated.

@BenjaminBossan (Member):

Thanks, but unfortunately, this has not solved the issue. I tested on another Linux PC with Intel CPU and still can't reproduce the issue. At this point, I don't know what else can be done, it's most likely some problem with the hardware of the GitHub CI runners. I would propose to skip this test on Linux for now:

...
import platform
...

# in test
        if platform.system() == "Linux":
            self.skipTest("This test fails but only on GitHub CI with Linux systems.")

If you add this, you can also revert the tolerance changes, since they didn't help.

@Zeju1997 (Contributor, Author):

@BenjaminBossan Updated.
