MobileViT does not work with inference with different LoRA adapters in the same batch #1967
Comments
Thanks a lot again for this detailed analysis, and again I would be very happy to accept a PR to fix this. Regarding the question of how to fix this: I wonder if it would be easier to change the logic inside of […]
No problem! I also checked the accuracy without multi-LoRA inference, and it seems my explanation above can also be validated by looking at accuracies: with the check removed, the results do not match the accuracy of single-LoRA inference.
I see, I thought it would be possible to remove the check or at least make it optional. The user then needs to ensure that the correct `adapter_names` are passed.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi @BenjaminBossan, […]
This issue was auto-closed by merging said PR because you wrote […] and GH automatically parses "fix #XXXX" and closes the corresponding issue :) Yes, let's leave this open. If you have time, it would be great if you could work on solving this issue here as well. We can further discuss potential solutions.
ah, I see :) Sure, I'll have a look when I get a chance.
Hi @BenjaminBossan, I looked into this issue in more depth, but I'm still a bit unsure of the best way to implement a solution. I explored three different approaches, but each has its own challenges, which I've explained below. I would appreciate your opinion on these and any other solutions you might suggest.

Background

As mentioned above, the problem is that the unfolding operation changes the dimensions of the input in MobileViT. As a result, we need to scale the `adapter_names` input accordingly:

```python
# -------- Adjusting the size of the adapter_names input ----------
if model.base_model.model.base_model_prefix == "mobilevit":
    patch_size = model.config.patch_size
    multiply = patch_size ** 2
    resized_adapters_names = []
    for item in batch["adapter_names"]:
        # repeat each batch item's adapter name once per patch position
        multiplied = [item] * multiply
        resized_adapters_names += multiplied
    batch["adapter_names"] = resized_adapters_names
outputs = model(**batch)
```

Note that after the fixes in #1990, this solution will no longer work out of the box, since the MobileViT part expects the modified format above while the classifier part expects the original length for `adapter_names`.

Solution 1

Attempt to modify the code to change the […]. This solution aims to apply the same workaround I'm currently doing (subclassing MobileViT) but without subclassing, instead injecting the modified logic dynamically (similar to this approach) by using a pre-hook that can adjust `adapter_names`; see the sketch after this comment.

Solution 2

Rewrite how the PEFT library applies LoRA layers. In this approach, I considered rewriting the SelfAttention layer of MobileViT to account for the size change when LoRA is applied, potentially by adding a dispatcher for MobileViT. However, this requires significant changes to how LoRA layers are added, which could potentially disrupt other parts of the model.

Solution 3

Reimplement MobileViT with an inherited function in the PEFT library (similar to the workaround I used earlier, but with modifications to account for the fixes in #1990). The downside of this solution is that it involves adding special-case logic for a specific model type in the PEFT library, which feels overly hacky.

Please let me know if you have any suggestions for a better approach or any comments on the solutions discussed. I'm happy to proceed based on your recommendations.
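(A rough sketch of Solution 1's pre-hook idea, for illustration; this is not code from the thread, and the hook target and the use of `with_kwargs` are assumptions:)

```python
import torch.nn as nn

# Hypothetical pre-hook that repeats each adapter name patch_size**2 times,
# so its length matches the unfolded first dimension inside MobileViT.
def make_adapter_names_hook(patch_size: int):
    def hook(module: nn.Module, args: tuple, kwargs: dict):
        adapter_names = kwargs.get("adapter_names")
        if adapter_names is not None:
            kwargs["adapter_names"] = [
                name for name in adapter_names for _ in range(patch_size**2)
            ]
        return args, kwargs

    return hook

# with_kwargs=True lets the hook rewrite keyword arguments before forward:
# handle = some_lora_layer.register_forward_pre_hook(
#     make_adapter_names_hook(patch_size=2), with_kwargs=True
# )
```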
Thanks for digging deeper into this issue and thinking of a few possible solutions. As you discussed, each of them has its own drawbacks, so it's not clear how to proceed. Something that came to my mind is the following solution: Let's say we have […]. LMK what you think of this solution.
Thank you for the suggestion! I'll take a closer look when I get the chance. One quick question that comes to mind: since […]
No, I don't really mean broadcasting in the sense of numpy, hence why I wrote "broadcasting" :) What I mean is repeating the same items multiple times. Simplified code would be something like this:

```python
adapter_names = ["a", "b", "a", "c"]
x = range(12)  # 3 times the size of adapter_names

# how many times each adapter name must be repeated
quot, remainder = divmod(len(x), len(adapter_names))  # 3, 0
if remainder != 0:
    raise ValueError("input length must be a multiple of len(adapter_names)")
adapter_names = sum([[i] * quot for i in adapter_names], [])
print(adapter_names)  # ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'c', 'c', 'c']
```
Thank you for the clarification; I'll work on it when I get a chance.
Great, thanks. Of course I might be missing something and one of your proposals could make more sense.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
Python 3.11.9
transformers==4.40.2
peft==0.11.2
Who can help?
@BenjaminBossan
Information
Tasks
examples folder

Reproduction
The MobileViT model is not compatible with using multiple adapters in the same batch. Running batched inference with multiple adapters by passing `adapter_names` triggers an exception at the following check:
peft/src/peft/tuners/lora/layer.py, line 308 (at commit 273acf0)
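For reference, a minimal sketch of the failing call pattern (model id, target modules, and adapter names are my assumptions, not from the original report):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import MobileViTForImageClassification

# two adapters so that a single batch can mix them
base = MobileViTForImageClassification.from_pretrained("apple/mobilevit-small")
model = get_peft_model(base, LoraConfig(target_modules=["query", "value"]))
model.add_adapter("other", LoraConfig(target_modules=["query", "value"]))
model.eval()

pixel_values = torch.randn(2, 3, 256, 256)  # batch of two images
# one adapter name per batch item; inside MobileViT the unfolding inflates the
# first dimension by patch_size**2, so the length check in lora/layer.py fails
with torch.no_grad():
    outputs = model(pixel_values=pixel_values, adapter_names=["default", "other"])
```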
The root cause is that during the unfolding operation in the transformers MobileViT implementation, the first dimension of the input is changed from `batch_size, ...` to `batch_size * patch_size**2, ...`. This makes it inconsistent with `adapter_names`, which has length `batch_size` and whose entries each refer to the adapter of the corresponding batch item.
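To make the shape change concrete, here is an illustrative sketch (the tensor sizes are made up; the real operation lives in the transformers MobileViT code):

```python
import torch

batch_size, channels, patch_size = 4, 8, 2
num_patches = 16  # (H / patch_size) * (W / patch_size), assumed

# a tensor laid out per patch position, as in MobileViT's unfolding step
x = torch.randn(batch_size, patch_size**2, num_patches, channels)
x = x.reshape(batch_size * patch_size**2, num_patches, channels)
print(x.shape)  # torch.Size([16, 16, 8]); 16 = batch_size * patch_size**2, not batch_size
```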
Expected behavior

I solved this with a hack that modifies the `adapter_names` input size before sending it to the model and reverts it back to the original size for the classifier; it makes the number of entries proportional to the size produced by the unfolding operation. Also, as already discussed, there is a bug, #1960, besides this MobileViT-specific problem. The script below contains the modifications needed for both #1960 and the problem mentioned here.
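(The script itself is not preserved in this thread. As a stand-in, here is a minimal sketch of only the MobileViT shape-matching part; the helper names are assumptions and the #1960 changes are not shown:)

```python
# Hypothetical helpers illustrating the hack described above; not the
# original script. patch_size is assumed to come from model.config.patch_size.
def expand_adapter_names(adapter_names, patch_size):
    """Repeat each entry patch_size**2 times to match the unfolded batch dim."""
    return [name for name in adapter_names for _ in range(patch_size**2)]

def shrink_adapter_names(expanded_names, patch_size):
    """Invert expand_adapter_names: keep one entry per original batch item."""
    return expanded_names[:: patch_size**2]

# expand before the MobileViT backbone runs, shrink again for the classifier
# head, which still expects one adapter name per batch item
names = expand_adapter_names(["a", "b"], patch_size=2)  # 8 entries
assert shrink_adapter_names(names, patch_size=2) == ["a", "b"]
```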
However, this is just a hack, and I think this should work out of the box. I'm happy to investigate further when I get a chance, after first solving #1960.