Inference with different LoRA adapters in the same batch - Embedding models #2088
-
Hello, I would like to know whether the approach described at https://huggingface.co/docs/peft/main/en/developer_guides/lora#inference-with-different-lora-adapters-in-the-same-batch works with embedding models such as XLMRobertaModel, as I was not able to get it working.
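For reference, this is the pattern from the linked guide, sketched for an embedding model; the adapter paths and names below are placeholders, not actual code from this thread:

```python
# Sketch only: adapter paths/names are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
base_model = AutoModel.from_pretrained("xlm-roberta-base")

# Load two hypothetical LoRA adapters trained on top of XLM-R.
model = PeftModel.from_pretrained(base_model, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")
model.eval()

texts = ["first sentence", "second sentence", "third sentence"]
inputs = tokenizer(texts, return_tensors="pt", padding=True)

# One adapter name per sample; "__base__" routes a sample through the
# unmodified base weights.
adapter_names = ["adapter_a", "adapter_b", "__base__"]
with torch.no_grad():
    outputs = model(**inputs, adapter_names=adapter_names)
embeddings = outputs.last_hidden_state[:, 0]  # CLS-token embeddings
```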
-
I believe it should work. Could you please show the code you're running and what error you get?
-
@BenjaminBossan Is there a way to make inference with different adapters thread-safe? For example, when multiple requests for different adapters arrive at the same time, can they be handled without interfering with each other? I am getting non-deterministic results when making concurrent requests to different adapters.
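For illustration, here is a sketch of one possible workaround (an assumption on my part, not something confirmed in this thread): since `set_adapter` mutates shared model state, holding a lock across the adapter switch and the forward pass prevents concurrent requests from interleaving between the two:

```python
# Hypothetical workaround sketch: serialize adapter switching.
import threading
import torch

_lock = threading.Lock()

def embed(model, tokenizer, text, adapter_name):
    inputs = tokenizer(text, return_tensors="pt")
    # Hold the lock across both the switch and the forward pass so another
    # thread cannot change the active adapter in between.
    with _lock:
        model.set_adapter(adapter_name)
        with torch.no_grad():
            return model(**inputs).last_hidden_state[:, 0]
```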
Thanks for the code. I could not reproduce the issue; for me it worked just fine. I also tried two different adapters and it still worked. Here is the self-contained code. Could you check whether it passes for you as well?
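A minimal sketch of such a self-contained check, assuming two randomly initialized LoRA adapters on xlm-roberta-base (the names, config, and tolerances here are illustrative and may differ from the original snippet). It verifies that mixed-batch outputs match per-adapter outputs:

```python
# Sketch: create two random LoRA adapters on XLM-R and check that
# mixed-batch inference matches running each adapter separately.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

torch.manual_seed(0)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
base_model = AutoModel.from_pretrained("xlm-roberta-base")

# init_lora_weights=False makes the adapters non-identity, so the two
# adapters actually produce different outputs.
config = LoraConfig(target_modules=["query", "value"], init_lora_weights=False)
model = get_peft_model(base_model, config, adapter_name="adapter_a")
model.add_adapter("adapter_b", LoraConfig(target_modules=["query", "value"], init_lora_weights=False))
model.eval()

texts = ["hello world", "bonjour le monde"]
inputs = tokenizer(texts, return_tensors="pt", padding=True)

with torch.no_grad():
    # Reference: run the full batch through each adapter separately.
    model.set_adapter("adapter_a")
    ref_a = model(**inputs).last_hidden_state
    model.set_adapter("adapter_b")
    ref_b = model(**inputs).last_hidden_state
    # Mixed batch: sample 0 through adapter_a, sample 1 through adapter_b.
    out = model(**inputs, adapter_names=["adapter_a", "adapter_b"]).last_hidden_state

assert torch.allclose(out[0], ref_a[0], atol=1e-5)
assert torch.allclose(out[1], ref_b[1], atol=1e-5)
print("mixed-batch outputs match per-adapter outputs")
```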