Inference with different LoRA adapters in the same batch - Embedding models #2088
-
Hello, I would like to know whether the approach described at https://huggingface.co/docs/peft/main/en/developer_guides/lora#inference-with-different-lora-adapters-in-the-same-batch works with embedding models such as XLMRobertaModel, as I was not able to get it working.
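For reference, this is the pattern from the linked guide, sketched for an embedding model; the adapter paths and names below are placeholders, not actual code from this thread:

```python
# Sketch only: adapter paths/names are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
base_model = AutoModel.from_pretrained("xlm-roberta-base")

# Load two hypothetical LoRA adapters trained on top of XLM-R.
model = PeftModel.from_pretrained(base_model, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")
model.eval()

texts = ["first sentence", "second sentence", "third sentence"]
inputs = tokenizer(texts, return_tensors="pt", padding=True)

# One adapter name per sample; "__base__" routes a sample through the
# unmodified base weights.
adapter_names = ["adapter_a", "adapter_b", "__base__"]
with torch.no_grad():
    outputs = model(**inputs, adapter_names=adapter_names)
embeddings = outputs.last_hidden_state[:, 0]  # CLS-token embeddings
```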
-
I believe it should work. Could you please show the code you're running and what error you get?
-
@BenjaminBossan Is there a way to make inference with different adapters thread-safe? For example, when multiple requests for different adapters arrive at the same time, can they be handled without interfering with each other? I am getting non-deterministic results when making concurrent requests to different adapters.
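For illustration, here is a sketch of one possible workaround (an assumption on my part, not something confirmed in this thread): since `set_adapter` mutates shared model state, holding a lock across the adapter switch and the forward pass prevents concurrent requests from interleaving between the two:

```python
# Hypothetical workaround sketch: serialize adapter switching.
import threading
import torch

_lock = threading.Lock()

def embed(model, tokenizer, text, adapter_name):
    inputs = tokenizer(text, return_tensors="pt")
    # Hold the lock across both the switch and the forward pass so another
    # thread cannot change the active adapter in between.
    with _lock:
        model.set_adapter(adapter_name)
        with torch.no_grad():
            return model(**inputs).last_hidden_state[:, 0]
```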
Thanks for the code. I could not reproduce the issue; for me it worked just fine. I also tried two different adapters and it still worked. Here is the self-contained code. Could you check whether it passes for you as well?
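A minimal sketch of such a self-contained check, assuming two randomly initialized LoRA adapters on xlm-roberta-base (the names, config, and tolerances here are illustrative and may differ from the original snippet). It verifies that mixed-batch outputs match per-adapter outputs:

```python
# Sketch: create two random LoRA adapters on XLM-R and check that
# mixed-batch inference matches running each adapter separately.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

torch.manual_seed(0)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
base_model = AutoModel.from_pretrained("xlm-roberta-base")

# init_lora_weights=False makes the adapters non-identity, so the two
# adapters actually produce different outputs.
config = LoraConfig(target_modules=["query", "value"], init_lora_weights=False)
model = get_peft_model(base_model, config, adapter_name="adapter_a")
model.add_adapter("adapter_b", LoraConfig(target_modules=["query", "value"], init_lora_weights=False))
model.eval()

texts = ["hello world", "bonjour le monde"]
inputs = tokenizer(texts, return_tensors="pt", padding=True)

with torch.no_grad():
    # Reference: run the full batch through each adapter separately.
    model.set_adapter("adapter_a")
    ref_a = model(**inputs).last_hidden_state
    model.set_adapter("adapter_b")
    ref_b = model(**inputs).last_hidden_state
    # Mixed batch: sample 0 through adapter_a, sample 1 through adapter_b.
    out = model(**inputs, adapter_names=["adapter_a", "adapter_b"]).last_hidden_state

assert torch.allclose(out[0], ref_a[0], atol=1e-5)
assert torch.allclose(out[1], ref_b[1], atol=1e-5)
print("mixed-batch outputs match per-adapter outputs")
```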