💡 [REQUEST] - Can multiple LoRAs be loaded in parallel (or switched quickly)? #822
Comments
Of course, you can load the qwen-14b-int4 model multiple times and apply one LoRA per copy, but the VRAM cost is prohibitive; it amounts to loading the same base model over and over again.
——UPDATE—— This is not a Qwen problem; it is a bug in the peft library, see:
Fix: pip install -U peft
When I tried load_adapter, I found that it adds the LoRA layers into the model, and set_adapter does not change the structure; the output actually looks like the combined effect of multiple LoRA layers. How should I solve this?

```python
base_model.load_adapter('./lora_model1', '1')   # load first adapter under name '1'
base_model.load_adapter('./lora_model2', '2')   # load second adapter under name '2'
print_size(base_model)
base_model.set_adapter('1')                     # select adapter '1'
print_size(base_model)
base_model.set_adapter('2')                     # select adapter '2'
print_size(base_model)
```

The model size is the same each time, and the inference output is identical too.
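For what it's worth, set_adapter is not expected to change the parameter count: every loaded adapter stays resident in memory and set_adapter only selects which one is applied in the forward pass, so identical sizes alone are not conclusive. Below is a minimal sketch (not from the thread) for inspecting adapter state, assuming base_model is a peft PeftModel with adapters "1" and "2" loaded as in the snippet above:

```python
import torch

# Which adapter is currently selected, and which adapters are kept in memory.
print(base_model.active_adapter)
print(list(base_model.peft_config.keys()))

# The parameter count stays constant across set_adapter calls because both
# adapters remain loaded; only the active one contributes to forward().
# To confirm the two adapters actually differ, compare one pair of LoRA weights:
for name, module in base_model.named_modules():
    if hasattr(module, "lora_A") and "1" in module.lora_A and "2" in module.lora_A:
        same = torch.equal(module.lora_A["1"].weight, module.lora_A["2"].weight)
        print(f"{name}: lora_A identical across adapters -> {same}")
        break
```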
On my side set_adapter does work; I suggest double-checking your fine-tuning results.
How can multiple LoRAs be set to take effect together?
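On making several LoRAs act at once: peft provides add_weighted_adapter for merging already-loaded adapters into a new combined adapter. A hedged sketch follows; the adapter names "1" and "2" and the blend weights are assumptions, and support on the GPTQ-quantized Qwen layers should be verified separately.

```python
# Sketch only: merge two loaded adapters into a new one and activate it.
# Assumes model is a peft PeftModel that already has adapters "1" and "2" loaded.
model.add_weighted_adapter(
    adapters=["1", "2"],
    weights=[0.5, 0.5],          # assumed blend; tune per task
    adapter_name="combined",
    combination_type="linear",
)
model.set_adapter("combined")    # the merged adapter now drives inference
```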
Start Date
No response
Implementation PR
No response
Reference Issues
No response
Summary
The available examples only show a single model loading a single LoRA. Suppose I have several expert LoRA models: how can I switch between them quickly? Reloading the model each time introduces latency. Is it possible to achieve this by switching only the LoRA layers?
I ask because I found that applying a LoRA changes the in-memory structure of the original model, so another LoRA can no longer be applied, which causes conflicts.
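A minimal sketch of the pattern being asked for, under the assumption that the peft bug mentioned in the comments is fixed (pip install -U peft). The checkpoint name and the adapter directories ./lora_model1 and ./lora_model2 are placeholders, not paths from this issue:

```python
# Load the quantized base model once, attach several adapters, and switch
# between them in memory without reloading anything.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4",   # GPTQ int4 checkpoint (placeholder)
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat-Int4", trust_remote_code=True)

# The first adapter wraps the base model; later adapters reuse the same base weights.
model = PeftModel.from_pretrained(base, "./lora_model1", adapter_name="expert1")
model.load_adapter("./lora_model2", adapter_name="expert2")

def generate_with(adapter_name: str, prompt: str) -> str:
    # Switching adapters is an in-memory operation, no model reload involved.
    model.set_adapter(adapter_name)
    inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate_with("expert1", "Hello"))
print(generate_with("expert2", "Hello"))
```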
Basic Example
See the Zhihu post https://zhuanlan.zhihu.com/p/666076100
When I call load_adapter on qwen-14b-int4, it raises:
AttributeError: 'QuantLinear' object has no attribute 'qweight'
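For reference, a sketch of the call path that produces this error, with the checkpoint name and adapter directory as placeholders; per the update in the comments, upgrading peft resolves it.

```python
# Minimal reproduction sketch (placeholder paths); older peft versions raise
# AttributeError: 'QuantLinear' object has no attribute 'qweight' here.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
)
# Fails on old peft; fixed by `pip install -U peft` as noted above.
model.load_adapter("./lora_model1", adapter_name="expert1")
```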
Drawbacks
No drawbacks found so far. If I had to nitpick, it would be that every LoRA switch requires a model.set_adapter("<adapter name>") call to swap adapters in memory.
Unresolved questions
No response