
💡 [REQUEST] - Can multiple LoRAs be loaded in parallel (or switched quickly)? #822

Closed
zhaodice opened this issue Dec 19, 2023 · 5 comments

@zhaodice

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

The available examples only show a single model loading a single LoRA. Suppose I have several expert LoRA models: how can I switch between them quickly? Reloading the model each time introduces latency, so is it possible to swap only the LoRA layers to achieve this?
I found that applying a LoRA changes the structure of the base model in memory, so another LoRA can no longer be applied on top of it, which causes a conflict.

Basic Example

Following the Zhihu write-up at https://zhuanlan.zhihu.com/p/666076100, I get an error when calling load_adapter on qwen-14b-int4:
AttributeError: 'QuantLinear' object has no attribute 'qweight'

from peft import PeftModel

# `model` is the already-loaded qwen-14b-int4 base model.
peft_model = PeftModel.from_pretrained(model, "/home/user/git/qwen/Qwen/output_qwen/character2.0", adapter_name="character2.0")

# This step fails on Qwen (Tongyi Qianwen), because PeftModel.from_pretrained above already modified `model`.
peft_model.load_adapter("/home/user/git/qwen/Qwen/output_qwen/character", adapter_name="character")

# LoRAs could then be switched directly, which is more efficient than reloading the model:
# peft_model.set_adapter("character")
# peft_model.set_adapter("character2.0")

Drawbacks

No drawbacks found so far. If I had to nitpick, it is that every LoRA switch requires a model.set_adapter("<lora name>") call to swap in memory.

Unresolved questions

No response

zhaodice (Author)

Of course, the qwen-14b-int4 model could be loaded multiple times, each copy with one LoRA applied, but the VRAM cost is prohibitive, since it amounts to loading the same base model over and over.
Alternatively, qwen-14b-int4 could be unloaded and reloaded every time a LoRA switch is needed, but that time cost is also hard to accept.
The ideal case is, of course, switching directly with set_adapter.

zhaodice (Author) commented Dec 19, 2023

——UPDATE——

This is not a Qwen problem; it is a bug in the peft library. See:
huggingface/peft#1243
huggingface/peft#1239

Fix: pip install -U peft
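If the environment might still carry an old peft install, a small runtime guard catches this before the QuantLinear error reappears. A minimal sketch, assuming the fix shipped by peft 0.7.1 (the exact minimum version is an assumption; adjust if needed):

import peft
from packaging import version

# Fail fast on a stale peft install; the 0.7.1 threshold is an assumption.
if version.parse(peft.__version__) < version.parse("0.7.1"):
    raise RuntimeError(f"peft {peft.__version__} is too old; run: pip install -U peft")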

quanshr commented Jan 9, 2024

When trying load_adapter, I found that it adds the LoRA layers into the model, but set_adapter does not change the structure; the output actually seems to be the combined result of multiple LoRA layers. How should I solve this?

base_model.load_adapter('./lora_model1', '1')
base_model.load_adapter('./lora_model2', '2')
print_size(base_model)
base_model.set_adapter('1')
print_size(base_model)
base_model.set_adapter('2')
print_size(base_model)

The model size is the same each time, and the inference outputs are also identical.
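For reference, a constant size is expected here: load_adapter keeps every adapter resident in memory, and set_adapter only selects which adapter contributes to the forward pass. A minimal check of which adapter is active, assuming base_model is a peft PeftModel (which exposes the active_adapter property):

# All loaded adapters stay resident, so the parameter count does not change;
# only the adapter used during the forward pass changes.
base_model.set_adapter('1')
print(base_model.active_adapter)  # expected: '1'
base_model.set_adapter('2')
print(base_model.active_adapter)  # expected: '2'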

zhaodice (Author)

(quoting @quanshr's comment above)

On my side, set_adapter does have an effect; I suggest double-checking your fine-tuning results.
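One way to re-check the fine-tuning results is to compare the raw LoRA weights of the two adapters; if they are (near-)identical, identical outputs are expected no matter which one is active. A rough sketch, assuming peft's usual parameter naming ("...lora_A.<adapter_name>.weight") and the adapter names '1' and '2' from the snippet above:

import torch

# Collect the LoRA weights belonging to each adapter by name.
w1 = [p for n, p in base_model.named_parameters() if ".lora_A.1." in n or ".lora_B.1." in n]
w2 = [p for n, p in base_model.named_parameters() if ".lora_A.2." in n or ".lora_B.2." in n]

# Both lists follow the same module order, so entries can be compared pairwise.
identical = all(torch.equal(a, b) for a, b in zip(w1, w2))
print("adapters carry identical weights:", identical)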

@SCHfighting

How do I set multiple LoRAs to take effect together?
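One option for having several LoRAs act at the same time is to merge the loaded adapters into a new named adapter with add_weighted_adapter and then activate it. A sketch, assuming a recent peft version, the peft_model and adapter names from the snippet earlier in this issue, and that both adapters share the same rank (required by the "linear" combination):

# Merge two already-loaded adapters into a combined adapter, then switch to it.
peft_model.add_weighted_adapter(
    adapters=["character", "character2.0"],
    weights=[0.5, 0.5],
    adapter_name="character_mix",
    combination_type="linear",
)
peft_model.set_adapter("character_mix")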
