Add npu support to big model inference #2222
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@statelesshz @muellerzr The branch code raises an error:

```
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 678, in get_max_memory
```
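For context, a minimal sketch, assuming `torch_npu` is installed, of how `get_max_memory` can branch on the Ascend backend by mirroring the existing CUDA path (this is not the PR's actual diff):

```python
import torch
import torch_npu  # noqa: F401 -- registers the "npu" device type with torch

def npu_max_memory():
    """Return the free memory in bytes per NPU, keyed by device index."""
    # Touch each device once so mem_get_info reports initialized values,
    # the same trick accelerate uses for CUDA devices.
    for i in range(torch.npu.device_count()):
        _ = torch.tensor(0, device=torch.device("npu", i))
    return {i: torch.npu.mem_get_info(i)[0] for i in range(torch.npu.device_count())}
```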
@statelesshz @muellerzr Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.06s/it]
Hi @junior-zsy. Thanks for your feedback. Sorry about that; this PR needs more testing before it's ready for review.
`{0: 64145637376, 1: 64153034752}`
After replacing `.to()` with `.to("npu:<int>")` when using torch_npu, there is a new error. The model can now be loaded, but the forward pass fails:

```
File "server_fb.py", line 23, in
```
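For context, a minimal illustration of the int-versus-`"npu:<int>"` device spelling described above (the device index here is hypothetical):

```python
import torch
import torch_npu  # noqa: F401 -- registers the "npu" device type

x = torch.empty(1)
x = x.to("npu:0")   # works: explicit Ascend device string
# x = x.to(0)       # an integer device defaults to cuda:0 and fails without CUDA
```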
@statelesshz Same problem. The device int needs to be changed to `"npu:<int>"`. I have modified some of the code and it can now run.

Code:

```python
import time
import threading
import torch
from transformers import AutoTokenizer

def chat_in_thread(tokenizer, model, i):
    ...

# start time before loading the model
start_time = time.time()
tokenizer = AutoTokenizer.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True)
# NOTE: the lines that load `model` were omitted from the original snippet
print(model)
# compute the model loading time
model_load_time = time.time() - start_time

# create multiple threads that run the chat function
threads = []
for i in range(num_threads):  # num_threads is defined elsewhere in the original
    threads.append(threading.Thread(target=chat_in_thread, args=(tokenizer, model, i)))
# start the threads
for thread in threads:
    thread.start()
# wait for all threads to finish
for thread in threads:
    thread.join()

print("Model loading time:", model_load_time, "seconds")
```

Error:

```
Exception in thread Thread-5:
  File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
  File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
```

Model loading time: 18.54421830177307 seconds
Hi @junior-zsy, let's focus on the work in progress in this PR :-) If you run into unexpected behavior when using it, please open a separate issue.
@statelesshz Okay, but you're ignoring the multithreading issue.
Force-pushed from 61a105f to f6d5704
Force-pushed from ff45785 to 48b6482
Hi @SunMarc, this PR is ready for review :-)
Thanks for this clean integration @statelesshz! Can you have a second look, @muellerzr?
Thanks for all your work on this! Great job!
What does this PR do?
Fixes #2191
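For readers arriving from #2191, a hedged end-to-end sketch of what this PR enables; the checkpoint name and dtype are illustrative, not from the PR:

```python
import torch
import torch_npu  # noqa: F401 -- makes the "npu" backend visible to accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

# With NPU support in big model inference, device_map="auto" lets accelerate
# dispatch the weights across the available Ascend NPUs instead of CUDA GPUs.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm3-6b",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
```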
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.