
Add npu support to big model inference #2222

Merged
7 commits merged into huggingface:main from statelesshz:big-model-inference on Dec 8, 2023

Conversation

statelesshz
Contributor

@statelesshz statelesshz commented Dec 6, 2023

What does this PR do?

Fixes #2191

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@statelesshz statelesshz marked this pull request as draft December 6, 2023 11:58
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@junior-zsy

junior-zsy commented Dec 7, 2023

@statelesshz @muellerzr There is an error in this branch's code:

File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 678, in get_max_memory
max_memory = {i: torch.npu.mem_get_ifo(i)[0] for i in range(torch.npu.device_count())}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 678, in
max_memory = {i: torch.npu.mem_get_ifo(i)[0] for i in range(torch.npu.device_count())}
AttributeError: module 'torch_npu.npu' has no attribute 'mem_get_ifo'
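
The attribute name itself is the bug: the branch calls torch.npu.mem_get_ifo instead of torch.npu.mem_get_info. A minimal sketch of what the corrected NPU branch of get_max_memory computes, assuming torch.npu.mem_get_info mirrors torch.cuda.mem_get_info and returns a (free, total) tuple in bytes:

import torch
import torch_npu  # registers the torch.npu namespace

def npu_max_memory():
    # Take the free-memory entry (index 0) of mem_get_info for each visible NPU.
    return {i: torch.npu.mem_get_info(i)[0] for i in range(torch.npu.device_count())}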

@junior-zsy

junior-zsy commented Dec 7, 2023

@statelesshz @muellerzr
I changed torch.npu.mem_get_ifo to torch.npu.max_memory_allocated and it works, but there are other errors:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.06s/it]
Traceback (most recent call last):
File "server_fb.py", line 21, in
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1038, in chat
outputs = self.generate(**inputs, **gen_kwargs, eos_token_id=eos_token_id)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate
return self.sample(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/generation/utils.py", line 2619, in sample
outputs = self(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 940, in forward
transformer_outputs = self.transformer(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 833, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 643, in forward
layer_ret = layer(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 547, in forward
attention_output, kv_cache = self.self_attention(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 379, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu" not implemented for 'Half'

@statelesshz
Contributor Author

statelesshz commented Dec 7, 2023

Hi @junior-zsy. Thanks for your feedback. Sorry, this PR needs more testing before it's ready for review.
FYI, the torch_npu v2.1.0 branch recently added torch.npu.mem_get_info, so you may need to use torch_npu compiled from source.

@junior-zsy

> Hi @junior-zsy. Thanks for your feedback. Sorry, this PR needs more testing before it's ready for review. FYI, the torch_npu v2.1.0 branch recently added torch.npu.mem_get_info, so you may need to use torch_npu compiled from source.

The memory lookup now succeeds, but checkpoint loading fails with a new error:

{0: 64145637376, 1: 64153034752}
Loading checkpoint shards: 0%| | 0/7 [00:01<?, ?it/s]
Traceback (most recent call last):
File "server_fb.py", line 19, in
model = AutoModelForCausalLM.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", device_map="auto",trust_remote_code=True)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
) = cls._load_pretrained_model(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3228, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/modeling_utils.py", line 720, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 315, in set_module_tensor_to_device
new_value = value.to(device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/cuda/init.py", line 289, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
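
The assertion is raised because value.to(device) receives a bare integer index, which PyTorch interprets as a CUDA ordinal and routes through torch.cuda's lazy init. A hedged sketch of the kind of rewrite the NPU path needs (normalize_device is a hypothetical helper, not the PR's actual diff):

import torch
import torch_npu

def normalize_device(device):
    # tensor.to(0) means "cuda:0" to PyTorch, so on an NPU-only build a bare
    # integer index trips "Torch not compiled with CUDA enabled"; rewrite it
    # as an explicit "npu:<index>" string instead.
    if isinstance(device, int) and torch.npu.is_available():
        return f"npu:{device}"
    return device

t = torch.ones(2, 2)
t = t.to(normalize_device(0))  # lands on "npu:0" rather than "cuda:0"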

@junior-zsy

@statelesshz

replace .to() with .to("npu:") when using torch_npu,new error ,The model can be loaded now, but it cannot be forward

File "server_fb.py", line 23, in
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1038, in chat
outputs = self.generate(**inputs, **gen_kwargs, eos_token_id=eos_token_id)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate
return self.sample(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/generation/utils.py", line 2619, in sample
outputs = self(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/hooks.py", line 160, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/hooks.py", line 297, in pre_forward
return send_to_device(args, self.execution_device), send_to_device(
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/operations.py", line 161, in send_to_device
{
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/operations.py", line 162, in
k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/accelerate/utils/operations.py", line 168, in send_to_device
return tensor.to(device, non_blocking=non_blocking)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/cuda/init.py", line 289, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

@junior-zsy

junior-zsy commented Dec 7, 2023

@statelesshz It's the same problem: the device needs to change from an int to "npu:int". I have modified some of the code and it now runs.
Multiple cards can work, but I have discovered another issue: multiple cards cannot be used together with multiple threads.

code:

import time
import threading

import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer

def chat_in_thread(tokenizer, model, i):
    start_time = time.time()
    # "你是谁开发的" = "Who developed you?"
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    end_time = time.time()
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {end_time - start_time} seconds")

# Start time before loading the model
start_time = time.time()

tokenizer = AutoTokenizer.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", device_map="auto", trust_remote_code=True)

print(model)
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
print(response)

# Compute the model loading time
model_load_time = time.time() - start_time

# Create several threads that each run the chat function
threads = []
num_threads = 4

for i in range(num_threads):
    thread = threading.Thread(target=chat_in_thread, args=(tokenizer, model, i))
    threads.append(thread)

# Start the threads
for thread in threads:
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("Model loading time:", model_load_time, "seconds")

error:
I was developed based on GLM3-6B, a language model jointly trained in 2022 by Tsinghua University's KEG Lab and Zhipu AI. My task is to provide appropriate answers and support for users' questions and requests.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

(Threads 3, 4, and 5 fail the same way: their tracebacks interleave in the output, but each ends in the same NPU error 107002, "The context is empty.")

Model loading time: 18.54421830177307 seconds
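
The ACL message already hints at the fix: NPU contexts are per-thread, so each worker thread needs set_device called before its first NPU op. A hedged workaround sketch, assuming torch.npu.set_device behaves like its torch.cuda counterpart:

import torch
import torch_npu

def chat_in_thread(tokenizer, model, i):
    # Bind this thread to an NPU before any device op; without it ACL
    # reports error 107002 ("The context is empty").
    torch.npu.set_device("npu:0")
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    print(f"Thread {i}: Response - {response}")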

@statelesshz
Contributor Author

Hi @junior-zsy, let's focus on the work in progress in this PR :-) If you find unexpected behavior when using torch_npu, feel free to open an issue.

@junior-zsy

@statelesshz Okay, though that leaves the multithreading issue unaddressed.

@statelesshz statelesshz force-pushed the big-model-inference branch 3 times, most recently from 61a105f to f6d5704 Compare December 7, 2023 10:43
@statelesshz
Contributor Author

Verified on Baichuan2-13B-Base with the following results:

(inference) [root@node-35 inference]# cat test.py
import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("/home/gpt_neox/weights_second/Baichuan2-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/home/gpt_neox/weights_second/Baichuan2-13B-Base", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('npu:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
(inference) [root@node-35 inference]# vim test.py
(inference) [root@node-35 inference]# python test.py
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:58<00:00, 19.38s/it]
登鹳雀楼->王之涣
夜雨寄北->李商隐
望天门山->李白
饮湖上初晴后雨->苏轼
惠崇春江晚景->苏轼
题西林壁->苏轼
夏日绝句->李清照
示儿->陆游
秋夜将晓出篱门迎凉有感->陆游

@statelesshz statelesshz marked this pull request as ready for review December 8, 2023 08:12
@statelesshz
Contributor Author

Hi @SunMarc, this PR is ready for review :-)

Member

@SunMarc SunMarc left a comment

Thanks for this clean integration @statelesshz! Can you have a second look, @muellerzr?

Collaborator

@muellerzr muellerzr left a comment

Thanks for all your work on this! Great job!

@SunMarc SunMarc merged commit 9964f90 into huggingface:main Dec 8, 2023
23 checks passed
@statelesshz statelesshz deleted the big-model-inference branch December 9, 2023 03:10
Linked issue #2191: device_map not work on huawei NPU device