[Bad Case]: Can function calling work without vllm? #203
Labels: badcase (Bad cases)

Comments
Take the code above, for example. How do I use function calling? Without using vllm.

You can refer to the sample code: first format your messages into a prompt in the required format, then run inference with the model. Either transformers or vllm works.

Could you give concrete code? I'm a beginner who just got started.

You can use tokenizer.apply_chat_template to handle the templating; a sketch follows below for reference.
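A minimal sketch of doing this with transformers alone, no vllm. It assumes a recent transformers version whose apply_chat_template accepts a tools argument and that the model's chat template can render tool definitions; the tool schema and the name get_car_insurance_types are hypothetical, invented for illustration.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "D:/MiniCPM3-4B"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True
)

# Tool definitions in the OpenAI-style JSON schema that chat templates expect.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_car_insurance_types",  # hypothetical tool, for illustration
            "description": "List the available kinds of car insurance.",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

messages = [
    {"role": "user", "content": "What kinds of car insurance are there?"},
]

# The template renders the tool definitions into the prompt alongside the messages.
model_inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(device)

model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    do_sample=True,  # needed for top_p/temperature to take effect
    top_p=0.7,
    temperature=0.7,
)

# Decode only the newly generated tokens; a tool call, if the model makes one,
# appears in this text and must be parsed out.
response = tokenizer.decode(
    model_outputs[0][model_inputs.shape[1]:], skip_special_tokens=True
)
print(response)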
Description
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import time

s = time.time()
path = "D:/MiniCPM3-4B"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True
)

messages2 = [  # unused in this run; kept from the original report
    {"role": "user", "content": "Recommend 5 tourist attractions in Beijing."},
]
messages = [
    {"role": "user", "content": "What kinds of car insurance are there?"},
]

# Render the chat template into input ids and move them to the target device.
model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(device)

model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    do_sample=True,  # without this, top_p/temperature are ignored
    top_p=0.7,
    temperature=0.7,
)

# Strip the prompt tokens so only the newly generated text is decoded.
output_token_ids = [
    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
]
responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(responses)

e = time.time()
print(e - s)
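After generation, any tool call arrives as text inside the decoded response and has to be parsed out. A hedged sketch follows, assuming the call is emitted as JSON between <tool_call> tags; the actual markers depend on the model's chat template, so inspect the raw output first.

import json
import re

raw = responses  # text produced by the decode step above

# Hypothetical markers; check the model's real output for the actual ones.
match = re.search(r"<tool_call>(.*?)</tool_call>", raw, re.DOTALL)
if match:
    call = json.loads(match.group(1))  # e.g. {"name": ..., "arguments": {...}}
    print(call["name"], call.get("arguments", {}))
else:
    print("No tool call found; the model answered in plain text.")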
Case Explanation
Same code as in the Description above.