[Model Enabling] llama3-8b-instruct-chat Enabling #225
Conversation
Please update the supported-models list.
import faulthandler
import functools
import itertools
import json
How about using the existing Llama convert script, rather than adding a new one?
Will merge them into one file when refactoring.
Please add `<|eot_id|>` handling for the end of each message in a turn; see https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ and ggerganov/llama.cpp#6751 (comment).
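For context, a minimal sketch of that handling, assuming a Hugging Face tokenizer for the Llama 3 instruct checkpoint (the exact integration point in this PR's convert/generation code may differ):

```python
from transformers import AutoTokenizer

# Llama 3 instruct models end each assistant turn with <|eot_id|> rather
# than only the plain EOS token, so generation should stop on either one.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

stop_ids = {
    tokenizer.eos_token_id,                         # <|end_of_text|>
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),  # end of a turn's message
}

def is_stop_token(token_id: int) -> bool:
    """Treat either terminator as the end of generation for a turn."""
    return token_id in stop_ids
```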
LGTM
Type of Change
New model enabling: added Llama 3 support.
Description
Expected Behavior & Potential Risk
N/A
How has this PR been tested?
Perf run with flags -m 0 -C 0-55 m4:
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
model.generate(inputs, streamer=streamer, max_new_tokens=33, threads=56, ctx_size=1062, do_sample=False)
Measured with 32 tokens in / 32 tokens out and 1024 tokens in / 32 tokens out.
Also verified with:
model.init(model_name, weight_dtype="int4", compute_dtype="int8", scale_dtype="bf16")
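For reproducibility, a self-contained version of the snippet above; this is a sketch assuming the neural_speed Model API and a Hugging Face tokenizer/streamer (model_name and the prompt are illustrative, not taken from this PR):

```python
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = Model()
# int4 weights with int8 compute, as in the perf runs above
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
model.generate(inputs, streamer=streamer, max_new_tokens=33,
               threads=56, ctx_size=1062, do_sample=False)
```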
Inference screenshots: FP32 and Q4_J outputs (images not included here).
Dependency Change?
N/A