This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[Model Enabling] llama3-8b-instruct-chat Enabling #225

Merged · 10 commits · Apr 19, 2024

Conversation

@Zhenzhong1 (Contributor) commented Apr 18, 2024

Type of Change

Supported llama3

Description

  • Validated model: llama3_8b_instruct-chat
  • Supported MHA (multi-head attention) to accelerate llama3_8b_instruct-chat
  • Supported FFN (feed-forward network) to accelerate llama3_8b_instruct-chat
  • Q4_J inference passes
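Q4_J is neural_speed's 4-bit weight-only quantization format; its exact bit layout lives in the C++ kernels, but the general idea behind group-wise int4 weight quantization can be sketched in plain Python (an illustrative sketch only — the group size, symmetric scaling, and function names here are assumptions, not the real Q4_J layout):

```python
def quantize_int4(weights, group_size=32):
    """Group-wise symmetric int4 quantization: each group of `group_size`
    weights shares one floating-point scale; values map to ints in [-8, 7]."""
    quants, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        quants.append([max(-8, min(7, round(w / scale))) for w in group])
    return quants, scales

def dequantize_int4(quants, scales):
    """Recover approximate fp weights: integer value times its group scale."""
    return [q * s for group, s in zip(quants, scales) for q in group]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.05, 0.18]
q, s = quantize_int4(weights, group_size=4)
restored = dequantize_int4(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error per weight is bounded by half a quantization step (scale / 2), which is why smaller group sizes trade more scale storage for better accuracy.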

Expected Behavior & Potential Risk

N/A

How has this PR been tested?

Perf: -m 0 -C 0-55 m4
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
model.generate(inputs, streamer=streamer, max_new_tokens=33, threads=56, ctx_size=1062, do_sample=False)
32 in 32 out

1024 in 32 out
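As a quick sanity check on the parameters above (assuming ctx_size, the prompt lengths, and max_new_tokens are all counted in tokens), the context window must cover the prompt plus all newly generated tokens:

```python
# Generation settings from the perf runs above.
ctx_size = 1062
max_new_tokens = 33

# The two prompt lengths tested: 32-token and 1024-token inputs.
for prompt_tokens in (32, 1024):
    needed = prompt_tokens + max_new_tokens
    assert needed <= ctx_size, f"ctx_size too small: need {needed}"
    print(f"{prompt_tokens} in: {needed}/{ctx_size} context slots used")
```

The 1024-token case leaves only 5 tokens of headroom, so a longer prompt or larger max_new_tokens would require raising ctx_size.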

model.init(model_name, weight_dtype="int4", compute_dtype="int8", scale_dtype="bf16")
(benchmark screenshots)

Inference screenshots:
FP32: (screenshot)
Q4_J: (screenshot)

Dependency Change?

N/A

@Zhenzhong1 changed the title from [Model Enabling] llama3 Enabling to [Model Enabling] llama3-8b-instruct-chat Enabling Apr 18, 2024
@kevinintel (Contributor):
Please update the supported-model list.

neural_speed/models/llama/llama.cpp
import faulthandler
import functools
import itertools
import json
Contributor:
How about reusing the existing llama convert script rather than adding a new one?

Contributor Author:
Will merge into one file when refactoring.

@zhentaoyu (Contributor) left a comment:
Add <eot_id> processing for the end of each message turn; see https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ and ggerganov/llama.cpp#6751 (comment).
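The Meta Llama 3 prompt format referenced here wraps each turn in header tokens and terminates it with <|eot_id|>, so generation should stop on <|eot_id|> in addition to the regular EOS token. A minimal sketch of assembling such a prompt (string assembly only; the function name is illustrative, and mapping these markers to token IDs and stop criteria is left to the converter/tokenizer):

```python
def build_llama3_prompt(messages):
    """Assemble a Llama 3 chat prompt: each (role, content) turn is wrapped in
    <|start_header_id|>role<|end_header_id|> and closed with <|eot_id|>;
    a trailing assistant header cues the model to respond."""
    parts = ["<|begin_of_text|>"]
    for role, content in messages:
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([("user", "Hello!")])
```

Without <|eot_id|> handling, the model keeps generating past the end of its turn, which is the failure mode the linked llama.cpp issue describes.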

@a32543254 (Contributor) left a comment:
LGTM

@VincyZhang VincyZhang merged commit fb7d16d into main Apr 19, 2024
11 checks passed
5 participants