[NPU] Optimize Qwen2 lm_head to use INT4 #12072

Merged

rnwang04 merged 20 commits into intel-analytics:main from rnwang04:lm_head_experiment

Sep 14, 2024

Contributor

rnwang04 commented Sep 12, 2024

Description

Optimize the Qwen2-7B lm_head to use INT4 when the model is loaded with sym_int4 on NPU.

https://github.com/analytics-zoo/nano/issues/1633#issuecomment-2348451555
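For readers unfamiliar with the term, the following is a minimal sketch of what a symmetric 4-bit quantization scheme generally looks like. It is a generic illustration, not the ipex-llm implementation: the function names `quantize_sym_int4`, `dequantize_sym_int4`, and `pack_int4`, the per-row scale, and the nibble order are all assumptions made for this example.

```python
def quantize_sym_int4(row):
    """Quantize one weight row to signed 4-bit integers with a single scale.

    Hypothetical helper: symmetric means the zero-point is fixed at 0,
    so only the scale has to be stored next to the 4-bit values.
    """
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Signed 4-bit range is [-8, 7]; clamp after rounding.
    q = [max(-8, min(7, round(w / scale))) for w in row]
    return q, scale


def dequantize_sym_int4(q, scale):
    """Recover approximate float weights from the 4-bit values."""
    return [v * scale for v in q]


def pack_int4(q):
    """Pack pairs of 4-bit values into bytes (low nibble first, an assumption)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


row = [0.7, -0.3, 0.1, 0.0]
q, scale = quantize_sym_int4(row)
packed = pack_int4(q)
restored = dequantize_sym_int4(q, scale)
```

Packed this way, each weight takes half a byte instead of the full byte an INT8 value would need, which is the general motivation for moving a large layer such as lm_head to INT4.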

Verification

optimized Qwen2-7B
optimized Qwen2-7B with IPEX_LLM_CPU_LM_HEAD=1
non-optimized Qwen2-7B
optimized Qwen2-7B with save / load

rnwang04 added 2 commits

September 12, 2024 13:54


temp save

3e6a646


update

869d270

rnwang04 marked this pull request as draft

September 12, 2024 10:12

rnwang04 and others added 3 commits

September 12, 2024 14:06

fix

1efb502

fix

9b4df02


Split lm_head into 7 parts & remove int8 for lm_head when sym_int4

3e4b2f0
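The commit message above does not say along which axis the lm_head is split. As a rough sketch, assuming the split is along the input-feature axis, a linear layer can be cut into 7 column slices whose partial outputs are summed to reproduce the full result (Qwen2-7B's hidden size of 3584 divides evenly by 7, which may be why that number was chosen). The helper names `matvec`, `split_columns`, and `sliced_matvec` are hypothetical and use plain Python lists rather than any ipex-llm API.

```python
def matvec(weight, x):
    """Compute weight @ x for a list-of-rows matrix and a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def split_columns(weight, num_parts):
    """Split a weight matrix into num_parts contiguous column slices.

    Returns (start, end, slice) tuples; sizes differ by at most one column
    when the input width is not divisible by num_parts.
    """
    in_features = len(weight[0])
    base, extra = divmod(in_features, num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        end = start + base + (1 if i < extra else 0)
        parts.append((start, end, [row[start:end] for row in weight]))
        start = end
    return parts


def sliced_matvec(parts, x, out_features):
    """Run each slice on its share of the input and sum the partial outputs."""
    total = [0] * out_features
    for start, end, w_slice in parts:
        partial = matvec(w_slice, x[start:end])
        total = [t + p for t, p in zip(total, partial)]
    return total


# Small integer example so the comparison is exact.
weight = [[(r * 14 + c) % 5 - 2 for c in range(14)] for r in range(3)]
x = [c % 3 - 1 for c in range(14)]
parts = split_columns(weight, 7)
full = matvec(weight, x)
sliced = sliced_matvec(parts, x, out_features=3)
```

Each slice is an independent, smaller linear layer, so it can be quantized and dispatched on its own; the cost is one extra accumulation over the partial outputs.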

Oscilloscope98 changed the title ~~[Test] lm_head experiment~~ [NPU] Optimize Qwen2 lm_head to use INT4

Oscilloscope98 added 2 commits

September 13, 2024 18:19


Simplify and add condition to code

f35f13e


Small fix

rnwang04 marked this pull request as ready for review

September 14, 2024 02:22

rnwang04 requested a review from jason-dai

September 14, 2024 02:22

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/convert_mp.py (outdated, resolved)

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/linear.py (outdated, resolved)

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/convert_mp.py (resolved)

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/linear.py (outdated, resolved)

rnwang04 requested a review from jason-dai

September 14, 2024 04:25

Contributor Author

rnwang04 commented Sep 14, 2024

@jason-dai I have refactored the whole code structure based on your comments, and this optimization can now be easily applied to other models. Would you mind taking another look? 😊

rnwang04 force-pushed the lm_head_experiment branch from e8b4cb8 to 0efa199

September 14, 2024 06:01

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/lm_head.py (outdated, resolved)

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/convert_mp.py (outdated, resolved)

jason-dai reviewed

python/llm/src/ipex_llm/transformers/npu_models/lm_head.py (outdated, resolved)

jason-dai approved these changes

Contributor

jason-dai left a comment

LGTM

rnwang04 merged commit 081af41 into intel-analytics:main

1 check passed

rnwang04 added 7 commits

September 14, 2024 10:20


refactor some code

e414efa


fix style

a049f40


fix style

7819ae3


fix style

db7fa40

fix

2b2c8b1

fix

e21a9cf


temp save

9b206af

rnwang04 added 6 commits

September 14, 2024 12:22


refactor

add9edb


fix style

526417b


further refactor

d9278e9


simplify code

0efa199


meet code review

42cc11f


fix style

604649c
