feat: Preliminary implementation on Qualcomm NPU (QNN) backend. #112

liang1232018 · 2024-08-09T09:49:07Z

Preliminary support for prefilling on the Qualcomm NPU:

- add a new QNN backend
- implement LLM-specific ops

For design details, see the paper: Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU.

yirongjie and others added 30 commits March 30, 2024 02:41

feat: add QNN OPT test demo

79f6342

feat

f1f8401

fix: rm ls-n

153c026

fix init Backends

ac89f74

fix: cpu matmul int8 scale

c3159da

feat: multi graph backends execute

refactor: cpu split input execute

3cc4f66

Merge pull request #21 from liang1232018/develop-QNN-zh

c866437

Develop qnn zh

fix : gelu, relu and linear scale bugs.

c5e51aa

fix : verified no bias model.

c26727d

fix : verified no layernorm model.

8e7512e

fix : round quant scale.

f2a45bf

fix : CPU QNN matmul int8 bugs.

2b6ba59

feat : support SINT8 in merge output.

2fee710

fix : update models.

1770912

fix : update merge output and split input to support residual.

b0e2803

fix : merge output parameter bugs.

b080672

feat : support new MergeOutput and SplitInput.

60ddcbd

fix : split input support number params.

0611e31

feat : support residual and verify correctness.

175f229

Merge pull request #22 from liang1232018/develop-QNN-CQ

7577a2b

Develop qnn cq

refactor: add new executor for qnn pipeline

ff4b49d

feat: add thread pool in qnn pipeline executor

e4cd0c2

support all opt models.

6ed9c08

fix : quantize no clip bug.

018e8bc

fix : CPU output tensor type bugs.

a517e66

fix : CPU view output tensor type bugs.

a587a37

1. support residual models with split and merge. 2. fix FP tensor inp…

54b5ae8

…ut in QNN having 128 added, by subtracting 128 in the LLaMA add during residual. 3. Set all QNN graph input tensors to SINT8.

fix : 4 layer 128 input opt model.

27f7c85

fix: qnn chunk execute temp result

427b213

Merge branch 'develop-QNN' into develop-QNN-zh

c407c82

oreomaker and others added 5 commits August 9, 2024 11:28

chore: update readme

5dd151c

Update README.md

dddd790

Update README.md

de761f0

Update README.md

f48fa6b

Merge remote-tracking branch 'mllm/develop' into develop-QNN

cb81e2e

yirongjie self-requested a review August 9, 2024 09:58

liang1232018 and others added 8 commits August 9, 2024 18:58

fix : build error and useless files.

61309e1

fix : paramloader bugs.

141f661

Merge branch 'develop-QNN' into develop-qnn-merge

dda54ee

Merge pull request #38 from liang1232018/develop-qnn-merge

d271ccb

Develop qnn merge

fix : matmul and KVCache bugs.

1da9747

fix:reset src/backends/cpu/CPUTensorFunction.hpp

425eeee

fix: off debug

946857e

fix: CMakelist

47e2ce1

yirongjie approved these changes Aug 9, 2024

liang1232018 and others added 11 commits August 9, 2024 21:30

fix : ignore qnn op package makefile error.

343dc7c

fix : ignore qnn and hexagon sdks.

02e093b

fix : ignore qnn and hexagon sdks.

556b896

Update README.md

7348e24

fix : ignore qnn and hexagon sdks.

0cde4f1

fix L

993bd94

Merge remote-tracking branch 'refs/remotes/origin/develop-QNN' into d…

4c05520

…evelop-QNN

fix: qwen npu vocab path

75f3a4f

fix: run qwen demo npu

e524c3d

fix: token

5572e41

Merge remote-tracking branch 'refs/remotes/origin/develop-QNN' into d…

e1f01be

…evelop-QNN

yirongjie merged commit 83d65df into main Aug 9, 2024
1 check passed

yirongjie deleted the develop branch August 9, 2024 15:00

yirongjie changed the title ~~Preliminary implementation on Qualcomm NPU (QNN) backend.~~ feat: Preliminary implementation on Qualcomm NPU (QNN) backend. Aug 16, 2024

yirongjie added the good first issue label Aug 16, 2024
