feat: Preliminary implementation on Qualcomm NPU (QNN) backend. #112

Merged: 518 commits, Aug 9, 2024


liang1232018
Collaborator

Preliminary support for prefilling on the Qualcomm NPU:

  • Add a new QNN backend.
  • Implement LLM-specific ops.

For design details, see the paper: Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU.

yirongjie and others added 30 commits March 30, 2024 02:41
feat: multi graph backends execute
…ut in QNN will be add 128 bugs by sub 128 in LLaMA add during residual. 3. Set all QNN graph input tensors as SINT8.
@yirongjie yirongjie self-requested a review August 9, 2024 09:58
@yirongjie yirongjie merged commit 83d65df into main Aug 9, 2024
1 check passed
@yirongjie yirongjie deleted the develop branch August 9, 2024 15:00
@yirongjie yirongjie changed the title from "Preliminary implementation on Qualcomm NPU (QNN) backend." to "feat: Preliminary implementation on Qualcomm NPU (QNN) backend." Aug 16, 2024
@yirongjie yirongjie added the "good first issue" (Good for newcomers) label Aug 16, 2024
Labels
good first issue Good for newcomers
4 participants