
Implement custom kernel for LLaMA rotary embedding #14

Merged (9 commits into main, Mar 30, 2023)

Conversation

@WoosukKwon (Collaborator) commented on Mar 30, 2023

This PR implements a custom CUDA kernel for rotary embedding, which is used in LLaMA. The kernel handles the entire process of applying rotary embedding to the query and key in a single launch, and is thus much more efficient than the unfused PyTorch implementation.

Tested models:

  • LLaMA-7B
  • LLaMA-13B

Tested GPUs:

  • A100
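To make the fusion concrete, here is a minimal CUDA sketch of such a kernel. It is illustrative only: the names, the float dtype, and the `[max_pos, head_dim]` cos/sin cache layout are assumptions made for the example, not the PR's actual code in csrc/pos_encoding_kernels.cu. It shows the core idea: a single launch reads the precomputed cos/sin values once per token and rotates each (x1, x2) pair of both the query and the key in place, where an unfused PyTorch implementation would issue several elementwise kernels and materialize intermediates.

```cuda
#include <cuda_runtime.h>

// Hypothetical, simplified fused rotary-embedding kernel.
// One thread block per token; each thread rotates one (x1, x2) pair
// in both the query and the key, so a single launch covers the
// entire operation.
__global__ void rotary_embedding_kernel(
    const long* __restrict__ positions,  // [num_tokens]
    float* __restrict__ query,           // [num_tokens, num_heads * head_dim]
    float* __restrict__ key,             // [num_tokens, num_heads * head_dim]
    const float* __restrict__ cos_sin,   // [max_pos, head_dim], assumed layout:
                                         // cos in first half, sin in second
    int num_heads,
    int head_dim) {
  const int token = blockIdx.x;
  const long pos = positions[token];
  const float* cache = cos_sin + pos * head_dim;
  const int half = head_dim / 2;

  for (int i = threadIdx.x; i < num_heads * half; i += blockDim.x) {
    const int head = i / half;
    const int d = i % half;
    const float c = cache[d];         // cos(pos * theta_d)
    const float s = cache[half + d];  // sin(pos * theta_d)
    const int base = token * num_heads * head_dim + head * head_dim;

    // Rotate the query pair (d, d + half) by the angle for this position.
    const float q1 = query[base + d];
    const float q2 = query[base + half + d];
    query[base + d] = q1 * c - q2 * s;
    query[base + half + d] = q1 * s + q2 * c;

    // Rotate the key pair with the same angle.
    const float k1 = key[base + d];
    const float k2 = key[base + half + d];
    key[base + d] = k1 * c - k2 * s;
    key[base + half + d] = k1 * s + k2 * c;
  }
}
```

Under these assumptions, a launch would look like `rotary_embedding_kernel<<<num_tokens, 256>>>(positions, query, key, cos_sin, num_heads, head_dim);`, with the grid sized to the number of tokens so query and key are updated in one pass.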

@WoosukKwon WoosukKwon requested a review from zhuohan123 March 30, 2023 10:29
@WoosukKwon changed the title from "Add custom kernel for rotary embedding" to "Implement custom kernel for LLaMA rotary embedding" on Mar 30, 2023
@zhuohan123 (Member) left a comment:

LGTM!

Review thread on csrc/pos_encoding_kernels.cu: resolved
@WoosukKwon WoosukKwon merged commit 88c0268 into main Mar 30, 2023
@WoosukKwon WoosukKwon deleted the rotary-embedding branch March 30, 2023 18:04
bigPYJ1151 added a commit to bigPYJ1151/vllm that referenced this pull request Sep 12, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Mar 25, 2024
mzusman pushed a commit to mzusman/vllm that referenced this pull request May 6, 2024
* remove JambaConfig and use official one from transformers

* changes in Jamba modeling file to align with official HF format
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request May 31, 2024
enable fused topK_softmax kernel for hip path
ykim362 pushed a commit to ykim362/vllm that referenced this pull request Jun 17, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
Summary:
Add benchmarking scripts and utils.

Things to note:

  • All files are stored in the `neuralmagic` folder.
  • neuralmagic/benchmarks/scripts/*: the actual benchmarking scripts that interact with the vllm engine.
  • neuralmagic/benchmarks/configs/*: JSON config files that define which benchmark commands to run.
  • neuralmagic/benchmarks/run_*.py: scripts that consume a config file and run the benchmark scripts.
  • neuralmagic/tools: add tools.

Testing:
Local testing

---------

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024