[BugFix] Implement RoPE for GPT-J #941

WoosukKwon · 2023-09-04T15:16:12Z

Fixes #747 and fixes #590

This PR fixes a bug in the GPT-J model implementation. GPT-J uses a rotary embedding (RoPE) that differs slightly from the GPT-NeoX-style rotary embedding commonly used by LLaMA and other recent models. vLLM's current implementation does not account for this difference. This PR resolves the issue by adding a RoPE kernel for GPT-J. With this fix, I've verified that GPT-J's outputs match the HF outputs when using FP32 and argmax sampling.
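To make the difference concrete, here is a minimal pure-Python sketch of the two rotation layouts. It is not the kernel added in this PR. The function names (`rope_angles`, `rotate_neox`, `rotate_gptj`) are made up for illustration, and the sketch rotates the whole vector, whereas a real model may rotate only the first `rotary_dim` elements of each head. The GPT-NeoX style pairs element `i` with element `i + d/2`, while the GPT-J style pairs adjacent elements `(2i, 2i + 1)`.

```python
import math


def rope_angles(position, rotary_dim, base=10000):
    """Rotation angle for each of the rotary_dim // 2 pairs at one position."""
    half = rotary_dim // 2
    return [position * base ** (-2.0 * i / rotary_dim) for i in range(half)]


def rotate_neox(x, position, base=10000):
    """GPT-NeoX style: element i is paired with element i + d/2."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(position, d, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


def rotate_gptj(x, position, base=10000):
    """GPT-J style: adjacent elements (2i, 2i + 1) form a pair."""
    d = len(x)
    out = list(x)
    for i, theta in enumerate(rope_angles(position, d, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = b * c + a * s
    return out


# Toy 4-element "head" at position 1: the two layouts give different results.
x = [1.0, 2.0, 3.0, 4.0]
neox = rotate_neox(x, position=1)
gptj = rotate_gptj(x, position=1)
```

The two layouts apply the same angles and differ only in which elements are paired, so applying the NeoX-style rotation to GPT-J weights rotates the wrong pairs and the outputs diverge from HF.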

NOTE: This PR should be merged after #938.

zhuohan123

LGTM! Left a small comment about an alternative coding style. Feel free to choose the one that you think is better.

zhuohan123 · 2023-09-06T02:34:46Z

vllm/model_executor/layers/attention.py

@@ -253,8 +253,10 @@ def __init__(
        max_position: int = 8192,
        base: int = 10000,
        num_kv_heads: Optional[int] = None,
+        is_neox_style: bool = True,


Nit: Do you think this would be clearer?

Suggested change

-        is_neox_style: bool = True,
+        style: str = "neox",  # Options: ["neox", "gptj"]

WoosukKwon · 2023-09-06

To my knowledge, there are only two types of RoPE in terms of how they rotate the query and key vectors (as far as I understand, the RoPE scaling variants use the GPT-NeoX-style RoPE). I think we can change the interface once we find more RoPE variants to support.
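For comparison, here is a rough sketch of what the string-based alternative could look like. It is not the interface merged in this PR, which adds the `is_neox_style` boolean shown in the diff above. `RopeStyle`, `resolve_rope_style`, and `pair_indices` are hypothetical names. A string option extends to more variants but needs validation, whereas a boolean cannot hold an invalid value.

```python
from enum import Enum


class RopeStyle(str, Enum):
    """Hypothetical enumeration of the two rotation layouts discussed here."""
    NEOX = "neox"
    GPTJ = "gptj"


def resolve_rope_style(style="neox"):
    """Validate a style string; raises ValueError for unknown styles."""
    return RopeStyle(style)


def pair_indices(rotary_dim, style="neox"):
    """Return the index pairs that each layout rotates together."""
    resolved = resolve_rope_style(style)
    half = rotary_dim // 2
    if resolved is RopeStyle.NEOX:
        # NeoX: first half paired with second half.
        return [(i, i + half) for i in range(half)]
    # GPT-J: adjacent elements paired.
    return [(2 * i, 2 * i + 1) for i in range(half)]
```

With only two variants, the boolean and the validated string carry the same information. The string form pays off only if a third layout appears.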

WoosukKwon added 13 commits September 2, 2023 06:36

Cleanup tests

0af5cd8

Clean up test_cache.py

f1e07ab

Add kv_cache_factory

ebbf807

Refactor test for single_query_cached_kv

c3c8aa5

Fix test_attention

7a911f3

Clean up test_attention

6c53530

Normalize KV cache

02d6331

Disable ALIBI

cfd6a64

Minor

982af6f

Merge branch 'main' into cleanup-kernel-tests

0f8cf2c

Minor

d651b66

Merge branch 'main' into gptj

63a2fc3

Fix GPT-J RoPE

c76bd73

WoosukKwon requested a review from zhuohan123 September 4, 2023 15:16

yapf

97dd787

This was referenced Sep 4, 2023

Maybe Wrong implementation of AttentionWithRoPE for GPTJ and GPT-NeoX? #747

Closed

GPTJ output not consistent with that of transformers #590

Closed

Bump up the version to v0.1.5 #944

Merged

WoosukKwon added 3 commits September 6, 2023 00:03

Merge branch 'main' into gptj

4786a75

Minor

2885d53

Remove #include <string>

12353cf

zhuohan123 approved these changes Sep 6, 2023

WoosukKwon merged commit 320a622 into main Sep 6, 2023

WoosukKwon deleted the gptj branch September 6, 2023 02:54

liuyanyi pushed a commit to liuyanyi/vllm that referenced this pull request Sep 12, 2023

[BugFix] Implement RoPE for GPT-J (vllm-project#941)

26f3c2a

WoosukKwon mentioned this pull request Sep 12, 2023

AttributeError: module 'vllm.pos_encoding_ops' has no attribute 'rotary_embedding'. Did you mean: 'rotary_embedding_neox'? #1021

Closed

wjueyao mentioned this pull request Oct 17, 2023

Starcoder output is noise after upgrading to 0.2.0 #1385

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

[BugFix] Implement RoPE for GPT-J (vllm-project#941)

eba9289
