Add support for LLaMA-2 #505

zhuohan123 · 2023-07-18T22:04:25Z

Fix #501

~~Update: this PR has some correctness issues on 70B models. Will look into it.~~

@WoosukKwon This PR is ready to go. Please review and let's merge it!

gesanqiu · 2023-07-19T03:42:46Z

I notice that you didn't add a new llama2.py, so is this PR compatible with both LLaMA and LLaMA-2?

zhuohan123 · 2023-07-19T04:26:26Z

> I notice that you didn't add a new llama2.py, so is this PR compatible with both LLaMA and LLaMA-2?

Yes, the old models should still be compatible.

WoosukKwon

@zhuohan123 Awesome! Thanks for the great work! Just left minor comments.

Please double-check that this PR doesn't break LLaMA V1 and other models using RoPE, before merging the PR.

csrc/pos_encoding_kernels.cu

vllm/model_executor/models/llama.py


ri938 · 2023-07-27T09:24:59Z

I am getting an error when trying to load some LLaMA V1 models:

LlamaConfig object has no attribute 'num_key_value_heads'

HarrisonBT · 2023-07-28T03:27:33Z

WARNING 07-28 03:23:18 scheduler.py:196] Input prompt (2716 tokens) is too long and exceeds limit of 4096

tuyaao · 2023-07-31T03:41:01Z

> I am getting an error when trying to load some LLaMA V1 models:
>
> LlamaConfig object has no attribute 'num_key_value_heads'

Same error for me on the latest master commit: 953f28c
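One plausible reading of the error above is that the loaded `LlamaConfig` predates the `num_key_value_heads` field, so the attribute lookup fails for some LLaMA V1 checkpoints. The sketch below shows one defensive way to read the value, falling back to `num_attention_heads` when the field is absent. The helper name `get_num_kv_heads` and the `SimpleNamespace` stand-in configs are hypothetical and only for illustration; this is not the code from this PR or from vLLM.

```python
from types import SimpleNamespace


def get_num_kv_heads(config) -> int:
    """Return the number of key/value heads, tolerating older configs.

    Hypothetical helper for illustration; not taken from the vLLM source.
    """
    # Configs that define num_key_value_heads use that value directly.
    # Configs without the field fall back to num_attention_heads, i.e. one
    # key/value head per query head (plain multi-head attention).
    return getattr(config, "num_key_value_heads", config.num_attention_heads)


# Stand-in configs (assumed shapes, not real model configs).
old_style_config = SimpleNamespace(num_attention_heads=32)
new_style_config = SimpleNamespace(num_attention_heads=64, num_key_value_heads=8)

print(get_num_kv_heads(old_style_config))
print(get_num_kv_heads(new_style_config))
```

Whether this fallback is appropriate depends on the model: it assumes that a config without the field describes a model with as many key/value heads as attention heads.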


zhuohan123 added 3 commits July 18, 2023 22:04

Add support for LLaMA-2

c176d28

fix

4bd1788

Change docs

ca7647d

zhuohan123 changed the title ~~[WIP] Add support for LLaMA-2~~ Add support for LLaMA-2 Jul 18, 2023

zhuohan123 added 4 commits July 18, 2023 22:04

Fix num_kv_heads

dbce8e6

Fix config

24be44b

Fix pos encoding kernel

cdffdad

fix format

eaff1a9

LiuXiaoxuanPKU mentioned this pull request Jul 20, 2023

LlaMA 2: Input prompt (2664 tokens) is too long and exceeds limit of 2048/2560 #525

Closed

zhuohan123 requested a review from WoosukKwon July 20, 2023 05:54

WoosukKwon approved these changes Jul 20, 2023


zhuohan123 added 2 commits July 20, 2023 18:35

Fix format

b4f1c3a

Add news

674fcea

zhuohan123 merged commit 6fc2a38 into main Jul 20, 2023

gqjia added a commit to gqjia/vllm that referenced this pull request Jul 21, 2023

Merge pull request #1 from vllm-project/main

c33a2f4

Add support for LLaMA-2 (vllm-project#505)

void-main mentioned this pull request Jul 21, 2023

LLaMA support NVIDIA/FasterTransformer#506

Open

zhuohan123 deleted the support-llama-2 branch July 25, 2023 21:59

pseudotensor mentioned this pull request Aug 28, 2023

long context h2oai/h2ogpt#360

Open

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add support for LLaMA-2 (vllm-project#505)

9c84946

pi314ever pushed a commit to pi314ever/vllm that referenced this pull request Nov 20, 2024

Terminate ray workers on ray_hpu_executor shutdown (vllm-project#505)

96467d8

Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
