
support load qwen2-72b-instruct lora #5498

Closed

Conversation

@NiuBlibing (Contributor) commented Jun 13, 2024

Like #4007, this adds support for loading Qwen2-72B-Instruct's LoRA adapter with tensor-parallel sizes 1, 2, 4, and 8.

Ref #3793

@NiuBlibing changed the title from "Add 3696 bgmv-kernel to support qwen2-72b-instruct lora" to "Add 3696 bgmv-kernel to support qwen2-72b-instruct lora with tp 8" on Jun 13, 2024
@NiuBlibing changed the title from "Add 3696 bgmv-kernel to support qwen2-72b-instruct lora with tp 8" to "Add 3696 bgmv-kernel to support qwen2-72b-instruct lora" on Jun 13, 2024
@NiuBlibing changed the title from "Add 3696 bgmv-kernel to support qwen2-72b-instruct lora" to "support load qwen2-72b-instruct lora" on Jun 13, 2024
@NiuBlibing marked this pull request as draft on Jun 13, 2024 10:33
@NiuBlibing closed this on Jun 13, 2024
@NiuBlibing reopened this on Jun 13, 2024
@NiuBlibing closed this on Jun 13, 2024
@NiuBlibing reopened this on Jun 14, 2024
@NiuBlibing closed this on Jun 14, 2024
@NiuBlibing reopened this on Jun 14, 2024
@NiuBlibing closed this on Jun 14, 2024
@NiuBlibing (Contributor, Author) commented:

Currently, the Punica kernel cannot support Qwen2-72B-Instruct because 3696 is not divisible by 64. Hopefully #5036 or #5356 will work.
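For reference, a minimal sketch of the size check being described (the 29568 intermediate size is assumed from Qwen2-72B-Instruct's config; the divisor 64 is the alignment this comment says the Punica BGMV kernels require):

# Hypothetical illustration, not vLLM code: check which tensor-parallel
# shards of the assumed intermediate_size satisfy the 64-alignment
# constraint mentioned above.
intermediate_size = 29568  # assumed value for Qwen2-72B-Instruct
for tp_size in (1, 2, 4, 8):
    shard = intermediate_size // tp_size
    print(f"tp={tp_size}: shard={shard}, divisible by 64: {shard % 64 == 0}")
# tp=8 yields 3696, and 3696 % 64 == 48, which is why the kernel rejects it.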

@jeejeelee (Collaborator) commented Jun 14, 2024

Could you provide your running script?

I can test Qwen2-72B-Instruct+LoRA on my local device using #5036.

@NiuBlibing (Contributor, Author) commented:

> Could you provide your running script?
>
> I can test Qwen2-72B-Instruct+LoRA on my local device using #5356.

I just start it with the vLLM CLI:

python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-72B-Chat-test --model ./Qwen/Qwen2-72B-Instruct/ --gpu-memory-utilization 0.9 --tensor-parallel-size 8 --enable-lora --lora-dtype bfloat16 --lora-modules test=/path/to/lora/
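For reference, a minimal sketch of querying the served LoRA adapter by the name registered via --lora-modules (here "test"), assuming the openai Python client and the server's default port 8000; the prompt is illustrative:

# Sketch: send a completion request to the OpenAI-compatible server started
# above, selecting the LoRA adapter by its registered name "test".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="test",                 # the name given in --lora-modules
    prompt="Hello, who are you?",  # illustrative prompt
    max_tokens=32,
)
print(resp.choices[0].text)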

@jeejeelee (Collaborator) commented:

> > Could you provide your running script?
> >
> > I can test Qwen2-72B-Instruct+LoRA on my local device using #5356.
>
> I just start it with the vLLM CLI:
>
> python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-72B-Chat-test --model ./Qwen/Qwen2-72B-Instruct/ --gpu-memory-utilization 0.9 --tensor-parallel-size 8 --enable-lora --lora-dtype bfloat16 --lora-modules test=/path/to/lora/

Sorry, actually #5036 was used for the testing.

I have completed the test; #5036 can resolve this issue.

However, there are still some other issues with #5036 that need to be resolved. I will address them ASAP.
