Add miscellaneous updates #8

Merged
merged 6 commits into main from minor
Mar 13, 2023

Conversation

WoosukKwon (Collaborator)

This PR contains several miscellaneous updates to the system, with two notable changes:

  1. The size of the CPU KV cache is now calculated from the swap_space size provided by the user (defaulting to 20 GiB); see the sketch after this list.
  2. The default value for max_num_batched_tokens has been increased from 2048 to 2560.
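
To make the sizing concrete, here is a minimal sketch of how a CPU KV cache block count could be derived from a swap-space budget. This is illustrative only, not vLLM's actual implementation: the helper name `num_cpu_cache_blocks` and all parameter values in the example are assumptions.

```python
# Illustrative sketch (not vLLM's actual code): derive how many CPU KV
# cache blocks fit in a user-provided swap_space budget.

def num_cpu_cache_blocks(
    swap_space_gib: float,   # user-provided swap space budget, in GiB
    block_size: int,         # tokens stored per KV cache block
    num_layers: int,
    num_heads: int,
    head_size: int,
    dtype_bytes: int = 2,    # bytes per element, e.g. 2 for fp16
) -> int:
    # One block holds a key tensor and a value tensor (hence the factor
    # of 2) for every layer, head, and token slot in the block.
    bytes_per_block = 2 * block_size * num_layers * num_heads * head_size * dtype_bytes
    swap_space_bytes = int(swap_space_gib * (1 << 30))
    return swap_space_bytes // bytes_per_block

# Hypothetical example: the 20 GiB default with 16-token blocks and a
# 32-layer, 32-head, head-size-128 model in fp16 (all values assumed).
print(num_cpu_cache_blocks(20, 16, 32, 32, 128))  # 2560 blocks
```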

@WoosukKwon merged commit cfae35b into main Mar 13, 2023
@WoosukKwon deleted the minor branch Mar 13, 2023 at 20:48
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request Sep 12, 2023
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request Oct 24, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
mzusman added a commit to mzusman/vllm that referenced this pull request Apr 16, 2024
* Restore support for models other than Jamba

* Support n>1

* A little cleanup

* Rename

* Apply whitespace suggestions from code review

* Add max batch size to the main func

* Fixed attention kv cache bug

* Log where request ids are deleted from the dict, at debug level

* Fix typo

* Align with v0.3.3 vllm code

* Remove comments

* Take out model config from CUDAGraph object

* Fix

* Fix typo

* Make the kv cache selection cleaner

* Another typo

* Took the num layers calc outside

* Remove the -1

* Set as num layer / period

---------

Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
sfc-gh-hazhang pushed a commit to sfc-gh-hazhang/vllm that referenced this pull request May 7, 2024
ykim362 pushed a commit to ykim362/vllm that referenced this pull request Jun 17, 2024
Support Phi3SuScaledRotaryEmbedding for 128k model
@alixiaodi mentioned this pull request Aug 2, 2024
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request Sep 23, 2024