Add miscellaneous updates #8

Merged
merged 6 commits into main from minor
Mar 13, 2023

Conversation

WoosukKwon (Collaborator)

This PR contains several miscellaneous updates to the system, with two notable changes:

  1. The size of the CPU KV cache is now calculated from the swap_space size provided by the user (defaulting to 20 GiB); see the sketch after this list.
  2. The default value for max_num_batched_tokens has been increased from 2048 to 2560.
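
To make the sizing concrete, here is a minimal sketch of how a CPU KV cache block count could be derived from a swap-space budget. This is illustrative only, not vLLM's actual implementation: the helper name `num_cpu_cache_blocks` and all parameter values in the example are assumptions.

```python
# Illustrative sketch (not vLLM's actual code): derive how many CPU KV
# cache blocks fit in a user-provided swap_space budget.

def num_cpu_cache_blocks(
    swap_space_gib: float,   # user-provided swap space budget, in GiB
    block_size: int,         # tokens stored per KV cache block
    num_layers: int,
    num_heads: int,
    head_size: int,
    dtype_bytes: int = 2,    # bytes per element, e.g. 2 for fp16
) -> int:
    # One block holds a key tensor and a value tensor (hence the factor
    # of 2) for every layer, head, and token slot in the block.
    bytes_per_block = 2 * block_size * num_layers * num_heads * head_size * dtype_bytes
    swap_space_bytes = int(swap_space_gib * (1 << 30))
    return swap_space_bytes // bytes_per_block

# Hypothetical example: the 20 GiB default with 16-token blocks and a
# 32-layer, 32-head, head-size-128 model in fp16 (all values assumed).
print(num_cpu_cache_blocks(20, 16, 32, 32, 128))  # 2560 blocks
```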

@WoosukKwon merged commit cfae35b into main Mar 13, 2023
@WoosukKwon deleted the minor branch Mar 13, 2023 at 20:48
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request Sep 12, 2023
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request Oct 24, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
mzusman added a commit to mzusman/vllm that referenced this pull request Apr 16, 2024
* Restore support for models other than Jamba

* Support n>1

* A little cleanup

* Rename

* Apply whitespace suggestions from code review

* Add max batch size to the main func

* Fixed attention kv cache bug

* Log where request ids are deleted from the dict, at debug level

* Fix typo

* Align with v0.3.3 vllm code

* Remove comments

* Take out model config from CUDAGraph object

* Fix

* Fix typo

* Make the kv cache selection cleaner

* Another typo

* Took the num layers calc outside

* Remove the -1

* Set as num layer / period

---------

Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
sfc-gh-hazhang pushed a commit to sfc-gh-hazhang/vllm that referenced this pull request May 7, 2024
ykim362 pushed a commit to ykim362/vllm that referenced this pull request Jun 17, 2024
Support Phi3SuScaledRotaryEmbedding for 128k model
@alixiaodi mentioned this pull request Aug 2, 2024
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request Sep 23, 2024