Adding support for VLLM server #495

Merged
merged 16 commits into from
Aug 3, 2023

@aspctu aspctu commented Aug 1, 2023

This PR adds support to deploy models with the vLLM server.

@bolasim bolasim left a comment


Please post a video demo in Slack. Maybe update the branch so the lint failure is resolved?

@@ -71,6 +71,41 @@ def create_tgi_build_dir(config: TrussConfig, build_dir: Path):
    supervisord_filepath.write_text(supervisord_contents)


def create_vllm_build_dir(config: TrussConfig, build_dir: Path):
    server_endpoint_config = {

nit: constant should be all caps

    )
    nginx_template = read_template_from_fs(TEMPLATES_DIR, "vllm/proxy.conf.jinja")

    dockerfile_content = dockerfile_template.render(hf_access_token=hf_access_token)

We should write a helper function at this point. We've copied code like this about 7 times now. I can do that in a follow-up.
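
For illustration only, one possible shape for such a helper is sketched below. It is not code from this PR: the names `render_to_file`, `Renderable`, and `FormatTemplate` are hypothetical, and the stand-in template class exists only so the sketch runs without `jinja2`. The assumption is that each build-dir function repeats a "render a template, then write it into the build dir" step, and that any object with a Jinja-style `render(**context)` method could be passed in.

```python
from pathlib import Path
from typing import Any, Protocol


class Renderable(Protocol):
    """Anything with a Jinja-style render(**context) method (hypothetical name)."""

    def render(self, **context: Any) -> str: ...


def render_to_file(template: Renderable, dest: Path, **context: Any) -> Path:
    """Render `template` with `context` and write the result to `dest`.

    Hypothetical helper: creates the parent directory if needed and
    returns the destination path so callers can chain on it.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(template.render(**context))
    return dest


class FormatTemplate:
    """Stand-in for a Jinja template so this sketch runs without jinja2."""

    def __init__(self, source: str) -> None:
        self.source = source

    def render(self, **context: Any) -> str:
        # str.format substitutes {name} placeholders from the context.
        return self.source.format(**context)
```

With something like this, a call site would shrink to a single line such as `render_to_file(dockerfile_template, build_dir / "Dockerfile", hf_access_token=hf_access_token)`, where the output filename is likewise an assumption.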

@aspctu aspctu merged commit d495531 into main Aug 3, 2023
@aspctu aspctu deleted the abuqader/vllm-support branch August 3, 2023 20:14
aspctu added a commit that referenced this pull request Aug 3, 2023
* Cleanup old asyncio threadpool settings.

* Add `version.txt` into build to prevent cache migrations (#490)

* added hf token to dockerfile

* add prints

* move hf token to the part where its needed

* add version.txt

* added version.txt

* remove unnecessary files

* adding files back

* reverting

* Adding `allow_patterns` and `ignore_patterns` to the Hugging Face cache. (#480)

* remove assert

* both ignore and allow working

* fix for verbose

* remove prints

* Fix streaming issues.

* Added documentation for HF caching (#492)

* added hf_cache docs

* fix ticks

* Update configuration.md

* clean up Truss docs (#491)

* clean up Truss docs

* fix links and lints

* lint

* Comment updates.

* Bump version.

* Controlling supervisord retries (#496)

* update live reload docs (#494)

* update live reload docs

* remove leading space

* remove dead links from flan-t5 readme

* Update README.md

* Removing extraneous file (#498)

* Enable Hugging Face secrets during build from Truss  (#499)

* added hf token to dockerfile

* add prints

* move hf token to the part where its needed

* successfully mounting secrets

* update cache warmer to grab secret

* match data dir copy

* bump pyproject

* add os to cache_warmer

* bump pyproject

* add is_trusted

* revert version

* change names to be hf_access_token

* rename is_trusted and use Path

* Adding support for VLLM server (#495)

---------

Co-authored-by: Sidharth Shanker <sid.shanker@baseten.co>
Co-authored-by: Varun Shenoy <vnshenoy@stanford.edu>
Co-authored-by: Philip Kiely - Baseten <98474633+philipkiely-baseten@users.noreply.github.com>
Co-authored-by: joostinyi <63941848+joostinyi@users.noreply.github.com>