Adding support for VLLM server #495
Conversation
Please post a video demo in Slack. Maybe update the branch so lint is resolved?
@@ -71,6 +71,41 @@ def create_tgi_build_dir(config: TrussConfig, build_dir: Path):
     supervisord_filepath.write_text(supervisord_contents)
 
 
+def create_vllm_build_dir(config: TrussConfig, build_dir: Path):
+    server_endpoint_config = {
nit: constant should be all caps
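For illustration, the change the reviewer is asking for looks like the following. The actual contents of `server_endpoint_config` are not fully visible in this diff, so the key and value below are placeholders:

```python
# Placeholder illustration of the reviewer's nit (PEP 8 constant naming).
# The real contents of server_endpoint_config are not visible in this diff.

# Before: a fixed mapping with a lowercase, variable-style name.
server_endpoint_config = {"predict_endpoint": "/generate"}

# After: hoisted to module level and renamed in ALL_CAPS to signal a constant.
SERVER_ENDPOINT_CONFIG = {"predict_endpoint": "/generate"}
```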
+    )
+    nginx_template = read_template_from_fs(TEMPLATES_DIR, "vllm/proxy.conf.jinja")
+
+    dockerfile_content = dockerfile_template.render(hf_access_token=hf_access_token)
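Piecing the visible fragments together, here is a minimal, self-contained sketch of what `create_vllm_build_dir` might do. Only the function signature, the `vllm/proxy.conf.jinja` template path, and the `dockerfile_template.render(hf_access_token=...)` call appear in the diff itself; the Dockerfile template name, the `read_template_from_fs` stand-in, the `TEMPLATES_DIR` location, and the output filenames are assumptions:

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

TEMPLATES_DIR = Path("truss/templates")  # assumption: location of Truss templates


def read_template_from_fs(templates_dir: Path, template_name: str):
    """Stand-in for Truss's internal template loader."""
    env = Environment(loader=FileSystemLoader(str(templates_dir)))
    return env.get_template(template_name)


def create_vllm_build_dir(config, build_dir: Path) -> None:
    """Sketch: render the vLLM server's build assets into build_dir."""
    build_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the token is read off the TrussConfig / secrets; only the
    # render kwarg hf_access_token is visible in the diff.
    hf_access_token = getattr(config, "hf_access_token", None)

    # Render the nginx proxy config that fronts the vLLM server.
    nginx_template = read_template_from_fs(TEMPLATES_DIR, "vllm/proxy.conf.jinja")
    (build_dir / "proxy.conf").write_text(nginx_template.render())

    # Render the Dockerfile; this template filename is a guess.
    dockerfile_template = read_template_from_fs(TEMPLATES_DIR, "vllm/vllm.Dockerfile.jinja")
    dockerfile_content = dockerfile_template.render(hf_access_token=hf_access_token)
    (build_dir / "Dockerfile").write_text(dockerfile_content)
```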
We should write a helper function at this point; we've copied code like this 7 times now. I can do that in a follow-up.
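A possible shape for that follow-up helper, folding the repeated render-and-write pattern into one function (all names here are hypothetical):

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader


def render_template_to_build_dir(
    templates_dir: Path,
    template_name: str,
    build_dir: Path,
    output_name: str,
    **render_kwargs,
) -> None:
    """Render a Jinja template and write the result into the build directory."""
    env = Environment(loader=FileSystemLoader(str(templates_dir)))
    template = env.get_template(template_name)
    (build_dir / output_name).write_text(template.render(**render_kwargs))


# Example: the copied blocks would collapse to single calls, e.g.
# render_template_to_build_dir(TEMPLATES_DIR, "vllm/proxy.conf.jinja",
#                              build_dir, "proxy.conf")
# render_template_to_build_dir(TEMPLATES_DIR, "vllm/vllm.Dockerfile.jinja",
#                              build_dir, "Dockerfile",
#                              hf_access_token=hf_access_token)
```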
* Cleanup old asyncio threadpool settings.
* Add `version.txt` into build to prevent cache migrations (#490)
* added hf token to dockerfile
* add prints
* move hf token to the part where its needed
* add version.txt
* added version.txt
* remove unnecessary files
* adding files back
* reverting
* Adding `allow_patterns` and `ignore_patterns` to the Hugging Face cache. (#480)
* remove assert
* both ignore and allow working
* fix for verbose
* remove prints
* Fix streaming issues.
* Added documentation for HF caching (#492)
* added hf_cache docs
* fix ticks
* Update configuration.md
* clean up Truss docs (#491)
* clean up Truss docs
* fix links and lints
* lint
* Comment updates.
* Bump version.
* Controlling supervisord retries (#496)
* update live reload docs (#494)
* update live reload docs
* remove leading space
* remove dead links from flan-t5 readme
* Update README.md
* Removing extraneous file (#498)
* Enable Hugging Face secrets during build from Truss (#499)
* added hf token to dockerfile
* add prints
* move hf token to the part where its needed
* successfully mounting secrets
* update cache warmer to grab secret
* match data dir copy
* bump pyproject
* add os to cache_warmer
* bump pyproject
* add is_trusted
* revert version
* change names to be hf_access_token
* rename is_trusted and use Path
* Adding support for VLLM server (#495)

---------

Co-authored-by: Sidharth Shanker <sid.shanker@baseten.co>
Co-authored-by: Varun Shenoy <vnshenoy@stanford.edu>
Co-authored-by: Philip Kiely - Baseten <98474633+philipkiely-baseten@users.noreply.github.com>
Co-authored-by: joostinyi <63941848+joostinyi@users.noreply.github.com>
This PR adds support for deploying models with the vLLM server.
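As a rough usage sketch, not confirmed by the visible diff: once merged, producing a vLLM build directory could look something like this, assuming `TrussConfig.from_yaml` as the config loader and `truss.truss_config` as its module path:

```python
from pathlib import Path

from truss.truss_config import TrussConfig  # assumption: module path

# Load the user's Truss config; assumption: a vLLM model server is selected
# somewhere in config.yaml (the exact keys are not visible on this PR page).
config = TrussConfig.from_yaml(Path("config.yaml"))

build_dir = Path("/tmp/vllm_build")
create_vllm_build_dir(config, build_dir)  # the function added in this PR

# build_dir should now contain the rendered Dockerfile and nginx proxy.conf
# used to build and run the vLLM server image.
```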