Bundling Private Weights from GCP #552
Conversation
Force-pushed from 8d5deb5 to b02228d
examples/vllm-gcs/config.yaml (outdated)

build:
  arguments:
    endpoint: Completions
    model: /app/hf_cache/llama-2-7b
Can we just make the model gs://varuns-llama2-whatever, do the swap for the args under the hood so the user doesn't have to worry about it, and add the hf_cache option below as well?
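As a sketch of what that swap could look like, a small helper could rewrite a gs:// model reference to its local cache path at build time (the function name, cache root, and bucket layout here are illustrative assumptions, not the PR's code):

```python
def resolve_model_arg(model: str, cache_root: str = "/app/hf_cache") -> str:
    """Rewrite a gs://bucket model reference to its local cache path.

    Plain Hugging Face repo ids pass through unchanged, so private HF
    models would keep their current behavior.
    """
    prefix = "gs://"
    if model.startswith(prefix):
        # Only the bucket name determines the cache directory
        bucket_name = model[len(prefix):].split("/")[0]
        return f"{cache_root}/{bucket_name}"
    return model
```

With this, the user-facing config could keep `model: gs://varuns-llama2-whatever` while the rendered vLLM args receive the local path.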
What if the model was a private HF model? What would the expected behavior be then?
@@ -91,7 +99,38 @@ def create_vllm_build_dir(config: TrussConfig, build_dir: Path):
     )
     nginx_template = read_template_from_fs(TEMPLATES_DIR, "vllm/proxy.conf.jinja")

     dockerfile_content = dockerfile_template.render(hf_access_token=hf_access_token)
     (build_dir / "cache_requirements.txt").write_text(spec.requirements_txt)
Let's not do this. I think we should make a cache_requirements.txt in templates/ and just copy it directly. It should have the Google Cloud Storage client and huggingface_hub.
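A hypothetical sketch of what that templates/cache_requirements.txt could contain (package pinning is illustrative; the PR's actual file may differ):

```text
google-cloud-storage
huggingface_hub
```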
Add the hf_transfer package alongside huggingface_hub and set the HF_HUB_ENABLE_HF_TRANSFER=1 environment variable to speed up downloads.
{%- if hf_cache != None %}
COPY ./cache_warmer.py /cache_warmer.py
RUN chmod +x /cache_warmer.py
This line is not necessary since you're doing python3 /cache_warmer.py below. +x is only needed if you want to execute the file directly.
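The distinction can be demonstrated with a scratch script (the /tmp path and "warmed" output are purely illustrative):

```shell
# Create a throwaway script with a shebang line
cat > /tmp/demo_warmer.py <<'EOF'
#!/usr/bin/env python3
print("warmed")
EOF

# 1) Via the interpreter: the execute bit is irrelevant
python3 /tmp/demo_warmer.py

# 2) Directly: requires BOTH chmod +x and the shebang
chmod +x /tmp/demo_warmer.py
/tmp/demo_warmer.py
```

Since the Dockerfile invokes the script through python3, the RUN chmod layer only adds image build time.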
approved for when you add the missing requirements file. Good work!
# Connect to GCS storage
try:
    storage_client = storage.Client.from_service_account_json(key_file)
to optimize later: would be great if we only make this client once and re-use it for all the file downloads.
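One way to sketch that optimization is a small per-key cache around the client constructor (the names here are hypothetical; in cache_warmer.py the factory would be storage.Client.from_service_account_json):

```python
# Hypothetical sketch, not the PR's code: build the client once per
# key file and reuse it for every subsequent file download.
_client_cache = {}

def get_storage_client(key_file, factory):
    """Return a cached client for key_file, creating it on first use.

    `factory` stands in for storage.Client.from_service_account_json so
    the pattern can be shown without a GCS dependency.
    """
    if key_file not in _client_cache:
        _client_cache[key_file] = factory(key_file)
    return _client_cache[key_file]
```

All download calls then share one authenticated client instead of re-reading the service account JSON per file.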
@@ -75,7 +82,14 @@ def create_tgi_build_dir(config: TrussConfig, build_dir: Path):
     supervisord_filepath.write_text(supervisord_contents)


-def create_vllm_build_dir(config: TrussConfig, build_dir: Path):
+def create_vllm_build_dir(
+    config: TrussConfig, build_dir: Path, truss_dir: Path, spec: TrussSpec
nit: spec is unnecessary to add here.
filtered_repo_files = list(
    filter_repo_objects(
        items=list_files(
            repo_id, truss_dir / spec.config.data_dir, revision=revision
You can just use config.data_dir instead of spec and drop the extra arg.
Flow:
1. Place a service_account.json file in your data directory.
2. Specify a gs://... bucket for repo_id under hf_cache.
3. Add google-cloud-storage under requirements.
4. Weights land in the app/hf_cache/{{bucket_name}} directory.

Takes around ~180s to build the image for Llama 2 7B.
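Putting the flow together, a config might look roughly like this (the hf_cache schema and bucket name are assumptions based on the snippets in this conversation, not the PR's exact format):

```yaml
# Hypothetical example; bucket name is illustrative
build:
  arguments:
    endpoint: Completions
    model: gs://my-llama-weights  # resolved to /app/hf_cache/my-llama-weights at build time
hf_cache:
  - repo_id: gs://my-llama-weights
requirements:
  - google-cloud-storage
```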
Next PR will be focused on documenting this feature. We should also rename hf_cache to something possibly more general. Maybe bucket_cache or model_cache? I don't want people to confuse it with external_data.