Bundling Private Weights from GCP #552

Merged: 30 commits merged into main from varun/gcs-cache on Aug 18, 2023
Conversation

varunshenoy (Contributor) commented Aug 14, 2023:

Flow:

  • Place a service_account.json file in your data directory.
  • Pass the gs://... bucket as repo_id under hf_cache.
  • Add google-cloud-storage under requirements.
  • The build will download the weights into the app/hf_cache/{{bucket_name}} directory.

Takes ~180s to build the image for Llama 2 7B.

The next PR will focus on documenting this feature. We should also rename hf_cache to something more general, maybe bucket_cache or model_cache? I don't want people to confuse it with external_data.
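For reference, a minimal config.yaml sketch of this flow (the my-llama-weights bucket name is hypothetical; the endpoint/model arguments mirror the example reviewed below):

# data/service_account.json holds the GCP service account key
hf_cache:
  - repo_id: gs://my-llama-weights        # weights bucket (hypothetical name)
requirements:
  - google-cloud-storage
build:
  arguments:
    endpoint: Completions
    model: /app/hf_cache/my-llama-weights  # weights are downloaded under the bucket name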

@varunshenoy requested a review from bolasim on August 14, 2023 at 19:29
@varunshenoy changed the title from "WIP: Bundling Private Weights from GCP" to "Bundling Private Weights from GCP" on Aug 16, 2023
@varunshenoy marked this pull request as ready for review on August 16, 2023 at 23:46
@varunshenoy requested a review from aspctu on August 16, 2023 at 23:46
build:
  arguments:
    endpoint: Completions
    model: /app/hf_cache/llama-2-7b
Collaborator:
Can we just make the model gs://varuns-llama2-whatever and then do the swap for the args under the hood, so the user doesn't have to worry about it, and add the hf_cache option below as well?

varunshenoy (Contributor, author) replied Aug 17, 2023:
What if the model was a private HF model? What would the expected behavior be then?

@@ -91,7 +99,38 @@ def create_vllm_build_dir(config: TrussConfig, build_dir: Path):
    )
    nginx_template = read_template_from_fs(TEMPLATES_DIR, "vllm/proxy.conf.jinja")

    dockerfile_content = dockerfile_template.render(hf_access_token=hf_access_token)
    (build_dir / "cache_requirements.txt").write_text(spec.requirements_txt)
Collaborator:
Let's not do this. I think we should make a cache_requirements.txt in templates/ and just copy it directly. It should have the Google Cloud Storage client and huggingface_hub.

Contributor (author):
add hf_transfer package to huggingface and set environment variable
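Taken together, the proposed templates/cache_requirements.txt would presumably contain something like the following (the exact contents and any version pins are an assumption, not the merged file):

google-cloud-storage
huggingface_hub
hf_transfer

Note that huggingface_hub only uses hf_transfer when HF_HUB_ENABLE_HF_TRANSFER=1 is set in the environment, which is presumably the environment variable referred to above.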


{%- if hf_cache != None %}
COPY ./cache_warmer.py /cache_warmer.py
RUN chmod +x /cache_warmer.py
Collaborator:
This line is not necessary since you're doing python3 /cache_warmer.py below. +x is only needed if you want to execute the file directly.

truss/templates/vllm/vllm.Dockerfile.jinja (resolved review thread)
bolasim (Collaborator) left a review:
approved for when you add the missing requirements file. Good work!


# Connect to GCS storage
try:
    storage_client = storage.Client.from_service_account_json(key_file)
Collaborator:
To optimize later: it would be great if we only made this client once and re-used it for all the file downloads.
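A minimal sketch of that reuse, with a hypothetical download helper rather than the PR's actual cache_warmer.py code:

from google.cloud import storage

# Build the client once from the service account key...
def make_client(key_file: str) -> storage.Client:
    return storage.Client.from_service_account_json(key_file)

# ...and pass it into every download instead of re-creating it per file.
def download_file(client: storage.Client, bucket_name: str, blob_name: str, dest: str) -> None:
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).download_to_filename(dest)

client = make_client("service_account.json")
for blob_name in ["config.json", "model.safetensors"]:  # illustrative file names
    download_file(client, "my-llama-weights", blob_name, blob_name)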

@@ -75,7 +82,14 @@ def create_tgi_build_dir(config: TrussConfig, build_dir: Path):
     supervisord_filepath.write_text(supervisord_contents)


-def create_vllm_build_dir(config: TrussConfig, build_dir: Path):
+def create_vllm_build_dir(
+    config: TrussConfig, build_dir: Path, truss_dir: Path, spec: TrussSpec
Collaborator:
nit: spec is unnecessary to add here.

filtered_repo_files = list(
    filter_repo_objects(
        items=list_files(
            repo_id, truss_dir / spec.config.data_dir, revision=revision
Collaborator:
you can just use config.data_dir instead of spec and drop the extra arg
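Combining this with the spec nit above, the change would look roughly like the following diff (a sketch of the reviewer's suggestion, not the merged commit):

-def create_vllm_build_dir(
-    config: TrussConfig, build_dir: Path, truss_dir: Path, spec: TrussSpec
+def create_vllm_build_dir(config: TrussConfig, build_dir: Path, truss_dir: Path):
 ...
-            repo_id, truss_dir / spec.config.data_dir, revision=revision
+            repo_id, truss_dir / config.data_dir, revision=revision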

@varunshenoy merged commit 0a52e52 into main on Aug 18, 2023
@varunshenoy deleted the varun/gcs-cache branch on August 18, 2023 at 05:07
@varunshenoy restored the varun/gcs-cache branch on August 18, 2023 at 21:57
bolasim added a commit that referenced this pull request Aug 18, 2023
varunshenoy pushed a commit that referenced this pull request Aug 18, 2023