Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GCP support to Cortex #1655

Merged
merged 117 commits into from
Dec 8, 2020
Merged

Add GCP support to Cortex #1655

merged 117 commits into from
Dec 8, 2020

Conversation

RobertLucian
Copy link
Member

@RobertLucian RobertLucian commented Dec 4, 2020

Closes #114, closes #1600, closes #1602, closes #1616, closes #1624, closes #1530, closes #1633, closes #1646


checklist:

  • run make test and make lint
  • test manually (i.e. build/push all images, restart operator, and re-deploy APIs)

Copy link
Member Author

@RobertLucian RobertLucian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some comments, but overall, LGTM! Can't approve the PR because I'm the author of this.

MetadataRoot string `json:"metadata_root"`
ProjectID string `json:"project_id"`
ProjectKey string `json:"project_key"`
LocalModelCaches []*LocalModelCache `json:"local_model_cache"` // local only
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may be able to get rid of LocalModelCaches too from the API struct. This is only required internally when deploying an API and not on the serving container. Just like it is the case with CuratedModelResoures. Not exactly in this PR.

Copy link
Member Author

@RobertLucian RobertLucian Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deliahu at a second look, it seems like LocalModelCaches is still necessary due to one reason: When re-deploying an API of whose model's contents have changed, without the LocalModelCaches present in the API spec, the API wouldn't get updated. The function responsible for this is areAPIsEqual in the local/api.go file.

Assuming the above situation doesn't exist, one small setback which I think falls outside the scope of this PR is that we'd need to start using the model IDs from the running containers to determine if a model can be removed from the Cortex cache or not - right now, we are relying on the API spec to tell us that. I think keeping this in the API spec is better because a container is ephemeral, whereas the API spec will be there after a reboot as well - just as the model will still be there.

Because of the first reason listed in this comment, I think the presence of LocalModelCaches in the API spec is actually justified.

S3Path bool `json:"s3_path"`
Versions []int64 `json:"versions"`
S3Path bool `json:"s3_path"`
GCSPath bool `json:"gcs_path"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GCS notation is potentially confusing. In some places, we go with gs and in others with gcs. What is the difference?

A: gs is the name of the bucket type on GCP (i.e. gs://models/onnx/) and GCS is the name of the bucket storage system on GCP. We should probably name these as gs to keep it consistent with the S3 alternative and to eliminate the potential confusion.

Not a showstopper for this PR.

@@ -1,5 +1,3 @@
#!/bin/bash

# Copyright 2020 Cortex Labs, Inc.
#
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason for addition: when importing the cortex package outside the container (in a dev setup), this __init__.py seems to be required. The rename that git is reporting is inaccurate.

@@ -788,7 +676,7 @@ def find_ondisk_model_info(lock_dir: str, model_name: str) -> Tuple[List[str], L

class TFSModelLoader(mp.Process):
"""
Monitors the S3 path(s)/dir and continuously updates the models on TFS.
Monitors the cloud path(s)/dir (S3 or GS) and continuously updates the models on TFS.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same confusion about GS vs GCS. Can be fixed later on.

@RobertLucian RobertLucian marked this pull request as ready for review December 8, 2020 00:26
@deliahu deliahu merged commit f1768e0 into master Dec 8, 2020
@deliahu deliahu deleted the feature/gcp branch December 8, 2020 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants