Add GCP support to Cortex #1655

RobertLucian · 2020-12-04T23:30:08Z

Closes #114, closes #1600, closes #1602, closes #1616, closes #1624, closes #1530, closes #1633, closes #1646

checklist:

run make test and make lint
test manually (i.e. build/push all images, restart operator, and re-deploy APIs)

RobertLucian

I left some comments, but overall, LGTM! I can't approve the PR because I'm its author.

RobertLucian · 2020-12-07T22:38:45Z

pkg/types/spec/api.go

+	MetadataRoot     string             `json:"metadata_root"`
+	ProjectID        string             `json:"project_id"`
+	ProjectKey       string             `json:"project_key"`
+	LocalModelCaches []*LocalModelCache `json:"local_model_cache"` // local only


I think we may also be able to remove LocalModelCaches from the API struct. It is only required internally when deploying an API, not on the serving container, just as is the case with CuratedModelResources. That doesn't have to happen in this PR.

@deliahu on a second look, it seems LocalModelCaches is still necessary for one reason: when re-deploying an API whose model contents have changed, the API wouldn't get updated unless LocalModelCaches is present in the API spec. The function responsible for this is areAPIsEqual in the local/api.go file.

Even if that situation didn't exist, there is one small setback, which I think falls outside the scope of this PR: we'd need to start using the model IDs from the running containers to determine whether a model can be removed from the Cortex cache. Right now, we rely on the API spec to tell us that. I think keeping this in the API spec is better because a container is ephemeral, whereas the API spec will still be there after a reboot, just as the model will.

Because of the first reason listed in this comment, I think the presence of LocalModelCaches in the API spec is actually justified.
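To make the two arguments above concrete, here is a minimal sketch, written in Python for brevity even though the real spec and areAPIsEqual live in Go. Every name in it (APISpec, LocalModelCache, apis_equal, removable_cache_ids, and the field names) is hypothetical and only illustrates the idea; it is not the actual Cortex implementation. It assumes each cache entry carries an ID that changes when the model's contents change.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class LocalModelCache:
    # Hypothetical fields: an ID assumed to change with the model's contents,
    # and the host path where the model is cached.
    id: str
    host_path: str


@dataclass
class APISpec:
    # Hypothetical, heavily reduced stand-in for the API spec.
    name: str
    model_path: str
    local_model_caches: List[LocalModelCache] = field(default_factory=list)


def apis_equal(a: APISpec, b: APISpec) -> bool:
    """Compare two specs, including the cache IDs of their models."""
    if a.name != b.name or a.model_path != b.model_path:
        return False
    ids_a = sorted(c.id for c in a.local_model_caches)
    ids_b = sorted(c.id for c in b.local_model_caches)
    return ids_a == ids_b


def removable_cache_ids(cached_ids: Set[str], deployed: List[APISpec]) -> Set[str]:
    """Cache entries that no deployed spec references can be deleted."""
    in_use = {c.id for spec in deployed for c in spec.local_model_caches}
    return cached_ids - in_use


# Same API name and model path, but the model's contents (and so its ID) changed.
old = APISpec("iris", "/models/iris", [LocalModelCache("abc123", "/cache/abc123")])
new = APISpec("iris", "/models/iris", [LocalModelCache("def456", "/cache/def456")])

print(apis_equal(old, new))
print(removable_cache_ids({"abc123", "def456"}, [new]))
```

Without the cache list in the spec, the two specs in this sketch would compare as equal and the re-deploy would be skipped. The second function reads the in-use IDs from the specs rather than from running containers, which matches the reasoning that the spec survives a reboot.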

RobertLucian · 2020-12-07T22:45:33Z

pkg/types/spec/api.go

-	S3Path   bool    `json:"s3_path"`
-	Versions []int64 `json:"versions"`
+	S3Path    bool    `json:"s3_path"`
+	GCSPath   bool    `json:"gcs_path"`


The GCS notation is potentially confusing. In some places we use gs and in others gcs. What is the difference?

A: gs is the URI scheme used for bucket paths on GCP (e.g. gs://models/onnx/), and GCS (Google Cloud Storage) is the name of the bucket storage service on GCP. We should probably name these gs to keep them consistent with the S3 counterpart and to eliminate the potential confusion.

Not a showstopper for this PR.
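As a small illustration of the gs vs GCS distinction, the sketch below splits a cloud path into the storage service, bucket, and key based on its URI scheme. The helper name split_cloud_path and the SCHEME_TO_SERVICE mapping are assumptions made for this example and do not exist in the codebase.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to storage service, for illustration only.
SCHEME_TO_SERVICE = {"s3": "S3", "gs": "GCS"}


def split_cloud_path(path: str):
    """Return (service, bucket, key) for an s3:// or gs:// path."""
    parsed = urlparse(path)
    if parsed.scheme not in SCHEME_TO_SERVICE:
        raise ValueError(f"unsupported cloud path: {path}")
    # The bucket is the URI authority; the key is the path without its leading slash.
    return SCHEME_TO_SERVICE[parsed.scheme], parsed.netloc, parsed.path.lstrip("/")


print(split_cloud_path("gs://models/onnx/"))
print(split_cloud_path("s3://models/onnx/"))
```

For S3 the scheme and the service share a name, which is why the question only comes up on the GCP side.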

pkg/types/spec/utils.go

RobertLucian · 2020-12-07T23:33:19Z

pkg/workloads/cortex/__init__.py

@@ -1,5 +1,3 @@
-#!/bin/bash
-
 # Copyright 2020 Cortex Labs, Inc.
 #


Reason for addition: when importing the cortex package outside the container (in a dev setup), this __init__.py seems to be required. The rename that git is reporting is inaccurate.

pkg/workloads/cortex/lib/api/predictor.py

pkg/workloads/cortex/lib/model/cron.py

RobertLucian · 2020-12-07T23:41:26Z

pkg/workloads/cortex/lib/model/cron.py

@@ -788,7 +676,7 @@ def find_ondisk_model_info(lock_dir: str, model_name: str) -> Tuple[List[str], L

 class TFSModelLoader(mp.Process):
    """
-    Monitors the S3 path(s)/dir and continuously updates the models on TFS.
+    Monitors the cloud path(s)/dir (S3 or GS) and continuously updates the models on TFS.


Same confusion about GS vs GCS. Can be fixed later on.
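For readers unfamiliar with what "monitors the cloud path and continuously updates the models" involves, one polling step could be sketched as below. This is a simplified, hypothetical reconciliation function (plan_model_updates is not part of cron.py) and it does not reflect the actual logic of TFSModelLoader. It assumes the remote listing and the on-disk state are both available as a mapping from model name to a set of version strings, regardless of whether the listing came from S3 or GS.

```python
from typing import Dict, List, Set, Tuple

ModelVersion = Tuple[str, str]


def plan_model_updates(
    remote: Dict[str, Set[str]], on_disk: Dict[str, Set[str]]
) -> Tuple[List[ModelVersion], List[ModelVersion]]:
    """Return (to_load, to_unload) lists of (model_name, version) pairs."""
    to_load: List[ModelVersion] = []
    to_unload: List[ModelVersion] = []
    for name in sorted(set(remote) | set(on_disk)):
        remote_versions = remote.get(name, set())
        disk_versions = on_disk.get(name, set())
        # Versions present in the bucket but not on disk need to be downloaded and loaded.
        to_load += [(name, v) for v in sorted(remote_versions - disk_versions)]
        # Versions on disk that disappeared from the bucket need to be unloaded.
        to_unload += [(name, v) for v in sorted(disk_versions - remote_versions)]
    return to_load, to_unload


remote_state = {"iris": {"1", "2"}}
disk_state = {"iris": {"1"}, "old-model": {"3"}}
print(plan_model_updates(remote_state, disk_state))
```

Keeping this step independent of the storage backend is one way to let the same monitor serve both providers; only the listing code would need to know about S3 or GS.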

pkg/workloads/cortex/serve/init/bootloader.sh

pkg/operator/resources/validations.go

pkg/types/spec/validations.go

vishalbollu and others added 30 commits November 12, 2020 11:23

Spin up a GCP cluster with help from AWS

4ec14d2

Merge branch 'gcp' into feature/gcp

accd820

Download API spec from GCS (hardcoded)

3f371f3

Add search method for GCS class

e7abf60

Add GCS package in requirements.txt file & re-position func

0242bb2

Add GCP stuff

d4c19a9

Add GCP library

ddd9a45

Use GCP config for creating a GCP cortex cluster

8c1037e

Add few more things for GCS library

8208eaf

Fix nits

cd44628

Merge branch 'gcp' into feature/gcp

72bac9a

# Conflicts: # cli/cmd/gcp.go # go.mod # pkg/operator/config/config.go

Fix merge issues

b4e61a9

GCP fixes

56b73f8

Fixes

f7b5b08

Get the right API endpoint

94b0ce2

GCP WIP

a1bfec1

Shorten query for logs for GCP

31c07ba

Revert "Shorten query for logs for GCP"

e665eb6

This reverts commit 31c07ba.

GCP WIP

9ddaa7c

Don't show the dashboard URL if it's empty

7c628a1

GCP WIP

cd1d223

GCP fix for cluster up

6def5d5

Add API spec validation

abd161a

Wait for load-balancers to come online for GCP

562060c

Allow UpdateStrategy for GCP provider in API spec

eee3fb5

Uncomment cluster up lines and add helper lines

b694fa9

Fix local deployments for ONNX/Tensorflow

9c2b17d

Provider validation (not good)

1c41389

Fix AWS creds + some hardcodes

5ca20ce

Remove CuratedModelResources dependency from the serving container

b4d698a

deliahu and others added 19 commits December 6, 2020 19:35

Update GCP client

1ef7d80

Specify bucket location

5b468b6

Remove environment on cluster down

c88169e

Update in-memory cluster auth context

44fdd92

Fix cron

2fd71f8

Lint

4c0a190

Update go packages

7b60078

Hash account ID

6e0310b

Use capacity in cron

b7187e9

Update GCP logs response

8bfb049

Misc

5dad41e

Add cloud wrappers

fd37c5c

Use more wrappers

3adebc3

More wrappers

ef7b39e

Merge branch 'master' into feature/gcp

044642a

Fix cluster-gcp down prompt cmd (leading to panic)

575f5c2

Configure GCP env with cluster_client cmd

d412ab9

Fix typo

76dbc3b

Misc changes

569b014

RobertLucian commented Dec 8, 2020

Address review requests

3639f26

RobertLucian marked this pull request as ready for review December 8, 2020 00:26

deliahu reviewed Dec 8, 2020

pkg/operator/resources/validations.go (resolved)

deliahu reviewed Dec 8, 2020

pkg/types/spec/validations.go (outdated, resolved)

deliahu added 3 commits December 7, 2020 22:15

Misc

6808dc1

Merge branch 'feature/gcp' of github.com:cortexlabs/cortex into feature/gcp

0a98b16

Rename provider

cd544d8

deliahu approved these changes Dec 8, 2020

deliahu merged commit f1768e0 into master Dec 8, 2020

deliahu deleted the feature/gcp branch December 8, 2020 07:11
