Amazon Sagemaker Training, Deployment, and Inference API's #28

queueburt · 2019-12-06T09:12:29Z

Pull request for an initial implementation of streamlined APIs for Amazon Sagemaker. It includes three functions: "fit", "deploy", and "predict". It was built primarily around the XGBoost and Linear Learner algorithms, but it should be compatible with any built-in algorithm that accepts 'text/csv' as a content type.

Two additional environment variables are required to run these flows.

METAFLOW_SAGEMAKER_REGION is the region of your Sagemaker run.
METAFLOW_SAGEMAKER_IAM_ROLE is a Sagemaker execution role with permissions to access all appropriate cloud resources, most notably S3.
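
As an illustration of how these two variables might be consumed, here is a minimal sketch that reads them and fails early when one is missing. The helper name load_sagemaker_env and the returned keys are hypothetical and are not taken from this PR's code.

```python
import os

# The two variables named in this PR description.
REQUIRED_VARS = ("METAFLOW_SAGEMAKER_REGION", "METAFLOW_SAGEMAKER_IAM_ROLE")


def load_sagemaker_env(environ=None):
    """Return the region and role ARN, or raise if either variable is unset.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "region": environ["METAFLOW_SAGEMAKER_REGION"],
        "role_arn": environ["METAFLOW_SAGEMAKER_IAM_ROLE"],
    }
```

Checking both variables up front keeps a misconfigured flow from failing partway through a training step.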

Usage starts with the import: from metaflow import Sagemaker

Sagemaker.fit(data, image, hyperparameters, stopping_condition, resource_config)

Returns a string object with the S3 URI of the model artifact generated by the fit.

data is a dictionary with keys that reference Sagemaker "channel names" found here, and values that consist of CSV data with no headers or indexes.
image is a string consisting of a Sagemaker built-in algorithm container registry path, also found here. Automatic mapping will be coming soon.
hyperparameters is a dictionary with hyperparameters for the specific algorithm referenced by image. An example for XGBoost can be found here.
stopping_condition and resource_config are optional dictionaries for overriding the defaults: a single ml.m4.xlarge training instance with a 5 GB volume, and a maximum runtime of 1 hour. Syntax for those overrides can be found here.
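
The defaults and overrides described above can be sketched as a request builder for boto3's create_training_job call. This is an assumption about how such a wrapper could work, not the PR's actual implementation: the function name, the job_name and output_uri parameters, and the choice to pass channel data as S3 URIs that were uploaded beforehand are all hypothetical.

```python
# Defaults as stated in the PR description: one ml.m4.xlarge instance,
# a 5 GB volume, and a 1 hour maximum runtime.
DEFAULT_RESOURCE_CONFIG = {
    "InstanceType": "ml.m4.xlarge",
    "InstanceCount": 1,
    "VolumeSizeInGB": 5,
}
DEFAULT_STOPPING_CONDITION = {"MaxRuntimeInSeconds": 3600}


def build_training_request(job_name, image, role_arn, channel_uris, output_uri,
                           hyperparameters, stopping_condition=None,
                           resource_config=None):
    """Assemble keyword arguments for boto3's create_training_job.

    Hypothetical helper: channel_uris maps a channel name (for example
    "train") to the S3 URI of headerless CSV data uploaded beforehand.
    """
    # Caller-supplied keys override the defaults; other keys are kept.
    resources = {**DEFAULT_RESOURCE_CONFIG, **(resource_config or {})}
    stopping = {**DEFAULT_STOPPING_CONDITION, **(stopping_condition or {})}
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": name,
                "ContentType": "text/csv",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
            for name, uri in channel_uris.items()
        ],
        "OutputDataConfig": {"S3OutputPath": output_uri},
        "ResourceConfig": resources,
        "StoppingCondition": stopping,
        # The SageMaker API expects hyperparameter values as strings.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }
```

The resulting dictionary could be passed as boto3.client("sagemaker").create_training_job(**request). Once the job completes, describe_training_job reports the artifact location under ModelArtifacts.S3ModelArtifacts, which corresponds to the string that fit is described as returning.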

Sagemaker.deploy(model_uri, image, instanceType, instanceCount, instanceWeight, variantName)

Returns a string object with the endpoint name generated by the model deployment.

model_uri is a string object with the S3 path for the model. This string is returned by Sagemaker.fit
image is a string object. It should be the same image used for training.
instanceType, instanceCount, instanceWeight, and variantName are all optional parameters for overriding the defaults of, respectively, "ml.m4.xlarge", 1, 1, and "AllTraffic".
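
To show how the deploy parameters and their defaults could map onto the underlying service calls, here is a sketch that builds the request bodies for boto3's create_model and create_endpoint_config. The helper names and the model_name parameter are hypothetical; the PR's own implementation may differ.

```python
def build_model_request(model_name, model_uri, image, role_arn):
    """Keyword arguments for boto3's create_model (hypothetical helper)."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image,             # same image that was used for training
            "ModelDataUrl": model_uri,  # S3 path of the trained model artifact
        },
        "ExecutionRoleArn": role_arn,
    }


def build_production_variant(model_name, instance_type="ml.m4.xlarge",
                             instance_count=1, instance_weight=1,
                             variant_name="AllTraffic"):
    """One ProductionVariants entry for boto3's create_endpoint_config.

    The defaults mirror those listed in the PR description.
    """
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
        "InitialVariantWeight": instance_weight,
    }
```

A deployment would then call create_model, create_endpoint_config with ProductionVariants=[variant], and create_endpoint in that order, returning the endpoint name.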

Sagemaker.predict(data, endpoint_name)

Returns a list of predictions.

data is a CSV object with no headers or indexes representing the features for inference.
endpoint_name is a string object with the name of the Sagemaker endpoint to run inference against. This value is returned by Sagemaker.deploy.
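
The CSV round trip around an endpoint call can be sketched with two small helpers: one that serializes feature rows without headers or indexes, and one that parses a comma- or newline-separated response into a list of floats. Both helper names are hypothetical, and the response format is an assumption; some algorithms return JSON unless a CSV response is requested.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize feature rows as CSV text with no header and no index column."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def parse_csv_predictions(body):
    """Parse a comma- or newline-separated response body into floats.

    Assumes the endpoint answers with plain numeric CSV.
    """
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    tokens = text.replace("\n", ",").split(",")
    return [float(tok) for tok in tokens if tok.strip()]
```

With boto3, the payload would be sent through boto3.client("sagemaker-runtime").invoke_endpoint(EndpointName=endpoint_name, ContentType="text/csv", Body=payload), and the bytes read from response["Body"].read() would be handed to the parser.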

A short example of the usage can be found here. This PR also adds an 08-sagemaker tutorial, for use with metaflow tutorials pull, that demonstrates the sample flow above.

iyerr3 · 2020-03-19T18:39:20Z

metaflow/tutorials/08-sagemaker/sagemaker.py

+
+        ## Sagemaker Image and Hyperparameters below are required fields.
+        ## Common parameters and image info can be found at https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html
+        sagemaker_image = "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest"


This is a legacy XGBoost image (0.72) which is not synced with upstream open-source XGBoost. I suggest using
246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3. This image is based on open-source XGBoost and is pinned to v0.90 of XGBoost, with the -2 indicating a SageMaker version. For a given version, backwards compatibility is guaranteed, so it is better to use that tag instead of :latest.
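
One way to enforce the pinning advice in a flow is a small check on the image tag before training starts. This is a sketch with hypothetical helper names; it assumes the URI has no registry port, which holds for the ECR paths quoted in this thread.

```python
def split_image_uri(image_uri):
    """Split an image URI into (repository, tag).

    An untagged URI is treated as "latest", matching Docker's implicit
    default. Assumes the registry host carries no port number.
    """
    repository, _, tag = image_uri.rpartition(":")
    if not repository:  # no colon found, so the URI has no explicit tag
        return image_uri, "latest"
    return repository, tag


def is_pinned(image_uri):
    """Return True when the image tag names a specific version."""
    return split_image_uri(image_uri)[1] != "latest"
```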

tobias-gp · 2021-01-19T12:44:25Z

@queueburt Thanks for this great proposal! Do you have any news on the topic?

savingoyal · 2023-03-29T18:38:02Z

Closing this PR in favor of native support for hosting models on Sagemaker with Metaflow.

Rob added 2 commits December 5, 2019 16:01

Initial commit for Sagemaker integration

afbafa9

Removing initial tutorial to rewrite with Conda

a13986a

queueburt changed the title ~~Sagemaker Training, Deployment, and Inference API's~~ Amazon Sagemaker Training, Deployment, and Inference API's Dec 6, 2019

Rob added 4 commits December 8, 2019 22:27

New tutorial and adjustments to match OSS version of Metaflow

1e9f1b1

Fixing type-o

de466b0

README aesthetics

c679e06

Moving environment logic to sagemaker_params

3941820

savingoyal self-requested a review December 12, 2019 03:32

savingoyal added the enhancement New feature or request label Dec 12, 2019

Ishaan28malik approved these changes Dec 21, 2019

iyerr3 reviewed Mar 25, 2020

savingoyal mentioned this pull request May 19, 2020

Support for hosting artifacts as microservices #3

Open

savingoyal marked this pull request as draft January 7, 2022 19:20

savingoyal closed this Mar 29, 2023
