Amazon Sagemaker Training, Deployment, and Inference API's #28

Closed
wants to merge 6 commits into from

@queueburt queueburt commented Dec 6, 2019

This pull request is an initial implementation of streamlined APIs for Amazon Sagemaker. It includes three functions: "fit", "deploy", and "predict". It was built primarily around the XGBoost and Linear Learner algorithms, but it should be compatible with any built-in algorithm that accepts 'text/csv' as a content type.

Two additional environment variables are required to run these flows.

  • METAFLOW_SAGEMAKER_REGION is the region of your Sagemaker run.
  • METAFLOW_SAGEMAKER_IAM_ROLE is a Sagemaker execution role with permissions to access all appropriate cloud resources, most notably S3.
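
As a rough sketch of how these two variables might be read and checked before a run (the helper name load_sagemaker_settings is hypothetical and not part of this PR):

```python
import os


def load_sagemaker_settings(environ=None):
    """Read the two required variables, failing early if either is unset."""
    # Allow a plain dict to be passed in so the helper is easy to test.
    environ = os.environ if environ is None else environ
    required = ("METAFLOW_SAGEMAKER_REGION", "METAFLOW_SAGEMAKER_IAM_ROLE")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "region": environ["METAFLOW_SAGEMAKER_REGION"],
        "role_arn": environ["METAFLOW_SAGEMAKER_IAM_ROLE"],
    }
```

Failing early like this surfaces a missing role or region before any training job is requested.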

Import the API with: from metaflow import Sagemaker

Sagemaker.fit(data, image, hyperparameters, stopping_condition, resource_config)

Returns a string object with the S3 URI of the model artifact generated by the fit.

  • data is a dictionary whose keys are Sagemaker "channel names" (found here) and whose values are CSV data with no header row or index column.

  • image is a string containing the container registry path of a Sagemaker built-in algorithm, also found here. Automatic mapping will be coming soon.

  • hyperparameters is a dictionary with hyperparameters for the specific algorithm referenced by image. An example for XGBoost can be found here.

  • stopping_condition and resource_config are optional dictionaries that override the defaults: a single ml.m4.xlarge training instance with a 5 GB volume and a 1-hour maximum runtime. The syntax for those overrides can be found here.
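
A minimal sketch of preparing the arguments for fit. The helper names rows_to_csv and with_overrides are hypothetical. The override key names follow boto3's create_training_job request, and that this PR accepts the same keys, and that CSV data can be passed as a string, are both assumptions.

```python
import csv
import io

# Defaults described above: one ml.m4.xlarge, 5 GB volume, 1 hour max runtime.
# Key names mirror boto3's create_training_job; the PR using them is an assumption.
DEFAULT_RESOURCE_CONFIG = {
    "InstanceType": "ml.m4.xlarge",
    "InstanceCount": 1,
    "VolumeSizeInGB": 5,
}
DEFAULT_STOPPING_CONDITION = {"MaxRuntimeInSeconds": 3600}


def rows_to_csv(rows):
    """Serialize rows to CSV text with no header row and no index column."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def with_overrides(defaults, overrides=None):
    """Return a copy of defaults with any caller overrides applied on top."""
    merged = dict(defaults)
    merged.update(overrides or {})
    return merged


# Illustrative rows only: label in the first column, features after it.
train_rows = [[1, 0.5, 2.0], [0, 1.5, 0.1]]
data = {"train": rows_to_csv(train_rows)}
hyperparameters = {"objective": "binary:logistic", "num_round": "10"}
resource_config = with_overrides(DEFAULT_RESOURCE_CONFIG, {"VolumeSizeInGB": 10})
stopping_condition = with_overrides(DEFAULT_STOPPING_CONDITION)

# With this PR's branch installed, the call would then look like:
# model_uri = Sagemaker.fit(data, image, hyperparameters,
#                           stopping_condition, resource_config)
```

Copying the defaults before applying overrides keeps the module-level defaults unchanged between calls.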

Sagemaker.deploy(model_uri, image, instanceType, instanceCount, instanceWeight, variantName)

Returns a string object with the endpoint name generated by the model deployment.

  • model_uri is a string object with the S3 path for the model. This string is returned by Sagemaker.fit.

  • image is a string object. It should be the same image used for training.

  • instanceType, instanceCount, instanceWeight, and variantName are all optional parameters for overriding the defaults of, respectively, "ml.m4.xlarge", 1, 1, and "AllTraffic".
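
The optional deploy parameters look like they line up with one ProductionVariants entry in boto3's create_endpoint_config request. The sketch below shows that mapping with the defaults listed above; the mapping itself is an assumption about this PR's internals, and production_variant is a hypothetical helper.

```python
def production_variant(model_name, instance_type="ml.m4.xlarge", instance_count=1,
                       instance_weight=1, variant_name="AllTraffic"):
    """Build one ProductionVariants entry for boto3's create_endpoint_config."""
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
        "InitialVariantWeight": instance_weight,
    }


# Defaults only, then the same model on two larger instances.
default_variant = production_variant("example-model")
scaled_variant = production_variant("example-model", "ml.m5.xlarge", instance_count=2)
```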

Sagemaker.predict(data, endpoint_name)

Returns a list of predictions.

  • data is a CSV object with no headers or indexes representing the features for inference.

  • endpoint_name is a string object with the name of the Sagemaker endpoint to run inference against. This value is returned by Sagemaker.deploy.
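
A rough sketch of the payload handling around predict, assuming the endpoint is called through boto3's sagemaker-runtime invoke_endpoint with a 'text/csv' body and returns comma- or newline-separated numbers. The response format varies by algorithm, and both helper names are hypothetical.

```python
import csv
import io


def to_csv_payload(rows):
    """Encode feature rows as header-less, index-less CSV bytes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


def parse_predictions(body):
    """Parse a text/csv response body into a flat list of floats."""
    text = body.decode("utf-8")
    # Treat newlines like commas so both response layouts are handled.
    return [float(value) for value in text.replace("\n", ",").split(",") if value.strip()]


payload = to_csv_payload([[0.5, 2.0], [1.5, 0.1]])

# The underlying request could then be made along these lines:
# runtime = boto3.client("sagemaker-runtime", region_name=region)
# response = runtime.invoke_endpoint(EndpointName=endpoint_name,
#                                    ContentType="text/csv", Body=payload)
# predictions = parse_predictions(response["Body"].read())
```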

A short example of the usage can be found here. This PR also adds an 08-sagemaker tutorial, available through metaflow tutorials pull, that demonstrates the above sample flow.

@queueburt queueburt changed the title from "Sagemaker Training, Deployment, and Inference API's" to "Amazon Sagemaker Training, Deployment, and Inference API's" Dec 6, 2019
@savingoyal savingoyal self-requested a review December 12, 2019 03:32
@savingoyal savingoyal added the "enhancement" (New feature or request) label Dec 12, 2019

## Sagemaker Image and Hyperparameters below are required fields.
## Common parameters and image info can be found at https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html
sagemaker_image = "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest"

This is a legacy XGBoost image (0.72) which is not synced with upstream open-source XGBoost. I suggest using
246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3. This image is based on open-source XGBoost and is pinned to v0.90 of XGBoost, with the -2 indicating a SageMaker version. Backward compatibility is guaranteed within a given version, so it is better to use this pinned tag than :latest.
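
One way to guard against an unpinned image is a small check on the tag before training starts. This is only a sketch; image_tag and is_pinned are hypothetical helpers, not part of this PR or of any AWS library.

```python
def image_tag(image_uri):
    """Return the tag of a container image URI, or None if it has no tag."""
    # Only look at the last path segment so a registry port is not mistaken for a tag.
    name = image_uri.rsplit("/", 1)[-1]
    if ":" not in name:
        return None
    return name.rsplit(":", 1)[1]


def is_pinned(image_uri):
    """True when the image carries an explicit tag other than 'latest'."""
    tag = image_tag(image_uri)
    return tag is not None and tag != "latest"


legacy = "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest"
suggested = "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3"
```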

@tobias-gp

@queueburt Thanks for this great proposal! Do you have any news on the topic?

@savingoyal savingoyal marked this pull request as draft January 7, 2022 19:20
@savingoyal
Collaborator

Closing this PR in favor of native support for hosting models on Sagemaker with Metaflow.

@savingoyal savingoyal closed this Mar 29, 2023