
AWS Sagemaker : Use json.dumps() to better organize the input and remove data_locations #3518

Merged: 4 commits, Apr 23, 2020
40 changes: 40 additions & 0 deletions components/aws/sagemaker/Changelog.md
@@ -0,0 +1,40 @@
# Change log for AWS SageMaker Components

The version of the AWS SageMaker Components is determined by the Docker image tag used in the component YAML specs.
Repository: https://hub.docker.com/repository/docker/amazon/aws-sagemaker-kfp-components

---------------------------------------------

**Change log for version 0.3.0**
- Remove data_location parameters from all components
  (use the "channels" parameter instead)

> Pull requests: [#3518](https://github.com/kubeflow/pipelines/pull/3518)
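A minimal sketch of the migration this entry describes: instead of routing S3 URIs through separate data_location_N parameters, each channel dict now carries its own S3Uri and the whole list is serialized with json.dumps() when building the pipeline. The bucket and channel values below are hypothetical; only the field names follow the SageMaker Channel structure.

```python
import json

# Each channel now embeds its own S3 location; the separate
# data_location_1..8 parameters are gone as of 0.3.0.
channels = [
    {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3Uri": "s3://my-bucket/train",  # hypothetical bucket
                "S3DataType": "S3Prefix",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "text/csv",
        "CompressionType": "None",
        "RecordWrapperType": "None",
        "InputMode": "File",
    }
]

# The components receive the channel list as a JSON string,
# so serialize it once when wiring up the pipeline step.
channels_arg = json.dumps(channels)
print(channels_arg)
```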


**Change log for version 0.2.0 (Apr 14, 2020)**
- Fix bug in Ground Truth component
- Add user agent header to boto3 client

> Pull requests: [#3474](https://github.com/kubeflow/pipelines/pull/3474), [#3487](https://github.com/kubeflow/pipelines/pull/3487)


---------------------------------------------

## Old

These are the old images, which were hosted at https://hub.docker.com/r/redbackthomson/aws-kubeflow-sagemaker/tags

**Change log 20200402**
- Fix for vpc issue
- Add license files
- Use AmazonLinux instead of Ubuntu
- Pin the pip packages


> Pull requests: [#3374](https://github.com/kubeflow/pipelines/pull/3374), [#3397](https://github.com/kubeflow/pipelines/pull/3397)

No change log is available for older images; please check the git log.


2 changes: 1 addition & 1 deletion components/aws/sagemaker/THIRD-PARTY-LICENSES.txt
@@ -1,4 +1,4 @@
** Amazon SageMaker Components for Kubeflow Pipelines; version 0.2.0 --
** Amazon SageMaker Components for Kubeflow Pipelines; version 0.3.0 --
https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker
Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
** boto3; version 1.12.33 -- https://github.com/boto/boto3/
2 changes: 1 addition & 1 deletion components/aws/sagemaker/batch_transform/component.yaml
@@ -74,7 +74,7 @@ outputs:
- {name: output_location, description: 'S3 URI of the transform job results.'}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
batch_transform.py,
10 changes: 0 additions & 10 deletions components/aws/sagemaker/common/_utils.py
@@ -126,11 +126,6 @@ def create_training_job_request(args):
### Update input channels, must have at least one specified
if len(args['channels']) > 0:
request['InputDataConfig'] = args['channels']
# Max number of input channels/data locations is 20, but currently only 8 data location parameters are exposed separately.
# Source: Input data configuration description in the SageMaker create training job form
for i in range(1, len(args['channels']) + 1):
if args['data_location_' + str(i)]:
request['InputDataConfig'][i-1]['DataSource']['S3DataSource']['S3Uri'] = args['data_location_' + str(i)]
else:
logging.error("Must specify at least one input channel.")
raise Exception('Could not create job request')
@@ -517,11 +512,6 @@ def create_hyperparameter_tuning_job_request(args):
### Update input channels, must have at least one specified
if len(args['channels']) > 0:
request['TrainingJobDefinition']['InputDataConfig'] = args['channels']
# Max number of input channels/data locations is 20, but currently only 8 data location parameters are exposed separately.
# Source: Input data configuration description in the SageMaker create hyperparameter tuning job form
for i in range(1, len(args['channels']) + 1):
if args['data_location_' + str(i)]:
request['TrainingJobDefinition']['InputDataConfig'][i-1]['DataSource']['S3DataSource']['S3Uri'] = args['data_location_' + str(i)]
else:
logging.error("Must specify at least one input channel.")
raise Exception('Could not make job request')
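With the data_location override loop deleted in both hunks above, the request construction reduces to assigning the channels list directly. A simplified, self-contained sketch of the resulting shape (not the actual _utils.py, which builds a full request dict):

```python
import logging

def create_training_job_request(args):
    """Simplified sketch: each channel dict already carries its own
    S3Uri, so no data_location_N overrides are applied anymore."""
    request = {}
    if len(args['channels']) > 0:
        request['InputDataConfig'] = args['channels']
    else:
        logging.error("Must specify at least one input channel.")
        raise Exception('Could not create job request')
    return request

req = create_training_job_request({'channels': [
    {'ChannelName': 'train',
     'DataSource': {'S3DataSource': {'S3Uri': 's3://my-bucket/train'}}}
]})
print(req['InputDataConfig'][0]['ChannelName'])
```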
2 changes: 1 addition & 1 deletion components/aws/sagemaker/deploy/component.yaml
@@ -79,7 +79,7 @@ outputs:
- {name: endpoint_name, description: 'Endpoint name'}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
deploy.py,
2 changes: 1 addition & 1 deletion components/aws/sagemaker/ground_truth/component.yaml
@@ -88,7 +88,7 @@ outputs:
- {name: active_learning_model_arn, description: 'The ARN for the most recent Amazon SageMaker model trained as part of automated data labeling.'}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
ground_truth.py,
2 changes: 0 additions & 2 deletions components/aws/sagemaker/hyperparameter_tuning/README.md
@@ -26,7 +26,6 @@ integer_parameters | The array of IntegerParameterRange objects that specify ran
continuous_parameters | The array of ContinuousParameterRange objects that specify ranges of continuous hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
categorical_parameters | The array of CategoricalParameterRange objects that specify ranges of categorical hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
channels | A list of dicts specifying the input channels (at least one); refer to [documentation](https://github.com/awsdocs/amazon-sagemaker-developer-guide/blob/master/doc_source/API_Channel.md) for parameters | No | No | List of Dicts | | |
data_location_[1, 8] | The S3 URI of the input data source for channel [1, 8] | Yes | Yes | | |
output_location | The Amazon S3 path where you want Amazon SageMaker to store the results of the transform job | No | No | String | | |
output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | Yes | String | | |
instance_type | The ML compute instance type | Yes | No | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge | ml.m4.xlarge |
@@ -52,7 +51,6 @@ Notes:
* Specify training image OR algorithm name. Use the image parameter for Bring Your Own Container (BYOC) algorithms, and algorithm name for Amazon built-in algorithms, custom algorithm resources in SageMaker, and algorithms subscribed to from the AWS Marketplace.
* Specify VPC security group IDs AND VPC subnets to specify the VPC that you want the training jobs to connect to.
* Specify warm start type AND 1 to 5 parent HPO jobs to launch the hyperparameter tuning job with previous jobs as a starting point.
* The parameters, data_location_1 through 8, is intended to be used for inputting the S3 URI outputs from previous steps in the pipeline, for example, from a Ground Truth labeling job. Otherwise, the S3 data location can be specified directly in the channels parameter.
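As the removed note explains, an S3 URI produced by an upstream step (for example a Ground Truth labeling job) now goes straight into the channel's S3Uri instead of a data_location_N parameter. A hedged illustration, with a plain string standing in for the upstream output and a hypothetical bucket:

```python
import json

# Stand-in for an upstream step's S3 output (hypothetical value).
ground_truth_output = "s3://my-bucket/labels/output.manifest"

# The URI that previously went into data_location_1 is now set
# directly on the channel itself.
channel = {
    "ChannelName": "train",
    "DataSource": {"S3DataSource": {
        "S3Uri": ground_truth_output,
        "S3DataType": "S3Prefix",
        "S3DataDistributionType": "FullyReplicated",
    }},
    "ContentType": "application/x-recordio",
    "InputMode": "Pipe",
}
print(json.dumps([channel]))
```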

## Outputs
Name | Description
34 changes: 1 addition & 33 deletions components/aws/sagemaker/hyperparameter_tuning/component.yaml
@@ -45,30 +45,6 @@ inputs:
default: '[]'
- name: channels
description: 'A list of dicts specifying the input channels. Must have at least one.'
- name: data_location_1
description: 'The S3 URI of the input data source for channel 1.'
default: ''
- name: data_location_2
description: 'The S3 URI of the input data source for channel 2.'
default: ''
- name: data_location_3
description: 'The S3 URI of the input data source for channel 3.'
default: ''
- name: data_location_4
description: 'The S3 URI of the input data source for channel 4.'
default: ''
- name: data_location_5
description: 'The S3 URI of the input data source for channel 5.'
default: ''
- name: data_location_6
description: 'The S3 URI of the input data source for channel 6.'
default: ''
- name: data_location_7
description: 'The S3 URI of the input data source for channel 7.'
default: ''
- name: data_location_8
description: 'The S3 URI of the input data source for channel 8.'
default: ''
- name: output_location
description: 'The Amazon S3 path where you want Amazon SageMaker to store the model artifacts is from the best training job.'
- name: output_encryption_key
@@ -139,7 +115,7 @@ outputs:
description: 'The registry path of the Docker image that contains the training algorithm'
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
hyperparameter_tuning.py,
@@ -160,14 +136,6 @@ implementation:
--continuous_parameters, {inputValue: continuous_parameters},
--categorical_parameters, {inputValue: categorical_parameters},
--channels, {inputValue: channels},
--data_location_1, {inputValue: data_location_1},
--data_location_2, {inputValue: data_location_2},
--data_location_3, {inputValue: data_location_3},
--data_location_4, {inputValue: data_location_4},
--data_location_5, {inputValue: data_location_5},
--data_location_6, {inputValue: data_location_6},
--data_location_7, {inputValue: data_location_7},
--data_location_8, {inputValue: data_location_8},
--output_location, {inputValue: output_location},
--output_encryption_key, {inputValue: output_encryption_key},
--instance_type, {inputValue: instance_type},
@@ -35,14 +35,6 @@ def create_parser():
parser.add_argument('--continuous_parameters', type=_utils.str_to_json_list, required=False, help='The array of ContinuousParameterRange objects that specify ranges of continuous hyperparameters that you want to search.', default='[]')
parser.add_argument('--categorical_parameters', type=_utils.str_to_json_list, required=False, help='The array of CategoricalParameterRange objects that specify ranges of categorical hyperparameters that you want to search.', default='[]')
parser.add_argument('--channels', type=_utils.str_to_json_list, required=True, help='A list of dicts specifying the input channels. Must have at least one.')
parser.add_argument('--data_location_1', type=str.strip, required=False, help='The S3 URI of the input data source for channel 1.', default='')
parser.add_argument('--data_location_2', type=str.strip, required=False, help='The S3 URI of the input data source for channel 2.', default='')
parser.add_argument('--data_location_3', type=str.strip, required=False, help='The S3 URI of the input data source for channel 3.', default='')
parser.add_argument('--data_location_4', type=str.strip, required=False, help='The S3 URI of the input data source for channel 4.', default='')
parser.add_argument('--data_location_5', type=str.strip, required=False, help='The S3 URI of the input data source for channel 5.', default='')
parser.add_argument('--data_location_6', type=str.strip, required=False, help='The S3 URI of the input data source for channel 6.', default='')
parser.add_argument('--data_location_7', type=str.strip, required=False, help='The S3 URI of the input data source for channel 7.', default='')
parser.add_argument('--data_location_8', type=str.strip, required=False, help='The S3 URI of the input data source for channel 8.', default='')
parser.add_argument('--output_location', type=str.strip, required=True, help='The Amazon S3 path where you want Amazon SageMaker to store the results of the transform job.')
parser.add_argument('--output_encryption_key', type=str.strip, required=False, help='The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts.', default='')
parser.add_argument('--instance_type', choices=['ml.m4.xlarge', 'ml.m4.2xlarge', 'ml.m4.4xlarge', 'ml.m4.10xlarge', 'ml.m4.16xlarge', 'ml.m5.large', 'ml.m5.xlarge', 'ml.m5.2xlarge', 'ml.m5.4xlarge',
2 changes: 1 addition & 1 deletion components/aws/sagemaker/model/component.yaml
@@ -45,7 +45,7 @@ outputs:
- {name: model_name, description: 'The model name Sagemaker created'}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
create_model.py,
3 changes: 0 additions & 3 deletions components/aws/sagemaker/train/README.md
@@ -37,9 +37,6 @@ max_wait_time | The maximum time in seconds you are willing to wait for a manage
checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Dict | | {} |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |

Notes :
* The parameters, data_location_1 through 8, is intended to be used for inputting the S3 URI outputs from previous steps in the pipeline, for example, from a Ground Truth labeling job. Otherwise, the S3 data location can be specified directly in the channels parameter.


## Output
Stores the Model in the s3 bucket you specified
34 changes: 1 addition & 33 deletions components/aws/sagemaker/train/component.yaml
@@ -26,30 +26,6 @@ inputs:
default: '{}'
- name: channels
description: 'A list of dicts specifying the input channels. Must have at least one.'
- name: data_location_1
description: 'The S3 URI of the input data source for channel 1.'
default: ''
- name: data_location_2
description: 'The S3 URI of the input data source for channel 2.'
default: ''
- name: data_location_3
description: 'The S3 URI of the input data source for channel 3.'
default: ''
- name: data_location_4
description: 'The S3 URI of the input data source for channel 4.'
default: ''
- name: data_location_5
description: 'The S3 URI of the input data source for channel 5.'
default: ''
- name: data_location_6
description: 'The S3 URI of the input data source for channel 6.'
default: ''
- name: data_location_7
description: 'The S3 URI of the input data source for channel 7.'
default: ''
- name: data_location_8
description: 'The S3 URI of the input data source for channel 8.'
default: ''
- name: instance_type
description: 'The ML compute instance type.'
default: 'ml.m4.xlarge'
@@ -103,7 +79,7 @@ outputs:
- {name: training_image, description: 'The registry path of the Docker image that contains the training algorithm'}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
train.py,
@@ -117,14 +93,6 @@ implementation:
--training_input_mode, {inputValue: training_input_mode},
--hyperparameters, {inputValue: hyperparameters},
--channels, {inputValue: channels},
--data_location_1, {inputValue: data_location_1},
--data_location_2, {inputValue: data_location_2},
--data_location_3, {inputValue: data_location_3},
--data_location_4, {inputValue: data_location_4},
--data_location_5, {inputValue: data_location_5},
--data_location_6, {inputValue: data_location_6},
--data_location_7, {inputValue: data_location_7},
--data_location_8, {inputValue: data_location_8},
--instance_type, {inputValue: instance_type},
--instance_count, {inputValue: instance_count},
--volume_size, {inputValue: volume_size},
8 changes: 0 additions & 8 deletions components/aws/sagemaker/train/src/train.py
@@ -27,14 +27,6 @@ def create_parser():
parser.add_argument('--training_input_mode', choices=['File', 'Pipe'], type=str.strip, help='The input mode that the algorithm supports. File or Pipe.', default='File')
parser.add_argument('--hyperparameters', type=_utils.str_to_json_dict, help='Dictionary of hyperparameters for the the algorithm.', default='{}')
parser.add_argument('--channels', type=_utils.str_to_json_list, required=True, help='A list of dicts specifying the input channels. Must have at least one.')
parser.add_argument('--data_location_1', type=str.strip, required=False, help='The S3 URI of the input data source for channel 1.', default='')
parser.add_argument('--data_location_2', type=str.strip, required=False, help='The S3 URI of the input data source for channel 2.', default='')
parser.add_argument('--data_location_3', type=str.strip, required=False, help='The S3 URI of the input data source for channel 3.', default='')
parser.add_argument('--data_location_4', type=str.strip, required=False, help='The S3 URI of the input data source for channel 4.', default='')
parser.add_argument('--data_location_5', type=str.strip, required=False, help='The S3 URI of the input data source for channel 5.', default='')
parser.add_argument('--data_location_6', type=str.strip, required=False, help='The S3 URI of the input data source for channel 6.', default='')
parser.add_argument('--data_location_7', type=str.strip, required=False, help='The S3 URI of the input data source for channel 7.', default='')
parser.add_argument('--data_location_8', type=str.strip, required=False, help='The S3 URI of the input data source for channel 8.', default='')
parser.add_argument('--instance_type', required=True, choices=['ml.m4.xlarge', 'ml.m4.2xlarge', 'ml.m4.4xlarge', 'ml.m4.10xlarge', 'ml.m4.16xlarge', 'ml.m5.large', 'ml.m5.xlarge', 'ml.m5.2xlarge', 'ml.m5.4xlarge',
'ml.m5.12xlarge', 'ml.m5.24xlarge', 'ml.c4.xlarge', 'ml.c4.2xlarge', 'ml.c4.4xlarge', 'ml.c4.8xlarge', 'ml.p2.xlarge', 'ml.p2.8xlarge', 'ml.p2.16xlarge', 'ml.p3.2xlarge', 'ml.p3.8xlarge', 'ml.p3.16xlarge',
'ml.c5.xlarge', 'ml.c5.2xlarge', 'ml.c5.4xlarge', 'ml.c5.9xlarge', 'ml.c5.18xlarge'], type=str.strip, help='The ML compute instance type.', default='ml.m4.xlarge')
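The surviving --channels flag is the single entry point for input data, parsed from a JSON string by _utils.str_to_json_list. The actual helper lives in _utils.py and may differ; a hypothetical stand-in showing the expected behavior:

```python
import argparse
import json

def str_to_json_list(s):
    """Hypothetical stand-in for _utils.str_to_json_list: parse a
    JSON string into a Python list, as --channels expects."""
    parsed = json.loads(s)
    if not isinstance(parsed, list):
        raise argparse.ArgumentTypeError('Expected a JSON list')
    return parsed

parser = argparse.ArgumentParser()
parser.add_argument('--channels', type=str_to_json_list, required=True,
                    help='A list of dicts specifying the input channels.')
args = parser.parse_args(['--channels',
    '[{"ChannelName": "train", "DataSource": {"S3DataSource": '
    '{"S3Uri": "s3://my-bucket/train"}}}]'])
print(len(args.channels))  # prints 1
```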
2 changes: 1 addition & 1 deletion components/aws/sagemaker/workteam/component.yaml
@@ -27,7 +27,7 @@ outputs:
- {name: workteam_arn, description: 'The ARN of the workteam.'}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:0.2
image: amazon/aws-sagemaker-kfp-components:0.3.0
command: ['python']
args: [
workteam.py,