feat: Support placeholders for TuningStep parameters
ca-nguyen committed Oct 19, 2021
1 parent 23878de commit 5991cf8
Showing 11 changed files with 977 additions and 93 deletions.
51 changes: 51 additions & 0 deletions .github/ISSUE_TEMPLATE/bug.md
@@ -0,0 +1,51 @@
---
name: "\U0001F41B Bug Report"
about: Report a bug
title: "short issue description"
labels: bug, needs-triage
---

<!--
description of the bug:
-->




### Reproduction Steps

<!--
minimal amount of code that causes the bug (if possible) or a reference.
The code sample should be an SSCCE. See http://sscce.org/ for details.
In short, provide a code sample that we can copy/paste, run and reproduce.
-->

### What did you expect to happen?

<!--
What were you trying to achieve by performing the steps above?
-->

### What actually happened?

<!--
What is the unexpected behavior you were seeing? If you got an error, paste it here.
-->


### Environment

- **AWS Step Functions Data Science Python SDK version:**
- **Python Version:** <!-- Version of Python (run the command `python3 --version`) -->

### Other

<!-- e.g. detailed explanation, stack-traces, related issues, suggestions on how to fix, links for us to have context, e.g. associated pull-request, stackoverflow, slack, etc -->




---

This is a :bug: Bug Report
28 changes: 28 additions & 0 deletions .github/ISSUE_TEMPLATE/doc.md
@@ -0,0 +1,28 @@
---
name: "📕 Documentation Issue"
about: Issue in the reference documentation
title: "short issue description"
labels: feature-request, documentation, needs-triage
---

<!--
- want to help? submit a pull request! docs can be found here: https://github.com/aws/aws-step-functions-data-science-sdk-python/tree/main/doc
-->

<!--
link to reference doc page:
-->



<!--
describe your issue:
-->





---

This is a 📕 documentation issue
46 changes: 46 additions & 0 deletions .github/ISSUE_TEMPLATE/feature-request.md
@@ -0,0 +1,46 @@
---
name: "\U0001F680 Feature Request"
about: Request a new feature
title: "short issue description"
labels: feature-request, needs-triage
---

<!-- short description of the feature you are proposing: -->





### Use Case

<!-- why do you need this feature? -->





### Proposed Solution

<!-- Please include prototype/workaround/sketch/reference implementation: -->





### Other

<!--
e.g. detailed explanation, stacktraces, related issues, suggestions on how to fix,
links for us to have context, e.g. associated pull-request, stackoverflow, slack, etc
-->





* [ ] :wave: I may be able to implement this feature request
* [ ] :warning: This feature might incur a breaking change

---

This is a :rocket: Feature Request
43 changes: 43 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,43 @@
### Description

Please include a summary of the change being made.

Fixes #(issue)

### Why is the change necessary?

What capability does it enable? What problem does it solve?

### Solution

Please include an overview of the solution. Discuss trade-offs made, caveats, alternatives, etc.

### Testing

How was this change tested?

----

### Pull Request Checklist

Please check all boxes (including N/A items)

#### Testing

- [ ] Unit tests added
- [ ] Integration test added
- [ ] Manual testing - why was it necessary? could it be automated?

#### Documentation

- [ ] __docs__: All relevant [docs](https://github.com/aws/aws-step-functions-data-science-sdk-python/tree/main/doc) updated
- [ ] __docstrings__: All public APIs documented

### Title and description

- [ ] __Change type__: Title is prefixed with change type: and follows [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/)
- [ ] __References__: Indicate issues fixed via: `Fixes #xxx`

----

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.
66 changes: 41 additions & 25 deletions src/stepfunctions/steps/sagemaker.py
@@ -185,36 +185,42 @@ def __merge_hyperparameters(self, training_step_hyperparameters, estimator_hyper
merged_hyperparameters[key] = value
return merged_hyperparameters


class TransformStep(Task):

"""
Creates a Task State to execute a `SageMaker Transform Job <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html>`_.
"""

def __init__(self, state_id, transformer, job_name, model_name, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, experiment_config=None, wait_for_completion=True, tags=None, input_filter=None, output_filter=None, join_source=None, **kwargs):
def __init__(self, state_id, transformer, job_name, model_name, data, data_type='S3Prefix', content_type=None,
compression_type=None, split_type=None, experiment_config=None, wait_for_completion=True, tags=None,
input_filter=None, output_filter=None, join_source=None, **kwargs):
"""
Args:
state_id (str): State name whose length **must be** less than or equal to 128 unicode characters. State names **must be** unique within the scope of the whole state machine.
transformer (sagemaker.transformer.Transformer): The SageMaker transformer to use in the TransformStep.
job_name (str or Placeholder): Specify a transform job name. We recommend using the :py:class:`~stepfunctions.inputs.ExecutionInput` placeholder collection to pass the value dynamically in each execution.
model_name (str or Placeholder): Specify a model name for the transform job to use. We recommend using the :py:class:`~stepfunctions.inputs.ExecutionInput` placeholder collection to pass the value dynamically in each execution.
data (str): Input data location in S3.
data_type (str): What the S3 location defines (default: 'S3Prefix').
data (str or Placeholder): Input data location in S3.
data_type (str or Placeholder): What the S3 location defines (default: 'S3Prefix').
Valid values:
* 'S3Prefix' - the S3 URI defines a key name prefix. All objects with this prefix will
be used as inputs for the transform job.
* 'ManifestFile' - the S3 URI points to a single manifest file listing each S3 object
to use as an input for the transform job.
content_type (str): MIME type of the input data (default: None).
compression_type (str): Compression type of the input data, if compressed (default: None). Valid values: 'Gzip', None.
split_type (str): The record delimiter for the input object (default: 'None'). Valid values: 'None', 'Line', 'RecordIO', and 'TFRecord'.
experiment_config (dict, optional): Specify the experiment config for the transform. (Default: None)
content_type (str or Placeholder): MIME type of the input data (default: None).
compression_type (str or Placeholder): Compression type of the input data, if compressed (default: None). Valid values: 'Gzip', None.
split_type (str or Placeholder): The record delimiter for the input object (default: 'None'). Valid values: 'None', 'Line', 'RecordIO', and 'TFRecord'.
experiment_config (dict or Placeholder, optional): Specify the experiment config for the transform. (Default: None)
wait_for_completion(bool, optional): Boolean value set to `True` if the Task state should wait for the transform job to complete before proceeding to the next step in the workflow. Set to `False` if the Task state should submit the transform job and proceed to the next step. (default: True)
tags (list[dict], optional): `List of tags <https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html>`_ to associate with the resource.
input_filter (str): A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value ‘$’, representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC format. See Supported JSONPath Operators for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.features” (default: None).
output_filter (str): A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.prediction” (default: None).
join_source (str): The source of data to be joined to the transform output. It can be set to ‘Input’ meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None.
tags (list[dict] or Placeholder, optional): `List of tags <https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html>`_ to associate with the resource.
input_filter (str or Placeholder): A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value ‘$’, representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC format. See Supported JSONPath Operators for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.features” (default: None).
output_filter (str or Placeholder): A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.prediction” (default: None).
join_source (str or Placeholder): The source of data to be joined to the transform output. It can be set to ‘Input’ meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None.
parameters(dict, optional): The value of this field is merged with other arguments to become the request payload for SageMaker `CreateTransformJob <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html>`_.
You can use `parameters` to override the value provided by other arguments and specify any field's value dynamically using `Placeholders <https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/placeholders.html?highlight=placeholder#stepfunctions.inputs.Placeholder>`_.
"""
if wait_for_completion:
"""
@@ -233,7 +239,7 @@ def __init__(self, state_id, transformer, job_name, model_name, data, data_type=
SageMakerApi.CreateTransformJob)

if isinstance(job_name, str):
parameters = transform_config(
transform_parameters = transform_config(
transformer=transformer,
data=data,
data_type=data_type,
@@ -246,7 +252,7 @@ def __init__(self, state_id, transformer, job_name, model_name, data, data_type=
join_source=join_source
)
else:
parameters = transform_config(
transform_parameters = transform_config(
transformer=transformer,
data=data,
data_type=data_type,
@@ -259,17 +265,21 @@ def __init__(self, state_id, transformer, job_name, model_name, data, data_type=
)

if isinstance(job_name, Placeholder):
parameters['TransformJobName'] = job_name
transform_parameters['TransformJobName'] = job_name

parameters['ModelName'] = model_name
transform_parameters['ModelName'] = model_name

if experiment_config is not None:
parameters['ExperimentConfig'] = experiment_config
transform_parameters['ExperimentConfig'] = experiment_config

if tags:
parameters['Tags'] = tags_dict_to_kv_list(tags)
transform_parameters['Tags'] = tags if isinstance(tags, Placeholder) else tags_dict_to_kv_list(tags)

kwargs[Field.Parameters.value] = parameters
if Field.Parameters.value in kwargs and isinstance(kwargs[Field.Parameters.value], dict):
# Update transform_parameters with input parameters
merge_dicts(transform_parameters, kwargs[Field.Parameters.value])

kwargs[Field.Parameters.value] = transform_parameters
super(TransformStep, self).__init__(state_id, **kwargs)
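
The docstring above describes how a `parameters` dict is merged over the request payload that `TransformStep` generates, and that individual fields may be given as Placeholders. The sketch below is not part of this commit; it is one way the step could be wired up with `ExecutionInput`, and the model name, bucket names, and schema keys it uses are hypothetical.

```python
# Minimal sketch, assuming a SageMaker model already exists; all names, buckets and
# schema keys below are hypothetical illustrations, not values from this commit.
from sagemaker.transformer import Transformer
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import TransformStep

execution_input = ExecutionInput(schema={
    'job_name': str,        # becomes TransformJobName at execution time
    'model_name': str,      # becomes ModelName at execution time
    'instance_count': int,  # used to override TransformResources below
})

transformer = Transformer(
    model_name='example-model',                           # hypothetical
    instance_count=1,
    instance_type='ml.m5.large',
    output_path='s3://example-bucket/transform-output',   # hypothetical
)

transform_step = TransformStep(
    'Transform',
    transformer=transformer,
    job_name=execution_input['job_name'],
    model_name=execution_input['model_name'],
    data='s3://example-bucket/transform-input',           # hypothetical S3 prefix
    # Merged over the generated CreateTransformJob payload; placeholders are allowed here.
    parameters={
        'TransformResources': {
            'InstanceCount': execution_input['instance_count'],
            'InstanceType': 'ml.m5.large',
        }
    },
)
```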


@@ -444,7 +454,10 @@ def __init__(self, state_id, tuner, job_name, data, wait_for_completion=True, ta
:class:`sagemaker.amazon.amazon_estimator.RecordSet` objects,
where each instance is a different channel of training data.
wait_for_completion(bool, optional): Boolean value set to `True` if the Task state should wait for the tuning job to complete before proceeding to the next step in the workflow. Set to `False` if the Task state should submit the tuning job and proceed to the next step. (default: True)
tags (list[dict], optional): `List to tags <https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html>`_ to associate with the resource.
tags (list[dict] or Placeholder, optional): `List of tags <https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html>`_ to associate with the resource.
parameters(dict, optional): The value of this field is merged with other arguments to become the request payload for SageMaker `CreateHyperParameterTuningJob <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html>`_.
You can use `parameters` to override the value provided by other arguments and specify any field's value dynamically using `Placeholders <https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/placeholders.html?highlight=placeholder#stepfunctions.inputs.Placeholder>`_.
"""
if wait_for_completion:
"""
Expand All @@ -462,19 +475,22 @@ def __init__(self, state_id, tuner, job_name, data, wait_for_completion=True, ta
kwargs[Field.Resource.value] = get_service_integration_arn(SAGEMAKER_SERVICE_NAME,
SageMakerApi.CreateHyperParameterTuningJob)

parameters = tuning_config(tuner=tuner, inputs=data, job_name=job_name).copy()
tuning_parameters = tuning_config(tuner=tuner, inputs=data, job_name=job_name).copy()

if job_name is not None:
parameters['HyperParameterTuningJobName'] = job_name
tuning_parameters['HyperParameterTuningJobName'] = job_name

if 'S3Operations' in parameters:
del parameters['S3Operations']
if 'S3Operations' in tuning_parameters:
del tuning_parameters['S3Operations']

if tags:
parameters['Tags'] = tags_dict_to_kv_list(tags)
tuning_parameters['Tags'] = tags if isinstance(tags, Placeholder) else tags_dict_to_kv_list(tags)

kwargs[Field.Parameters.value] = parameters
if Field.Parameters.value in kwargs and isinstance(kwargs[Field.Parameters.value], dict):
# Update tuning parameters with input parameters
merge_dicts(tuning_parameters, kwargs[Field.Parameters.value])

kwargs[Field.Parameters.value] = tuning_parameters
super(TuningStep, self).__init__(state_id, **kwargs)
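
The same merge now applies to `TuningStep`, which is the headline change of this commit: a `parameters` dict can override fields of the generated `CreateHyperParameterTuningJob` request, including with Placeholders. The following is a rough usage sketch rather than code from this repository; the image URI, role ARN, buckets, metric names, and schema keys are hypothetical.

```python
# Minimal sketch; image URI, role ARN, buckets, metric and schema keys are hypothetical.
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import TuningStep

execution_input = ExecutionInput(schema={
    'max_jobs': int,  # overrides ResourceLimits.MaxNumberOfTrainingJobs per execution
})

estimator = Estimator(
    image_uri='123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest',  # hypothetical
    role='arn:aws:iam::123456789012:role/ExampleSageMakerRole',                     # hypothetical
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path='s3://example-bucket/tuning-output',                                # hypothetical
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name='validation:auc',
    hyperparameter_ranges={'eta': ContinuousParameter(0.1, 0.5)},
    metric_definitions=[{'Name': 'validation:auc', 'Regex': 'auc: ([0-9\\.]+)'}],
    max_jobs=4,
    max_parallel_jobs=2,
)

tuning_step = TuningStep(
    'HPO',
    tuner=tuner,
    job_name='example-tuning-job',                  # hypothetical job name
    data={'train': 's3://example-bucket/train'},    # hypothetical training channel
    # Merged over the generated CreateHyperParameterTuningJob payload, so the
    # execution input can set the training-job limit dynamically.
    parameters={
        'HyperParameterTuningJobConfig': {
            'ResourceLimits': {
                'MaxNumberOfTrainingJobs': execution_input['max_jobs'],
            }
        }
    },
)
```

Because the user-supplied `parameters` are merged last, they win over the values derived from the tuner, which is what lets an execution input adjust a field like the training-job limit without rebuilding the tuner.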

