Revert "refactor(components): De-hardcoded local output paths. (#580)"
This reverts commit a77af2c.
Showing 15 changed files with 162 additions and 85 deletions.
@@ -0,0 +1,33 @@
name: Predict using TF on Dataflow
description: |
  Runs TensorFlow prediction on Google Cloud Dataflow.
  Input and output data is in GCS.
inputs:
- {name: Data file pattern, type: GCSPath, description: 'GCS or local path of test file patterns.'} # type: {GCSPath: {data_type: CSV}}
- {name: Schema, type: GCSPath, description: 'GCS json schema file path.'} # type: {GCSPath: {data_type: TFDV schema JSON}}
- {name: Target column, type: String, description: 'Name of the column for prediction target.'}
- {name: Model, type: GCSPath, description: 'GCS or local path of model trained with tft preprocessed data.'} # Models trained with estimator are exported to base/export/export/123456781 directory. # Our trainer exports only one model. # TODO: Output single model from trainer # type: {GCSPath: {path_type: Directory, data_type: Exported TensorFlow models dir}}
- {name: Batch size, type: Integer, default: '32', description: 'Batch size used in prediction.'}
- {name: Run mode, type: String, default: local, description: 'Whether to run the job locally or in Cloud Dataflow. Valid values are "local" and "cloud".'}
- {name: GCP project, type: GCPProjectID, description: 'The GCP project to run the Dataflow job.'}
- {name: Predictions dir, type: GCSPath, description: 'GCS or local directory.'} # Will contain prediction_results-* and schema.json files; TODO: Split outputs and replace dir with single file # type: {GCSPath: {path_type: Directory}}
outputs:
- {name: Predictions dir, type: GCSPath, description: 'GCS or local directory.'} # Will contain prediction_results-* and schema.json files; TODO: Split outputs and replace dir with single file # type: {GCSPath: {path_type: Directory}}
- {name: MLPipeline UI metadata, type: UI metadata}
implementation:
  container:
    image: gcr.io/ml-pipeline/ml-pipeline-dataflow-tf-predict:57d9f7f1cfd458e945d297957621716062d89a49
    command: [python2, /ml/predict.py]
    args: [
      --data, {inputValue: Data file pattern},
      --schema, {inputValue: Schema},
      --target, {inputValue: Target column},
      --model, {inputValue: Model},
      --mode, {inputValue: Run mode},
      --project, {inputValue: GCP project},
      --batchsize, {inputValue: Batch size},
      --output, {inputValue: Predictions dir},
    ]
    fileOutputs:
      Predictions dir: /output.txt
      MLPipeline UI metadata: /mlpipeline-ui-metadata.json
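The `args` list above pairs each CLI flag with an `{inputValue: ...}` placeholder that the pipeline backend resolves into a concrete value before launching the container. A minimal sketch of that substitution, with an illustrative helper and sample values (not KFP internals):

```python
# Sketch of how an args template with {inputValue: ...} placeholders
# could be resolved into a concrete command line. resolve_args and the
# sample values are illustrative, not the actual KFP implementation.

def resolve_args(args_template, input_values):
    """Replace {inputValue: Name} placeholders with the supplied values."""
    resolved = []
    for item in args_template:
        if isinstance(item, dict) and "inputValue" in item:
            resolved.append(str(input_values[item["inputValue"]]))
        else:
            resolved.append(item)
    return resolved

args_template = [
    "--data", {"inputValue": "Data file pattern"},
    "--batchsize", {"inputValue": "Batch size"},
]
cmd = resolve_args(args_template, {
    "Data file pattern": "gs://bucket/test-*.csv",
    "Batch size": 32,
})
print(cmd)  # ['--data', 'gs://bucket/test-*.csv', '--batchsize', '32']
```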
@@ -0,0 +1,34 @@
name: TFX - Data Validation
description: |
  Runs TensorFlow Data Validation. https://www.tensorflow.org/tfx/data_validation/get_started
  TensorFlow Data Validation (TFDV) can analyze training and serving data to:
  * compute descriptive statistics,
  * infer a schema,
  * detect data anomalies.
inputs:
- {name: Inference data, type: GCSPath, description: GCS path of the CSV file from which to infer the schema.} # type: {GCSPath: {data_type: CSV}}
- {name: Validation data, type: GCSPath, description: GCS path of the CSV file whose contents should be validated.} # type: {GCSPath: {data_type: CSV}}
- {name: Column names, type: GCSPath, description: GCS json file containing a list of column names.} # type: {GCSPath: {data_type: JSON}}
- {name: Key columns, type: String, description: Comma-separated list of columns to treat as keys.}
- {name: GCP project, type: GCPProjectID, default: '', description: The GCP project to run the Dataflow job.}
- {name: Run mode, type: String, default: local, description: Whether to run the job locally or in Cloud Dataflow. Valid values are "local" and "cloud".}
- {name: Validation output, type: GCSPath, description: GCS or local directory.} # type: {GCSPath: {path_type: Directory}}
outputs:
- {name: Schema, type: GCSPath, description: GCS path of the inferred schema JSON.} # type: {GCSPath: {data_type: TFDV schema JSON}}
- {name: Validation result, type: String, description: Indicates whether anomalies were detected or not.}
implementation:
  container:
    image: gcr.io/ml-pipeline/ml-pipeline-dataflow-tfdv:57d9f7f1cfd458e945d297957621716062d89a49
    command: [python2, /ml/validate.py]
    args: [
      --csv-data-for-inference, {inputValue: Inference data},
      --csv-data-to-validate, {inputValue: Validation data},
      --column-names, {inputValue: Column names},
      --key-columns, {inputValue: Key columns},
      --project, {inputValue: GCP project},
      --mode, {inputValue: Run mode},
      --output, {inputValue: Validation output},
    ]
    fileOutputs:
      Schema: /schema.txt
      Validation result: /output_validation_result.txt
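The `fileOutputs` section is the component's output contract: the containerized script writes each output value to a fixed file path, and the pipeline backend reads those files back as the component's outputs. A minimal sketch of that contract under a temporary directory (the helper functions and values are illustrative, not what validate.py actually does):

```python
# Sketch of the fileOutputs contract: the container process writes each
# output value to a fixed file path, and the pipeline backend reads the
# file contents back as the component's output value. Paths and values
# here are illustrative stand-ins for /schema.txt etc. in the container.
import os
import tempfile

def write_file_output(path, value):
    """Write one output value to its agreed-upon file path."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        f.write(value)

def read_file_output(path):
    """What the backend does after the container exits."""
    with open(path) as f:
        return f.read()

workdir = tempfile.mkdtemp()
schema_path = os.path.join(workdir, "schema.txt")
result_path = os.path.join(workdir, "output_validation_result.txt")

# Inside the container, the script would write these after running TFDV.
write_file_output(schema_path, "gs://bucket/tfdv/schema.json")
write_file_output(result_path, "anomalies detected")

print(read_file_output(result_path))  # anomalies detected
```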
@@ -0,0 +1,34 @@
name: TFX - Analyze model
description: |
  Runs TensorFlow Model Analysis. https://www.tensorflow.org/tfx/model_analysis/get_started
  TensorFlow Model Analysis allows you to perform model evaluations in the TFX pipeline, and view resultant metrics and plots in a Jupyter notebook. Specifically, it can provide:
  * metrics computed on the entire training and holdout dataset, as well as next-day evaluations
  * tracking of metrics over time
  * model quality performance on different feature slices
inputs:
- {name: Model, type: GCSPath, description: GCS path to the model which will be evaluated.} # type: {GCSPath: {path_type: Directory, data_type: Exported TensorFlow models dir}}
- {name: Evaluation data, type: GCSPath, description: GCS path of eval files.} # type: {GCSPath: {data_type: CSV}}
- {name: Schema, type: GCSPath, description: GCS json schema file path.} # type: {GCSPath: {data_type: TFDV schema JSON}}
- {name: Run mode, type: String, default: local, description: Whether to run the job locally or in Cloud Dataflow.}
- {name: GCP project, type: GCPProjectID, default: '', description: 'The GCP project to run the Dataflow job, if running in the `cloud` mode.'}
- {name: Slice columns, type: String, description: Comma-separated list of columns on which to slice for analysis.}
- {name: Analysis results dir, type: GCSPath, description: GCS or local directory where the analysis results should be written.} # type: {GCSPath: {path_type: Directory}}
outputs:
- {name: Analysis results dir, type: GCSPath, description: GCS or local directory where the analysis results were written.} # type: {GCSPath: {path_type: Directory}}
- {name: MLPipeline UI metadata, type: UI metadata}
implementation:
  container:
    image: gcr.io/ml-pipeline/ml-pipeline-dataflow-tfma:57d9f7f1cfd458e945d297957621716062d89a49
    command: [python2, /ml/model_analysis.py]
    args: [
      --model, {inputValue: Model},
      --eval, {inputValue: Evaluation data},
      --schema, {inputValue: Schema},
      --mode, {inputValue: Run mode},
      --project, {inputValue: GCP project},
      --slice-columns, {inputValue: Slice columns},
      --output, {inputValue: Analysis results dir},
    ]
    fileOutputs:
      Analysis results dir: /output.txt
      MLPipeline UI metadata: /mlpipeline-ui-metadata.json
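The `MLPipeline UI metadata` output maps to `/mlpipeline-ui-metadata.json`, a JSON file describing artifacts the pipeline UI can render as output viewers. A minimal sketch of writing such a file; the `source` path is an illustrative placeholder:

```python
# Sketch of the MLPipeline UI metadata contract: the component writes
# /mlpipeline-ui-metadata.json listing artifacts for the pipeline UI to
# render. The viewer "type" keys follow the KFP output-viewer format;
# the "source" path here is an illustrative placeholder.
import json
import os
import tempfile

metadata = {
    "outputs": [{
        "type": "tensorboard",             # viewer type understood by the UI
        "source": "gs://bucket/analysis",  # where the artifact lives (assumed path)
    }]
}

# In the container this path would be /mlpipeline-ui-metadata.json.
path = os.path.join(tempfile.mkdtemp(), "mlpipeline-ui-metadata.json")
with open(path, "w") as f:
    json.dump(metadata, f)

with open(path) as f:
    loaded = json.load(f)
print(loaded["outputs"][0]["type"])  # tensorboard
```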
@@ -0,0 +1,27 @@
name: Transform using TF on Dataflow
description: Runs TensorFlow Transform on Google Cloud Dataflow
inputs:
- {name: Training data file pattern, type: GCSPath, description: 'GCS path of train file patterns.'} # Also supports local CSV # type: {GCSPath: {data_type: CSV}}
- {name: Evaluation data file pattern, type: GCSPath, description: 'GCS path of eval file patterns.'} # Also supports local CSV # type: {GCSPath: {data_type: CSV}}
- {name: Schema, type: GCSPath, description: 'GCS json schema file path.'} # type: {GCSPath: {data_type: JSON}}
- {name: GCP project, type: GCPProjectID, description: 'The GCP project to run the Dataflow job.'}
- {name: Run mode, type: String, default: local, description: 'Whether to run the job locally or in Cloud Dataflow. Valid values are "local" and "cloud".'}
- {name: Preprocessing module, type: GCSPath, default: '', description: 'GCS path to a python file defining "preprocess" and "get_feature_columns" functions.'} # type: {GCSPath: {data_type: Python}}
- {name: Transformed data dir, type: GCSPath, description: 'GCS or local directory.'} # Also supports local paths # type: {GCSPath: {path_type: Directory}}
outputs:
- {name: Transformed data dir, type: GCSPath} # type: {GCSPath: {path_type: Directory}}
implementation:
  container:
    image: gcr.io/ml-pipeline/ml-pipeline-dataflow-tft:57d9f7f1cfd458e945d297957621716062d89a49
    command: [python2, /ml/transform.py]
    args: [
      --train, {inputValue: Training data file pattern},
      --eval, {inputValue: Evaluation data file pattern},
      --schema, {inputValue: Schema},
      --project, {inputValue: GCP project},
      --mode, {inputValue: Run mode},
      --preprocessing-module, {inputValue: Preprocessing module},
      --output, {inputValue: Transformed data dir},
    ]
    fileOutputs:
      Transformed data dir: /output.txt
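Each of these components promises that `Run mode` accepts only "local" and "cloud", with "local" as the default. Inside the container that contract typically surfaces as CLI validation; a sketch of how transform.py's flag parsing might enforce it (the flag names match the `args` list above, but the parser itself is an illustrative assumption, not the script's actual code):

```python
# Sketch of CLI validation for the Run mode contract: the component spec
# promises only "local" and "cloud" are valid, defaulting to "local".
# Flag names mirror the component's args list; the parser is illustrative.
import argparse

parser = argparse.ArgumentParser(description="transform.py-style CLI sketch")
parser.add_argument("--train", required=True, help="Train file pattern (GCS or local)")
parser.add_argument("--output", required=True, help="Transformed data dir")
parser.add_argument("--mode", choices=["local", "cloud"], default="local",
                    help="Run locally or on Cloud Dataflow")

# Simulate the resolved command line the backend would pass in.
args = parser.parse_args([
    "--train", "gs://bucket/train-*.csv",
    "--output", "gs://bucket/transformed",
    "--mode", "cloud",
])
print(args.mode)  # cloud
```

With `choices=["local", "cloud"]`, any other value (e.g. `--mode batch`) makes argparse exit with a usage error instead of silently running in the wrong environment.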