Commit 6f7ae86

Merge pull request #46 from vivekmittal514/release-0.3.0
Changes to support overwriting the file for multiple runs under the same folder
2 parents: 8b9b950 + 4394107

File tree

1 file changed: +6 -6 lines

spark-scripts/sample-accommodations-to-deequ.py

Lines changed: 6 additions & 6 deletions
@@ -24,8 +24,8 @@
 Add below parameters in the lambda function Environment Variables
 SCRIPT_BUCKET BUCKET WHERE YOU SAVE THIS SCRIPT
 SPARK_SCRIPT THE SCRIPT NAME AND PATH
-input_path s3a://redshift-downloads/spatial-data/accommodations.csv
-output_path THE PATH WHERE THE VERIFICATION RESULTS AND METRICS WILL BE STORED
+INPUT_PATH s3a://redshift-downloads/spatial-data/accommodations.csv
+OUTPUT_PATH THE PATH WHERE THE VERIFICATION RESULTS AND METRICS WILL BE STORED

 Lambda General Configuration for above input file. Based on the input file size, the memory can be updated.
 Memory 2048 MB
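The hunk above renames the Lambda environment variables to uppercase (INPUT_PATH / OUTPUT_PATH), matching the usual convention for environment variables. As a hedged sketch of how they could be set from the command line (the function name and both bucket names below are placeholders, not values from this repository):

```shell
# Placeholder function name and buckets; substitute your own deployment's values.
aws lambda update-function-configuration \
  --function-name sample-deequ-dq \
  --environment "Variables={SCRIPT_BUCKET=my-script-bucket,SPARK_SCRIPT=spark-scripts/sample-accommodations-to-deequ.py,INPUT_PATH=s3a://redshift-downloads/spatial-data/accommodations.csv,OUTPUT_PATH=s3a://my-output-bucket/deequ-results}"
```

The same variables can equally be set in the Lambda console's Environment Variables panel, as the docstring in the script suggests.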
@@ -42,8 +42,8 @@
 print("Usage: spark-dq [input-folder] [output-folder]")
 sys.exit(0)

-input_path = os.environ['input_path']
-output_path = os.environ['output_path']
+input_path = os.environ['INPUT_PATH']
+output_path = os.environ['OUTPUT_PATH']


 aws_region = os.environ['AWS_REGION']
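The reads above can be isolated in a small helper so a missing or misspelled variable fails loudly at startup rather than mid-job. A minimal sketch (the helper name is hypothetical, not part of the script):

```python
import os

def read_dq_paths():
    """Read the Deequ job's input/output locations from the environment.

    Mirrors the variables renamed in this commit (INPUT_PATH / OUTPUT_PATH).
    os.environ[...] raises KeyError if either is unset, surfacing a
    misconfigured Lambda immediately instead of failing at write time.
    """
    return os.environ["INPUT_PATH"], os.environ["OUTPUT_PATH"]
```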
@@ -108,13 +108,13 @@
 checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
 checkResult_df.show()

-checkResult_df.repartition(1).write.csv(output_path+"/"+str(uuid.uuid4())+"/", sep=',')
+checkResult_df.repartition(1).write.mode('overwrite').csv(output_path+"/verification-results/", sep=',')

 print("Showing VerificationResults metrics:")
 checkResult_df = VerificationResult.successMetricsAsDataFrame(spark, checkResult)
 checkResult_df.show()

-checkResult_df.repartition(1).write.csv(output_path+"/"+str(uuid.uuid4())+"/", sep=',')
+checkResult_df.repartition(1).write.mode('overwrite').csv(output_path+"/verification-results-metrics/", sep=',')

 spark.sparkContext._gateway.shutdown_callback_server()
 spark.stop()
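The substance of this change: the old code wrote each run into a fresh uuid4() folder, so repeated runs accumulated results under output_path, while the new code writes to a fixed subfolder and relies on Spark's mode('overwrite') to replace the previous run's files. The two path strategies can be contrasted in plain Python (the function name is hypothetical, for illustration only):

```python
import uuid

def results_path(output_path: str, fixed: bool = True) -> str:
    """Build the folder the verification results are written to.

    fixed=True reflects this commit: a stable subfolder that
    write.mode('overwrite') replaces on every run. fixed=False
    reflects the old behavior: a new uuid4() folder per run, so
    output accumulated under output_path indefinitely.
    """
    if fixed:
        return output_path + "/verification-results/"
    return output_path + "/" + str(uuid.uuid4()) + "/"
```

Note that overwrite mode deletes the existing contents of the target folder before writing, so prior runs' results are not retained; that trade-off is exactly what this commit opts into.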
