[T15.00] Glue Spark: transforming and processing data

Documentation:

Create an AWS Glue Spark Job that copies the go_daily_sales data from the AWS S3 Raw bucket to the AWS S3 Stage bucket.
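A minimal sketch of such a job script. It assumes the raw data is CSV with a header row and that the stage copy is written as Parquet; neither format is stated in the task. `source_path` and `target_path` are hypothetical parameter names. `GlueContext`, `Job` and `getResolvedOptions` are the standard awsglue entry points.

```python
# Assumption: the task does not state the file formats, so this sketch reads
# CSV files with a header row from the raw bucket and writes Parquet to stage.
SOURCE_FORMAT = "csv"
TARGET_FORMAT = "parquet"


def build_s3_options(source_path, target_path):
    """Return keyword arguments for the raw read and the stage write."""
    read_options = {
        "connection_type": "s3",
        "connection_options": {"paths": [source_path]},
        "format": SOURCE_FORMAT,
        "format_options": {"withHeader": True},
    }
    write_options = {
        "connection_type": "s3",
        "connection_options": {"path": target_path},
        "format": TARGET_FORMAT,
    }
    return read_options, write_options


def run_job(argv):
    """Copy go_daily_sales from the raw bucket to the stage bucket."""
    # Glue/Spark imports are deferred so build_s3_options can be used and
    # checked on a machine without a Glue environment.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # source_path and target_path are hypothetical job parameter names.
    args = getResolvedOptions(argv, ["JOB_NAME", "source_path", "target_path"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    read_options, write_options = build_s3_options(args["source_path"], args["target_path"])
    sales = glue_context.create_dynamic_frame.from_options(**read_options)
    glue_context.write_dynamic_frame.from_options(frame=sales, **write_options)

    # Marks the run as finished (and advances job bookmarks if they are enabled).
    job.commit()
```

On the cluster the script would end with `run_job(sys.argv)` (after `import sys`). Only `build_s3_options` runs without a Glue environment.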

Files for pushing to CodeCommit:

To emulate the Glue flow while developing locally, set the job parameters the same way they would be set on the cluster.
You can do this with the sys.argv.append command before calling the getResolvedOptions function from the awsglue folder in your repository.
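A minimal sketch of that local setup. The names and values in `LOCAL_PARAMS` are hypothetical placeholders for the parameters listed for this task. `getResolvedOptions(args, options)` is imported from `awsglue.utils` when the repository's awsglue folder is importable. Otherwise a small stand-in parser, also hypothetical, handles the plain `--name value` form.

```python
import sys

# Hypothetical parameter names and values for local runs; replace them with
# the parameters required by the task.
LOCAL_PARAMS = {
    "JOB_NAME": "local_go_daily_sales_test",
    "source_path": "s3://example-raw-bucket/go_daily_sales/",
    "target_path": "s3://example-stage-bucket/go_daily_sales/",
}


def emulate_cluster_args(params):
    """Append --name value pairs to sys.argv, the way Glue passes job parameters."""
    for name, value in params.items():
        sys.argv.append(f"--{name}")
        sys.argv.append(value)


def resolve_options(argv, names):
    """Resolve parameters with awsglue if it can be imported, else with a stand-in."""
    try:
        from awsglue.utils import getResolvedOptions
    except ImportError:
        # Hypothetical fallback: only understands the "--name value" form.
        return {name: argv[argv.index(f"--{name}") + 1] for name in names}
    return getResolvedOptions(argv, names)


emulate_cluster_args(LOCAL_PARAMS)
args = resolve_options(sys.argv, list(LOCAL_PARAMS))
```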
Parameters that you will need to set:

Create a Glue Spark Job on the cluster with the name: {user_id}_fact_go_daily_sales_stage_t10.
Mandatory Steps:
1. Choose Python 3.9 for Python version
2. Choose 1/16 DPU for data processing units
3. Choose 0 retries
4. Set Timeout to 5 (minutes)
5. Set Script path to your bucket path
6. Set parameters
7. Set Tags Owner=aws-edu-iba-gomel and StudentName={user_id}
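Most of these steps map onto the Glue `CreateJob` API. The sketch below builds the request that a boto3 Glue client's `create_job` would receive. The user id, role ARN, script location and `--source_path` argument are hypothetical placeholders. Capacity and Python version are left out on purpose. AWS documents 0.0625 (1/16) DPU and Python 3.9 as Python shell options, whereas a `glueetl` job takes 2 to 100 DPUs and gets its Python version from the Glue version. Check those two settings against what the console offers for your job.

```python
# Hypothetical placeholders: use your own user id, IAM role and script location.
USER_ID = "jdoe"
ROLE_ARN = "arn:aws:iam::123456789012:role/example-glue-role"
SCRIPT_LOCATION = "s3://example-bucket/scripts/fact_go_daily_sales_stage.py"


def build_create_job_request(user_id, role_arn, script_location, default_arguments):
    """Map the mandatory console steps onto a Glue CreateJob request."""
    return {
        "Name": f"{user_id}_fact_go_daily_sales_stage_t10",
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                  # Spark job type
            "ScriptLocation": script_location,  # step 5: script path in your bucket
            "PythonVersion": "3",
        },
        "DefaultArguments": default_arguments,  # step 6: job parameters
        "MaxRetries": 0,                        # step 3: no retries
        "Timeout": 5,                           # step 4: minutes
        "Tags": {"Owner": "aws-edu-iba-gomel", "StudentName": user_id},  # step 7
    }


def create_job(glue_client, request):
    """Send the request with a boto3 Glue client, e.g. boto3.client("glue")."""
    return glue_client.create_job(**request)


request = build_create_job_request(
    USER_ID,
    ROLE_ARN,
    SCRIPT_LOCATION,
    {"--source_path": "s3://example-raw-bucket/go_daily_sales/"},  # hypothetical parameter
)
```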
Add any other options needed to create the Glue Spark Job, then run the job to test it and check its logs in AWS CloudWatch.
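The test run can also be scripted. The sketch below passes in a boto3 Glue client and a CloudWatch Logs client (`boto3.client("glue")`, `boto3.client("logs")`). It assumes the job writes to the default log groups `/aws-glue/jobs/output` and `/aws-glue/jobs/error`, where each run's stream is named after its run ID. With continuous logging enabled, the logs go to a different group. `get_log_events` returns only the first page of events.

```python
import time

# Glue job run states after which the run will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_and_wait(glue_client, job_name, poll_seconds=30, sleep=time.sleep):
    """Start a job run and poll get_job_run until it reaches a terminal state."""
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] in TERMINAL_STATES:
            return run_id, run["JobRunState"]
        sleep(poll_seconds)


def read_run_log(logs_client, run_id, log_group="/aws-glue/jobs/output"):
    """Return the messages from the log stream named after the run id.

    Assumes the default Glue log groups; pass "/aws-glue/jobs/error" to see
    the error stream instead.
    """
    response = logs_client.get_log_events(
        logGroupName=log_group, logStreamName=run_id, startFromHead=True
    )
    return [event["message"] for event in response["events"]]
```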
Create an AWS CloudFormation template for the Glue Spark Job with the name: {user_id}_fact_go_daily_sales_stage_t10.yaml.
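The sketch below builds the template content as a Python dict so its structure can be checked. The parameter names `UserId`, `GlueRoleArn` and `ScriptLocation` and the logical ID `FactGoDailySalesStageJob` are hypothetical. The `AWS::Glue::Job` properties follow the mandatory console steps, except capacity and Python version, which are left to match whatever the console job uses.

```python
import json


def build_template():
    """Return a CloudFormation template (as a dict) for the Glue Spark job."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Hypothetical parameter names; values are supplied at deploy time.
            "UserId": {"Type": "String"},
            "GlueRoleArn": {"Type": "String"},
            "ScriptLocation": {"Type": "String"},
        },
        "Resources": {
            "FactGoDailySalesStageJob": {
                "Type": "AWS::Glue::Job",
                "Properties": {
                    "Name": {"Fn::Sub": "${UserId}_fact_go_daily_sales_stage_t10"},
                    "Role": {"Ref": "GlueRoleArn"},
                    "Command": {
                        "Name": "glueetl",
                        "ScriptLocation": {"Ref": "ScriptLocation"},
                        "PythonVersion": "3",
                    },
                    "MaxRetries": 0,
                    "Timeout": 5,  # minutes
                    "Tags": {
                        "Owner": "aws-edu-iba-gomel",
                        "StudentName": {"Ref": "UserId"},
                    },
                },
            }
        },
    }


template_body = json.dumps(build_template(), indent=2)
```

CloudFormation also accepts JSON templates, so `template_body` can be deployed as is. For the requested .yaml file, the same dict can be dumped with a YAML library such as the third-party PyYAML package.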
Test CloudFormation Template:
1. Deploy (check that your Glue Spark Job has been created by CloudFormation)
2. Delete your CloudFormation Stack
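Both test steps can be scripted. The sketch below passes in boto3 CloudFormation and Glue clients. It assumes the template takes a hypothetical `UserId` parameter and names the job `{user_id}_fact_go_daily_sales_stage_t10`. `create_stack`, `delete_stack`, `get_job` and the `stack_create_complete` / `stack_delete_complete` waiters are standard boto3 calls. If your template also creates the IAM role, `create_stack` additionally needs `Capabilities=["CAPABILITY_IAM"]` (or `CAPABILITY_NAMED_IAM`).

```python
def deploy_check_delete(cfn_client, glue_client, stack_name, template_body, parameters):
    """Create the stack, confirm the Glue job exists, then delete the stack."""
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=[
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
    )
    try:
        cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
        # Step 1: the job name follows the {user_id}_fact_go_daily_sales_stage_t10
        # pattern; "UserId" is a hypothetical template parameter.
        job_name = f"{parameters['UserId']}_fact_go_daily_sales_stage_t10"
        return glue_client.get_job(JobName=job_name)["Job"]["Name"]
    finally:
        # Step 2: delete the stack (and with it the job), even if the check failed.
        cfn_client.delete_stack(StackName=stack_name)
        cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

The `finally` clause deletes the stack even when the check raises, so a failed test does not leave resources behind.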
Open Pull Request as described in Common Info (Task Workflow) section.