Create an AWS Glue Spark Job that copies go_daily_sales data from the AWS S3 Raw bucket to the AWS S3 Stage bucket.
Files to push to CodeCommit:
- Script name: fact_go_daily_sales_stage_t10.py
- CloudFormation template: {user_id}_fact_go_daily_sales_stage_t10.yaml
- Path: configurations/fact_go_daily_sales.yaml
- Use the raw-to-stage section
- Read data from [data-source][database]
- Write data to [data-target][s3][keys]
- Replace {user_id} with your AWS account name in the target path
- To reproduce the Glue flow while developing locally, set the job parameters the same way they would be set on the cluster.
- You can do this with sys.argv.append before calling the getResolvedOptions function from the awsglue folder in your repository.
- Parameters that you will need to set:
- UserId = {user_id}
- ConfigBucketKey = configurations/{configuration_file_name}.yaml
- ConfigBucketName = aws-data-engineering-course-{user-id}-assets
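The local parameter setup described above can be sketched as follows. This is a minimal illustration, not the course's reference solution: the helper names local_job_args, append_local_args and resolve_args are hypothetical, and "jdoe" in the usage note stands in for your own user id. The awsglue import is kept inside resolve_args so the rest of the sketch loads even where the awsglue folder is not on the Python path.

```python
import sys

PARAM_NAMES = ["UserId", "ConfigBucketKey", "ConfigBucketName"]


def local_job_args(user_id, config_file_name):
    """Return the command-line tokens Glue would pass for the three job parameters."""
    return [
        "--UserId", user_id,
        "--ConfigBucketKey", f"configurations/{config_file_name}.yaml",
        "--ConfigBucketName", f"aws-data-engineering-course-{user_id}-assets",
    ]


def append_local_args(argv, user_id, config_file_name):
    """Append the tokens to argv one by one, as sys.argv.append would in a script."""
    for token in local_job_args(user_id, config_file_name):
        argv.append(token)
    return argv


def resolve_args(argv):
    """Resolve the parameters exactly as the job does on the cluster."""
    # Imported lazily: awsglue is only importable where the awsglue folder is available.
    from awsglue.utils import getResolvedOptions
    return getResolvedOptions(argv, PARAM_NAMES)
```

In a local run you would call append_local_args(sys.argv, "jdoe", "fact_go_daily_sales") at the top of the script and then read the values from resolve_args(sys.argv), for example args["UserId"]. On the cluster the append step is skipped because Glue supplies the parameters itself.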
- Ingest the config file from S3 using the commons package in your repository.
- Extract data from the raw S3 bucket and the stage S3 bucket using DynamicFrame.
- Join the raw data with the stage data and keep only the records that are new by ingest_dt.
- Drop duplicates.
- Add a column 'audit_ts' that is equal to the current timestamp.
- Load the data into the target bucket as Parquet, partitioned by 'ingest_dt'.
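The transformation steps above can be sketched with Spark DataFrames obtained from the DynamicFrames. This is one possible reading of "new records by ingest_dt", assumed here to mean raw rows whose ingest_dt value does not yet exist in stage; the function names are hypothetical and the target path comes from your config file. The pyspark import is done lazily inside add_audit_ts so the sketch can be loaded without a Spark installation.

```python
def select_new_records(raw_df, stage_df, key="ingest_dt"):
    """Keep raw rows whose ingest_dt is not yet present in stage, without duplicates."""
    loaded_dates = stage_df.select(key).distinct()
    # A left_anti join keeps only the rows of raw_df that have no match in loaded_dates.
    new_rows = raw_df.join(loaded_dates, on=key, how="left_anti")
    return new_rows.dropDuplicates()


def add_audit_ts(df):
    """Add an audit_ts column holding the current timestamp."""
    # Imported lazily so this module loads even where pyspark is not installed.
    from pyspark.sql import functions as F
    return df.withColumn("audit_ts", F.current_timestamp())


def write_stage(df, target_path):
    """Append the rows to target_path as Parquet, one folder per ingest_dt value."""
    df.write.mode("append").partitionBy("ingest_dt").parquet(target_path)
```

A DynamicFrame can be converted with its toDF() method before calling these helpers, and a result can be wrapped again with DynamicFrame.fromDF(df, glue_context, "stage") if you prefer to write through the Glue context. The sketch does not handle the first run, when the stage location is still empty.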
- Create a Glue Spark Job on the cluster with the name: {user_id}_fact_go_daily_sales_stage_t10.
- Mandatory Steps:
- Choose Python 3.9 for Python version
- Choose 1/16 DPU for data processing units
- Choose 0 retries
- Set the timeout to 5 minutes
- Set your bucket path for Script path
- Set parameters
- Set Tags Owner=aws-edu-iba-gomel and StudentName={user_id}
- Add any other options needed to create the Glue Spark Job, then test the job and check its logs in AWS CloudWatch.
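For reference, several of the mandatory settings above map onto the parameters of the Glue create_job API call. The sketch below only builds the settings as a dictionary; the function name glue_job_settings and the role_arn argument are hypothetical (the task does not name an IAM role), and the Python version and DPU choices are left out, so set those as listed in the mandatory steps.

```python
def glue_job_settings(user_id, script_path, role_arn):
    """Collect the job settings in the shape expected by the Glue create_job call."""
    return {
        "Name": f"{user_id}_fact_go_daily_sales_stage_t10",
        "Role": role_arn,  # hypothetical: use the role provided for the course
        "Command": {
            "Name": "glueetl",  # command name used for Glue Spark jobs
            "ScriptLocation": script_path,  # your bucket path to the script
        },
        "MaxRetries": 0,
        "Timeout": 5,  # minutes
        "DefaultArguments": {
            "--UserId": user_id,
            "--ConfigBucketKey": "configurations/fact_go_daily_sales.yaml",
            "--ConfigBucketName": f"aws-data-engineering-course-{user_id}-assets",
        },
        "Tags": {"Owner": "aws-edu-iba-gomel", "StudentName": user_id},
    }
```

If you create the job from code rather than from the console, the dictionary can be passed as boto3.client("glue").create_job(**settings). The same property names (Name, Role, Command, MaxRetries, Timeout, DefaultArguments, Tags) also appear on the AWS::Glue::Job resource in a CloudFormation template.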
- Create an AWS CloudFormation template for the Glue Spark Job with the name: {user_id}_fact_go_daily_sales_stage_t10.yaml.
- Test CloudFormation Template:
- Deploy (check that your Glue Spark Job has been created by CloudFormation)
- Delete your CloudFormation Stack
- Open a Pull Request as described in the Common Info (Task Workflow) section.