[ETL-558] Decouple S3 to JSON workflow from JSON to Parquet workflow #83

Merged 1 commit on Oct 13, 2023

Conversation

philerooski
Contributor

The important changes happen in templates/glue-workflow.j2. This PR splits our workflow into two workflows:

  1. S3 to JSON, which is triggered on-demand by the lambda and can run with infinite concurrency.
  2. JSON to Parquet, which has a scheduled trigger (default 2 AM UTC / 7 PM PT) and is limited to 1 run at a time. The trigger is only activated if we are deploying to the main namespace (so as to not waste resources triggering the pipeline every day on our development stacks).
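A minimal sketch of how the two workflows described above might look in CloudFormation. This is not the contents of templates/glue-workflow.j2: the `Namespace` parameter, the `S3ToJsonWorkflow` name, the `!Sub` naming scheme, and the job name are assumptions made for illustration, and the real template uses Jinja, which is left out here.

```yaml
# Sketch only. Namespace, S3ToJsonWorkflow, the Name values and the job name
# are hypothetical; JsonToParquetWorkflow, JsontoParquetTrigger and
# IsMainNamespace are the names that appear in this PR.
Parameters:
  Namespace:
    Type: String

Conditions:
  # Assumes the main namespace is literally called "main".
  IsMainNamespace: !Equals [!Ref Namespace, main]

Resources:
  S3ToJsonWorkflow:
    Type: AWS::Glue::Workflow
    Properties:
      Name: !Sub ${Namespace}-S3ToJsonWorkflow
      # MaxConcurrentRuns is omitted, so Glue places no limit on concurrent runs.

  JsonToParquetWorkflow:
    Type: AWS::Glue::Workflow
    Properties:
      Name: !Sub ${Namespace}-JsonToParquetWorkflow
      MaxConcurrentRuns: 1  # at most one run at a time

  JsontoParquetTrigger:
    Type: AWS::Glue::Trigger
    Condition: IsMainNamespace  # resource is only created in the main namespace
    Properties:
      Name: !Sub ${Namespace}-JsontoParquetTrigger
      Type: SCHEDULED
      Schedule: cron(0 2 * * ? *)  # 02:00 UTC every day
      StartOnCreation: true
      WorkflowName: !Ref JsonToParquetWorkflow
      Actions:
        - JobName: hypothetical-json-to-parquet-job
```

Attaching the condition to the trigger resource, rather than to the workflow, keeps the JSON to Parquet workflow available for manual runs on development stacks while skipping the daily schedule there.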

Member

@thomasyu888 thomasyu888 left a comment

🔥 LGTM here! Thanks for being intentional about not having the development pipelines run on a daily basis.

WorkflowName: !Ref JsonToParquetWorkflow

JsontoParquetTrigger:
  Condition: !Not IsMainNamespace
Member

Weird... it's saying: "Every Condition member must be a string." Is that due to this?

Contributor

Based on the error log, this might help root out the issue and let us test as changes are made:
aws cloudformation validate-template --template-body file://templates/glue-workflow.j2

Contributor Author

Yeah. CFN is not very flexible. I fixed it.
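The fix itself is not shown in this thread. For readers hitting the same "Every Condition member must be a string" error: a resource-level `Condition` attribute takes the name of a condition as a plain string, so an intrinsic function such as `!Not` cannot be used there. One common way around this, sketched here with a hypothetical `Namespace` parameter, a hypothetical `IsNotMainNamespace` condition, and a hypothetical resource and job name, is to define the negation in the `Conditions` section and reference it by name:

```yaml
# Sketch only, not the actual fix from this PR. Namespace, IsNotMainNamespace,
# DevOnlyTrigger and the job name are hypothetical.
Parameters:
  Namespace:
    Type: String

Conditions:
  IsMainNamespace: !Equals [!Ref Namespace, main]
  # Negation lives here, where intrinsic functions are allowed.
  IsNotMainNamespace: !Not [!Condition IsMainNamespace]

Resources:
  DevOnlyTrigger:
    Type: AWS::Glue::Trigger
    # A plain string naming a condition, not "!Not IsMainNamespace".
    Condition: IsNotMainNamespace
    Properties:
      Type: ON_DEMAND
      Actions:
        - JobName: hypothetical-job
```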

Contributor Author

@BryanFauble It seems like that command only checks if the template is valid YAML and not whether it's semantically correct? In any case, I use Jinja in the template so that command fails immediately.

@philerooski philerooski temporarily deployed to develop October 13, 2023 15:58 — with GitHub Actions Inactive
@philerooski philerooski temporarily deployed to develop October 13, 2023 16:02 — with GitHub Actions Inactive
@philerooski philerooski temporarily deployed to develop October 13, 2023 16:11 — with GitHub Actions Inactive
@philerooski philerooski temporarily deployed to develop October 13, 2023 16:15 — with GitHub Actions Inactive
Contributor

@BryanFauble BryanFauble left a comment

With all pipelines passing, the changes LGTM!

@philerooski philerooski changed the title from "[ETL-568] Decouple S3 to JSON workflow from JSON to Parquet workflow" to "[ETL-558] Decouple S3 to JSON workflow from JSON to Parquet workflow" on Oct 13, 2023
@philerooski philerooski merged commit 70875df into main Oct 13, 2023
14 checks passed
@philerooski philerooski deleted the etl-568 branch October 13, 2023 20:22
3 participants