[ETL-686] Use S3 as GX validations store (#151)
* consolidate GX docs to a single docsite

* get GX config template from S3

* add additional tests for gx job

* Update Pipfile and Pipfile.lock
philerooski authored Nov 5, 2024
1 parent eb7bbb8 commit 9244d8a
Showing 15 changed files with 826 additions and 989 deletions.
7 changes: 6 additions & 1 deletion .github/workflows/cleanup.yaml
@@ -89,4 +89,9 @@ jobs:
         run: pipenv run sceptre --debug --var namespace=${{ github.event.ref }} delete develop/namespaced --yes
 
       - name: Remove artifacts
-        run: pipenv run python src/scripts/manage_artifacts/artifacts.py --remove --namespace ${{ github.event.ref }} --cfn_bucket ${{ vars.CFN_BUCKET }}
+        run: |
+          pipenv run python src/scripts/manage_artifacts/artifacts.py
+          --remove
+          --namespace ${{ github.event.ref }}
+          --cfn_bucket ${{ vars.CFN_BUCKET }}
+          --shareable-artifacts-bucket ${{ vars.SHAREABLE_ARTIFACTS_BUCKET }}
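The workflow steps now pass a `--shareable-artifacts-bucket` flag alongside the existing `--cfn_bucket` flag. The script itself is not shown in this diff, so the following is only a minimal sketch of how such a CLI could be declared with `argparse`. The mutually exclusive `--upload`/`--remove` group, the `required=True` settings, and the example bucket names are assumptions, not the actual contents of `artifacts.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags used in the workflow steps."""
    parser = argparse.ArgumentParser(
        description="Upload or remove namespaced artifacts"
    )
    # Assumption: exactly one of --upload / --remove must be given.
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument("--upload", action="store_true")
    action.add_argument("--remove", action="store_true")
    parser.add_argument("--namespace", required=True)
    parser.add_argument("--cfn_bucket", required=True)
    # argparse converts the hyphens to underscores for the attribute name,
    # so this value is read as args.shareable_artifacts_bucket.
    parser.add_argument("--shareable-artifacts-bucket", required=True)
    return parser


# Example invocation with placeholder values.
args = build_parser().parse_args(
    [
        "--remove",
        "--namespace", "etl-686",
        "--cfn_bucket", "example-cfn-bucket",
        "--shareable-artifacts-bucket", "example-shareable-bucket",
    ]
)
print(args.shareable_artifacts_bucket)
```

Note that the two bucket flags use different separators (`_` versus `-`), but both end up as underscore-separated attributes on the parsed namespace.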
7 changes: 6 additions & 1 deletion .github/workflows/upload-and-deploy-to-prod-main.yaml
@@ -41,7 +41,12 @@ jobs:
           python_version: ${{ env.PYTHON_VERSION }}
 
       - name: Copy files to templates bucket
-        run: python src/scripts/manage_artifacts/artifacts.py --upload --namespace $NAMESPACE --cfn_bucket ${{ vars.CFN_BUCKET }}
+        run: >
+          python src/scripts/manage_artifacts/artifacts.py
+          --upload
+          --namespace $NAMESPACE
+          --cfn_bucket ${{ vars.CFN_BUCKET }}
+          --shareable-artifacts-bucket ${{ vars.SHAREABLE_ARTIFACTS_BUCKET }}
   sceptre-deploy-main:
2 changes: 2 additions & 0 deletions .github/workflows/upload-and-deploy.yaml
@@ -115,6 +115,7 @@ jobs:
           --upload
           --namespace $NAMESPACE
           --cfn_bucket ${{ vars.CFN_BUCKET }}
+          --shareable-artifacts-bucket ${{ vars.SHAREABLE_ARTIFACTS_BUCKET }}
   nonglue-unit-tests:
     name: Runs unit tests that are not dependent on aws-glue package resources
@@ -437,6 +438,7 @@ jobs:
           --upload
           --namespace staging
           --cfn_bucket ${{ vars.CFN_BUCKET }}
+          --shareable-artifacts-bucket ${{ vars.SHAREABLE_ARTIFACTS_BUCKET }}
       - name: Create directory for remote sceptre templates
         run: mkdir -p templates/remote/
3 changes: 2 additions & 1 deletion Pipfile
@@ -9,10 +9,11 @@ python_version = "3.9"
 [dev-packages]
 pytest = "*"
 pyarrow = "~=11.0"
-pre-commit = "*"
+pre-commit = "~=4.0"
 sceptre = ">=3.2.0"
 sceptre-sam-handler = "*"
 synapseclient = "~=2.7"
+numpy = "<2.0" # See issue "A module that was compiled using NumPy 1.x cannot be run..."
 pandas = "<1.5"
 moto = "~=4.1"
 datacompy = "~=0.8"
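The Pipfile change replaces the unpinned `pre-commit = "*"` with the compatible-release specifier `~=4.0`. As a rough illustration of what `~=` admits, here is a simplified sketch that handles plain dotted numeric versions only. It is not the resolver pipenv uses, and the helper names are made up for this example.

```python
def parse(version: str) -> tuple:
    """Split a plain dotted version such as '4.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Return True if `candidate` satisfies `~=spec`.

    `~=X.Y` means `>= X.Y` and `== X.*`: everything except the last
    component of the spec is pinned, and the candidate must not be older.
    """
    cand, base = parse(candidate), parse(spec)
    prefix = base[:-1]  # "~=4.0" pins the 4 series, "~=0.18" pins the 0 series
    return cand >= base and cand[: len(prefix)] == prefix


print(compatible_release("4.3.1", "4.0"))    # True: still in the 4 series
print(compatible_release("5.0", "4.0"))      # False: next major series
print(compatible_release("0.18.22", "0.18")) # True: matches great_expectations~=0.18
```

Pre-releases, epochs, and post-releases are deliberately ignored here; real tooling follows the full PEP 440 rules.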
849 changes: 416 additions & 433 deletions Pipfile.lock

Large diffs are not rendered by default.

@@ -7,12 +7,16 @@ parameters:
   Namespace: {{ stack_group_config.namespace }}
   JobDescription: Runs great expectations on a set of data
   JobRole: !stack_output_external glue-job-role::RoleArn
-  TempS3Bucket: {{ stack_group_config.processed_data_bucket_name }}
+  ParquetBucket: {{ stack_group_config.processed_data_bucket_name }}
+  ShareableArtifactsBucket: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
   S3ScriptBucket: {{ stack_group_config.template_bucket_name }}
   S3ScriptKey: '{{ stack_group_config.namespace }}/src/glue/jobs/run_great_expectations_on_parquet.py'
+  ExpectationSuiteKey: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
+  GXConfigKey: "{{ stack_group_config.namespace }}/src/glue/resources/great_expectations.yml"
   GlueVersion: "{{ stack_group_config.great_expectations_job_glue_version }}"
   AdditionalPythonModules: "great_expectations~=0.18,urllib3<2"
 stack_tags:
   {{ stack_group_config.default_stack_tags }}
 sceptre_user_data:
   dataset_schemas: !file src/glue/resources/table_columns.yaml
+  data_values_expectations: !file src/glue/resources/data_values_expectations.json
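The parameters above hand the Great Expectations job a shareable artifacts bucket and a config key, in line with the commit's goal of using S3 as the validations store. The job script is not part of the visible diff, so the following is only a sketch of what an S3-backed validations store entry can look like in a GX 0.18 style config. `ValidationsStore` and `TupleS3StoreBackend` are real GX class names; the helper function, the prefix layout, and the example bucket name are assumptions.

```python
def validations_store_config(bucket: str, namespace: str) -> dict:
    """Build a `stores` entry that keeps validation results in S3.

    Assumption: results are written under a per-namespace prefix. The
    actual job may organize its keys differently.
    """
    return {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": bucket,
                "prefix": f"{namespace}/gx/validations/",
            },
        }
    }


# Example with placeholder values standing in for ShareableArtifactsBucket
# and Namespace.
stores = validations_store_config("example-shareable-bucket", "etl-686")
print(stores["validations_store"]["store_backend"]["prefix"])
```

A dictionary like this could be merged into the `stores` section of the config loaded from `GXConfigKey` before the data context is created; keeping it as plain data makes the namespace-specific part easy to unit test.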
2 changes: 0 additions & 2 deletions config/develop/namespaced/glue-workflow.yaml
@@ -20,8 +20,6 @@ parameters:
   CompareParquetMainNamespace: "main"
   S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
   CloudformationBucketName: {{ stack_group_config.template_bucket_name }}
-  ShareableArtifactsBucketName: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
-  ExpectationSuiteKey: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
 stack_tags:
   {{ stack_group_config.default_stack_tags }}
 sceptre_user_data:
@@ -7,12 +7,16 @@ parameters:
   Namespace: {{ stack_group_config.namespace }}
   JobDescription: Runs great expectations on a set of data
   JobRole: !stack_output_external glue-job-role::RoleArn
-  TempS3Bucket: {{ stack_group_config.processed_data_bucket_name }}
+  ParquetBucket: {{ stack_group_config.processed_data_bucket_name }}
+  ShareableArtifactsBucket: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
   S3ScriptBucket: {{ stack_group_config.template_bucket_name }}
   S3ScriptKey: '{{ stack_group_config.namespace }}/src/glue/jobs/run_great_expectations_on_parquet.py'
+  ExpectationSuiteKey: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
+  GXConfigKey: "{{ stack_group_config.namespace }}/src/glue/resources/great_expectations.yml"
   GlueVersion: "{{ stack_group_config.great_expectations_job_glue_version }}"
   AdditionalPythonModules: "great_expectations~=0.18,urllib3<2"
 stack_tags:
   {{ stack_group_config.default_stack_tags }}
 sceptre_user_data:
   dataset_schemas: !file src/glue/resources/table_columns.yaml
+  data_values_expectations: !file src/glue/resources/data_values_expectations.json
2 changes: 0 additions & 2 deletions config/prod/namespaced/glue-workflow.yaml
@@ -20,8 +20,6 @@ parameters:
   CompareParquetMainNamespace: "main"
   S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
   CloudformationBucketName: {{ stack_group_config.template_bucket_name }}
-  ShareableArtifactsBucketName: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
-  ExpectationSuiteKey: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
 stack_tags:
   {{ stack_group_config.default_stack_tags }}
 sceptre_user_data: