Terraform module that sets up a Data-QA solution (bucket, Step Functions pipeline with AWS Lambda, metadata storage, Data-QA reports) in your infrastructure in one click. AWS-based. Built on top of Great Expectations, pandas-profiling, and Allure.
- Main engine based on Great Expectations (GX) to profile data, generate suites, and run tests
- Mapping from the GX result format to the Allure test report tool
- Metadata and metrics aggregation
It can be used as a standard Terraform module; deployment examples are under the `examples` directory.
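As a rough sketch of what that wiring might look like (the module source path, variable names, and values below are illustrative assumptions; the real interface is defined in the `examples` directory):

```hcl
# Hypothetical usage sketch -- check examples/ for the actual
# input variables and outputs of the module.
module "data-qa" {
  source = "./modules/data-qa" # assumed local path; adjust to the real module source

  # assumed inputs, named here for illustration only
  data_test_storage_bucket_name = "my-data-qa-bucket"
  project                       = "my-project"
}
```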
- Add the DataQA module to your Terraform configuration as shown in the examples
- Add a `DataTests` step to your Terraform state machine:
```hcl
resource "aws_sfn_state_machine" "data_state_machine" {
  name     = "Data-state-machine"
  role_arn = aws_iam_role.state_machine.arn # role with permissions for lambda:InvokeFunction and states:StartExecution
  type     = "STANDARD"

  definition = jsonencode({
    StartAt = "GetData"
    States = {
      GetData = {
        Type       = "Task"
        Resource   = aws_lambda_function.some_get_data.arn # Task Resource must be the function ARN, not the name
        ResultPath = "$.file"
        Next       = "DataTests"
      }
      DataTests = {
        Type     = "Task"
        Resource = "arn:aws:states:::states:startExecution.sync:2"
        End      = true
        Parameters = {
          StateMachineArn = module.data-qa.qa_step_functions_arn
          Input = {
            files = [
              {
                engine          = "s3"
                source_root     = var.data_lake_bucket
                run_name        = "raw_data"
                "source_data.$" = "$.file"
              }
            ]
          }
        }
      }
    }
  })

  logging_configuration {
    include_execution_data = false
    level                  = "OFF"
  }

  tracing_configuration {
    enabled = false
  }
}
```
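The `"source_data.$" = "$.file"` entry uses the Step Functions convention that a parameter key ending in `.$` is resolved against the execution state at run time, so the `DataTests` input receives whatever `GetData` wrote to `$.file`. A minimal Python sketch of that substitution (simplified to top-level `$.key` paths; real Step Functions supports full JSONPath and intrinsic functions):

```python
import json


def resolve_parameters(parameters: dict, state: dict) -> dict:
    """Resolve Step-Functions-style Parameters against the current state.

    Keys ending in '.$' hold a path like '$.file'; the value is looked
    up in `state` and the '.$' suffix is dropped from the key. Only
    simple top-level '$.key' paths are handled here.
    """
    resolved = {}
    for key, value in parameters.items():
        if key.endswith(".$") and isinstance(value, str) and value.startswith("$."):
            resolved[key[:-2]] = state[value[2:]]
        elif isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state)
        elif isinstance(value, list):
            resolved[key] = [
                resolve_parameters(item, state) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            resolved[key] = value
    return resolved


# The DataTests step's Input, as in the state machine above
# (bucket name is a placeholder):
params = {
    "files": [
        {
            "engine": "s3",
            "source_root": "my-data-lake-bucket",
            "run_name": "raw_data",
            "source_data.$": "$.file",
        }
    ]
}

# Suppose GetData stored its result under $.file:
state = {"file": "landing/2024/01/data.csv"}
print(json.dumps(resolve_parameters(params, state)))
```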
- Create an AWS Serverless Application Repository application* (AthenaDynamoDBConnector) with the parameters:
  - `SpillBucket` - name of the bucket created by the Terraform module
  - `AthenaCatalogName` - the name this catalog will have in Athena; it is also used as the Lambda function name

\*Cannot be created automatically by Terraform because of terraform-provider-aws/issues/16485
- Create an AWS Athena data source:
  - Data source type -> Amazon DynamoDB
  - Connection details -> Lambda function -> the `AthenaCatalogName` value from step 3
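If you prefer the CLI over the console, the same data source can be registered with `aws athena create-data-catalog`; the catalog name and Lambda ARN below are placeholders to replace with your own values:

```shell
# Register the DynamoDB connector Lambda as an Athena data catalog.
aws athena create-data-catalog \
  --name dynamodb_catalog \
  --type LAMBDA \
  --parameters function=arn:aws:lambda:us-east-1:123456789012:function:dynamodb_catalog
```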