diff --git a/LICENSE.md b/LICENSE similarity index 100% rename from LICENSE.md rename to LICENSE diff --git a/README.md b/README.md index 689261a..60da733 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # Data Quality Gate ## Description -Terrafrom module which setup Data-QA solution(bucket,Stepfunctions Pipeline with AWS Lambda, Metadata Storage. Data-QA Reports) in your infrastructure in 'one-click'. AWS Based. Built on top of Great_expectations, Pandas_profiling, Allure +Terraform module that sets up the DataQA solution in your infrastructure in one click. AWS-based. Built on top of Great_expectations, Pandas_profiling, Allure ### Data Test Main engine based on GX to profile, generate suites and run tests @@ -15,64 +15,55 @@ Metadata and metrics aggregation ## Solution Architecture ![Preview Image](https://raw.githubusercontent.com/provectus/data-quality-gate/main/architecture.PNG) +## Supported Features + +- AWS Lambda runtime Python 3.9 +- AWS Step Functions pipeline combining the whole DataQA cycle (profiling, test generation, reporting) +- Supports Slack and Jira notifications and reporting +- AWS SNS output message bus for embedding into existing data pipelines +- Web report delivery through Nginx, restricted to your company's VPN/IP allowlist +- AWS DynamoDB and Athena integration for building AWS QuickSight or Grafana dashboards +- Flexible config management for underlying technologies such as Allure and Great Expectations + ## Usage -Could be used as standard Terraform module, the examples of deployments under `examples` directory. -1. Add to terraform DataQA module as in examples -2. 
Add to terraform state machine `DataTests` step -```terraform -resource "aws_sfn_state_machine" "data_state_machine" { - definition = jsonencode( - { - StartAt = "GetData" - States = { - GetData = { - Next = "DataTests" - Resource = aws_lambda_function.some_get_data.function_name - ResultPath = "$.file" - Type = "Task" - } - DataTests = { - Type = "Task" - Resource = "arn:aws:states:::states:startExecution.sync:2", - End = true - Parameters = { - StateMachineArn = module.data-qa.qa_step_functions_arn - Input = { - files = [ - { - engine = "s3" - source_root = var.data_lake_bucket - run_name = "raw_data" - "source_data.$" = "$.file" - } - ] - } - } - } - } - } - ) - name = "Data-state-machine" - role_arn = aws_iam_role.state_machine.arn // role with perms on lambda:InvokeFunction - type = "STANDARD" - - logging_configuration { - include_execution_data = false - level = "OFF" - } +```hcl +module "data_qa" { + source = "github.com/provectus/data-quality-gate//terraform" - tracing_configuration { - enabled = false + data_test_storage_bucket_name = "my-data-settings-dev" + s3_source_data_bucket = "my-data-bucket" + environment = "example" + project = "my-project" + + allure_report_image_uri = "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/dqg-allure_report:latest" + data_test_image_uri = "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/dqg-data_test:latest" + push_report_image_uri = "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/dqg-push_report:latest" + + data_reports_notification_settings = { + channel = "DataReportSlackChannelName" + webhook_url = "https://hooks.slack.com/services/xxxxxxxxxxxxxxx" } + + lambda_private_subnet_ids = ["private_subnet_id"] + lambda_security_group_ids = ["security_group_id"] + + reports_vpc_id = "some_vpc_id" + reports_subnet_id = "subnet_id" + reports_whitelist_ips = ["0.0.0.0/0"] } ``` -3. 
Create AWS Serverless application* - [AthenaDynamoDBConnector](https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/create/app?applicationId=arn:aws:serverlessrepo:us-east-1:292517598671:applications/AthenaDynamoDBConnector) with parameters: - - SpillBucket - name of bucket created by terraform module - - AthenaCatalogName - The name you will give to this catalog in Athena. It will also be used as the function name. -*Cannot be created automatically by terraform because [terraform-provider-aws/issues/16485](https://github.com/hashicorp/terraform-provider-aws/issues/16485) +## Examples + +It can be used as a standard Terraform module; deployment examples are in the `examples` directory. + +- [data-qa-basic](https://github.com/provectus/data-quality-gate/tree/main/examples/basic) - Creates the DataQA module, which builds the AWS infrastructure. + +## Local Development and Testing + +See the [functions](https://github.com/provectus/data-quality-gate/tree/main/functions) directory for further details. + +## License -4. Create AWS Athena Data Source: -- Data source type -> Amazon DynamoDB -- Connection details -> lambda function -> name of `AthenaCatalogName` from pt.3 +Apache 2 Licensed. See [LICENSE](https://github.com/provectus/data-quality-gate/tree/main/LICENSE) for full details. diff --git a/docs/inframap.png b/docs/inframap.png new file mode 100644 index 0000000..a78902d Binary files /dev/null and b/docs/inframap.png differ diff --git a/examples/basic/README.md b/examples/basic/README.md index e69de29..6e96964 100644 --- a/examples/basic/README.md +++ b/examples/basic/README.md @@ -0,0 +1,49 @@ +Basic Data QA example +======================== + +Configuration in this directory shows how to instantiate a Data QA module that consists of various AWS services. + +Note that this example does not include the required high-level AWS infrastructure, such as a VPC and networking. 
To see the module requirements, go to the [README](https://github.com/provectus/data-quality-gate/tree/main/terraform/README.md) + +Usage +===== + +To run this example you need to execute: + +```bash +$ terraform init +$ terraform plan +$ terraform apply +``` + +Note that this example may create resources which can cost money (AWS EC2 instance, for example). Run `terraform destroy` when you don't need these resources. + +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | ~> 1.1 | +| [aws](#requirement\_aws) | ~> 4.64.0 | + +## Providers + +No providers. + +## Modules + +| Name | Source | Version | +|------|--------|---------| +| [data\_qa](#module\_data\_qa) | ../../terraform | n/a | + +## Resources + +No resources. + +## Inputs + +No inputs. + +## Outputs + +No outputs. + diff --git a/examples/basic/versions.tf b/examples/basic/versions.tf index 9eb52cd..ee05458 100644 --- a/examples/basic/versions.tf +++ b/examples/basic/versions.tf @@ -1,10 +1,10 @@ terraform { - required_version = ">= 1.1.7" - required_providers { aws = { source = "hashicorp/aws" version = "~> 4.64.0" } } + + required_version = "~> 1.1" } diff --git a/terraform/README.md b/terraform/README.md index 1b123ed..4046ae4 100644 --- a/terraform/README.md +++ b/terraform/README.md @@ -1,26 +1,57 @@ +## DataQA Terraform module + +![Preview Image](https://raw.githubusercontent.com/provectus/data-quality-gate/main/docs/inframap.png) + +### Prerequisites + +This solution expects the following infrastructure to already exist: +- At least 1 VPC +- At least 1 private subnet in the VPC +- At least 1 public subnet in the VPC (if you want to view DataQA reports on the web) +- At least 5 VPC endpoints + - `com.amazonaws.AWS-REGION.dynamodb` + - `com.amazonaws.AWS-REGION.s3` + - `com.amazonaws.AWS-REGION.sns` + - `com.amazonaws.AWS-REGION.monitoring` + - `com.amazonaws.AWS-REGION.secretsmanager` +- At least 1 AWS S3 bucket with data that you want to test +- At 
least 1 AWS ECR repository + +### List of submodules + +- [Alerting](https://github.com/provectus/data-quality-gate/tree/main/terraform/modules/alerting) - provides basic AWS CloudWatch metric alerts and forwards them to Slack. Also used as the message bus for the `push_report` lambda +- [Athena connector](https://github.com/provectus/data-quality-gate/tree/main/terraform/modules/athena-connector) - builds an AWS Athena data catalog and an AWS Lambda function that allows querying the internal DynamoDB data table +- [AWS S3 configs](https://github.com/provectus/data-quality-gate/tree/main/terraform/modules/s3-configs) - creates the internal AWS S3 bucket for data quality processing and pushes the Allure and Great Expectations configs to it +- [AWS S3 Gateway](https://github.com/provectus/data-quality-gate/tree/main/terraform/modules/s3-gateway) - creates an AWS EC2 instance that serves the static reports over HTTP. + ## Requirements | Name | Version | |------|---------| | [terraform](#requirement\_terraform) | ~> 1.1 | -| [aws](#requirement\_aws) | >= 4.8.0 | +| [aws](#requirement\_aws) | ~> 4.64.0 | | [local](#requirement\_local) | ~> 2.2.3 | +| [null](#requirement\_null) | ~> 3.2.1 | ## Providers | Name | Version | |------|---------| -| [aws](#provider\_aws) | >= 4.8.0 | +| [aws](#provider\_aws) | 4.64.0 | ## Modules | Name | Source | Version | |------|--------|---------| -| [lambda\_function\_allure\_report](#module\_lambda\_function\_allure\_report) | terraform-aws-modules/lambda/aws | 3.3.1 | -| [lambda\_function\_data\_test](#module\_lambda\_function\_data\_test) | terraform-aws-modules/lambda/aws | 3.3.1 | -| [lambda\_function\_push\_report](#module\_lambda\_function\_push\_report) | terraform-aws-modules/lambda/aws | 3.3.1 | -| [slack\_notifier](#module\_slack\_notifier) | ./modules/slack-notification | n/a | +| [athena\_connector](#module\_athena\_connector) | ./modules/athena-connector | n/a | +| 
[basic\_slack\_alerting](#module\_basic\_slack\_alerting) | ./modules/alerting | n/a | +| [data\_reports\_alerting](#module\_data\_reports\_alerting) | ./modules/alerting | n/a | +| [lambda\_allure\_report](#module\_lambda\_allure\_report) | terraform-aws-modules/lambda/aws | 3.3.1 | +| [lambda\_data\_test](#module\_lambda\_data\_test) | terraform-aws-modules/lambda/aws | 3.3.1 | +| [lambda\_push\_report](#module\_lambda\_push\_report) | terraform-aws-modules/lambda/aws | 3.3.1 | +| [reports\_gateway](#module\_reports\_gateway) | ./modules/s3-gateway | n/a | +| [s3\_bucket](#module\_s3\_bucket) | ./modules/s3-configs | n/a | ## Resources @@ -28,96 +59,77 @@ |------|------| | [aws_appautoscaling_policy.data_qa_report_read_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_policy) | resource | | [aws_appautoscaling_policy.data_qa_report_table_write_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_policy) | resource | -| [aws_appautoscaling_target.data_qa_report_table_read_target](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_target) | resource | -| [aws_appautoscaling_target.data_qa_report_table_write_target](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_target) | resource | -| [aws_cloudfront_distribution.s3_distribution_ip](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution) | resource | -| [aws_cloudfront_origin_access_identity.data_qa_oai](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_origin_access_identity) | resource | -| [aws_cloudfront_origin_access_identity.never_be_reached](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_origin_access_identity) | resource | -| 
[aws_cloudwatch_log_group.state-machine-log-group](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group) | resource | -| [aws_cloudwatch_metric_alarm.lambda_allure_report_error](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) | resource | -| [aws_cloudwatch_metric_alarm.lambda_data_test_error](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) | resource | -| [aws_cloudwatch_metric_alarm.lambda_push_report_error](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) | resource | +| [aws_appautoscaling_target.data_qa_report_table_read](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_target) | resource | +| [aws_appautoscaling_target.data_qa_report_table_write](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_target) | resource | +| [aws_cloudwatch_log_group.state_machine_log_group](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group) | resource | | [aws_dynamodb_table.data_qa_report](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table) | resource | -| [aws_iam_policy.CloudWatchLogsDeliveryFullAccessPolicy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | -| [aws_iam_policy.LambdaInvokeScopedAccessPolicy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | -| [aws_iam_policy.XRayAccessPolicy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_policy.athena](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | | 
[aws_iam_policy.basic_lambda_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | -| [aws_iam_policy.data_test_athena](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_policy.cloud_watch_logs_delivery_full_access_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_policy.dynamodb](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_policy.lambda_invoke_scoped_access_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_policy.sns](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_policy.xray_access_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | | [aws_iam_role.step_functions_fast_data_qa](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | | [aws_iam_role_policy_attachment.data_test_athena](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | -| [aws_s3_bucket.settings_bucket](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket) | resource | -| [aws_s3_bucket_lifecycle_configuration.delete_old_reports](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_lifecycle_configuration) | resource | -| [aws_s3_bucket_policy.cloudfront_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_policy) | resource | -| [aws_s3_bucket_public_access_block.public_access_block_fast_data_qa](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_public_access_block) | resource | -| 
[aws_s3_bucket_versioning.fast-data-qa-bucket](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_versioning) | resource | -| [aws_s3_object.expectations_store](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.great_expectations_yml](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.mapping_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.pipeline_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.pks_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.sort_keys_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.test_config_manifest](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | -| [aws_s3_object.test_configs](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_iam_role_policy_attachment.push_report_dynamodb](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | +| [aws_iam_role_policy_attachment.push_report_sns](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | | [aws_sfn_state_machine.fast_data_qa](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sfn_state_machine) | resource | -| [aws_sns_topic.notifications](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic) | resource | -| 
[aws_sns_topic_policy.notification](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_policy) | resource | -| [aws_wafv2_ip_set.vpn_ipset](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/wafv2_ip_set) | resource | -| [aws_wafv2_web_acl.waf_acl](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/wafv2_web_acl) | resource | -| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source | | [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source | -| [aws_iam_policy_document.s3_policy_for_cloudfront](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | -| [aws_iam_policy_document.slack_notification_sns](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source | ## Inputs | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| +| [allure\_report\_extra\_vars](#input\_allure\_report\_extra\_vars) | Extra environment variables for allure report lambda | `map(string)` | `{}` | no | | [allure\_report\_image\_uri](#input\_allure\_report\_image\_uri) | Allure report image URI(ECR repository) | `string` | n/a | yes | -| [cloudfront\_allowed\_subnets](#input\_cloudfront\_allowed\_subnets) | list of allowed subnets allows users to get reports from specific IP address spaces | `list(string)` | `null` | no | -| [cloudfront\_location\_restrictions](#input\_cloudfront\_location\_restrictions) | List of regions allowed for CloudFront distribution | `list(string)` |
`["US", "CA", "GB", "DE", "TR"]`
| no | -| [create\_cloudwatch\_notifications\_topic](#input\_create\_cloudwatch\_notifications\_topic) | Should sns topic for cloudwatch alerts be created | `bool` | `true` | no | +| [basic\_alert\_notification\_settings](#input\_basic\_alert\_notification\_settings) | Basic alert notification settings. If null, basic alerts are disabled |
`object({ channel = string, webhook_url = string })`
| `null` | no | +| [data\_reports\_notification\_settings](#input\_data\_reports\_notification\_settings) | Data report notification settings. If null, notifications are disabled |
`object({ channel = string, webhook_url = string })`
| `null` | no | +| [data\_test\_extra\_vars](#input\_data\_test\_extra\_vars) | Extra environment variables for data test lambda | `map(string)` | `{}` | no | | [data\_test\_image\_uri](#input\_data\_test\_image\_uri) | Data test image URI(ECR repository) | `string` | n/a | yes | | [data\_test\_storage\_bucket\_name](#input\_data\_test\_storage\_bucket\_name) | Bucket name which will be used to store data tests and settings for it's execution | `string` | n/a | yes | +| [dynamodb\_autoscaling\_defaults](#input\_dynamodb\_autoscaling\_defaults) | A map of default autoscaling settings | `map(string)` |
`{ "scale_in_cooldown": 50, "scale_out_cooldown": 40, "target_value": 45 }`
| no | +| [dynamodb\_autoscaling\_read](#input\_dynamodb\_autoscaling\_read) | A map of read autoscaling settings. `max_capacity` is the only required key. | `map(string)` |
`{ "max_capacity": 200 }`
| no | +| [dynamodb\_autoscaling\_write](#input\_dynamodb\_autoscaling\_write) | A map of write autoscaling settings. `max_capacity` is the only required key. | `map(string)` |
`{ "max_capacity": 10 }`
| no | +| [dynamodb\_hash\_key](#input\_dynamodb\_hash\_key) | The attribute to use as the hash (partition) key. Must also be defined as an attribute | `string` | `"file"` | no | | [dynamodb\_read\_capacity](#input\_dynamodb\_read\_capacity) | Dynamodb report table read capacity | `number` | `20` | no | -| [dynamodb\_report\_table\_autoscaling\_read\_capacity\_settings](#input\_dynamodb\_report\_table\_autoscaling\_read\_capacity\_settings) | Report table autoscaling read capacity |
`object({ min = number, max = number })` | `{ "max": 200, "min": 50 }`
| no | -| [dynamodb\_report\_table\_autoscaling\_write\_capacity\_settings](#input\_dynamodb\_report\_table\_autoscaling\_write\_capacity\_settings) | Report table autoscaling write capacity |
`object({ min = number, max = number })` | `{ "max": 50, "min": 2 }`
| no | -| [dynamodb\_report\_table\_read\_scale\_threshold](#input\_dynamodb\_report\_table\_read\_scale\_threshold) | Dynamodb report table read scale up threshold | `number` | `60` | no | -| [dynamodb\_report\_table\_write\_scale\_threshold](#input\_dynamodb\_report\_table\_write\_scale\_threshold) | Dynamodb report table write scale up threshold | `number` | `70` | no | | [dynamodb\_stream\_enabled](#input\_dynamodb\_stream\_enabled) | Dynamodb report table stream enabled | `bool` | `false` | no | -| [dynamodb\_table\_attributes](#input\_dynamodb\_table\_attributes) | List of nested attribute definitions. Only required for hash\_key and range\_key attributes. Each attribute has two properties: name - (Required) The name of the attribute, type - (Required) Attribute type, which must be a scalar type: S, N, or B for (S)tring, (N)umber or (B)inary data | `list(map(string))` | `[]` | no | +| [dynamodb\_table\_attributes](#input\_dynamodb\_table\_attributes) | List of nested attribute definitions. Only required for hash\_key and range\_key attributes. Each attribute has two properties: name - (Required) The name of the attribute, type - (Required) Attribute type, which must be a scalar type: S, N, or B for (S)tring, (N)umber or (B)inary data | `list(map(string))` |
`[{ "name": "file", "type": "S" }]`
| no | | [dynamodb\_write\_capacity](#input\_dynamodb\_write\_capacity) | Dynamodb report table write capacity | `number` | `2` | no | -| [environment](#input\_environment) | Environment name used to build fully qualified tags and resource's names | `string` | `"data-qa-dev"` | no | -| [expectations\_store](#input\_expectations\_store) | Path to the expectations\_store directory, relative to the root TF | `string` | `"../expectations_store"` | no | +| [environment](#input\_environment) | Environment name used to build fully qualified tags and resource's names | `string` | n/a | yes | +| [expectations\_store](#input\_expectations\_store) | Path to the expectations\_store directory, relative to the root TF | `string` | `"../../../expectations_store"` | no | +| [great\_expectation\_path](#input\_great\_expectation\_path) | Path to the great expectations yaml | `string` | `"../../../templates/great_expectations.yml"` | no | | [lambda\_allure\_report\_memory](#input\_lambda\_allure\_report\_memory) | Amount of memory allocated to the lambda function lambda\_allure\_report | `number` | `1024` | no | | [lambda\_data\_test\_memory](#input\_lambda\_data\_test\_memory) | Amount of memory allocated to the lambda function lambda\_data\_test | `number` | `5048` | no | +| [lambda\_private\_subnet\_ids](#input\_lambda\_private\_subnet\_ids) | List of private subnets assigned to lambda | `list(string)` | n/a | yes | | [lambda\_push\_jira\_url](#input\_lambda\_push\_jira\_url) | Lambda function push report env variable JIRA\_URL | `string` | `null` | no | | [lambda\_push\_report\_memory](#input\_lambda\_push\_report\_memory) | Amount of memory allocated to the lambda function lambda\_push\_report | `number` | `1024` | no | | [lambda\_push\_secret\_name](#input\_lambda\_push\_secret\_name) | Lambda function push report env variable JIRA\_URL | `string` | `null` | no | -| [mapping\_path](#input\_mapping\_path) | Path to the mapping description path, relative to the root TF | `string` 
| `"../configs/mapping.json"` | no | -| [pipeline\_config\_path](#input\_pipeline\_config\_path) | Path to the pipeline description path, relative to the root TF | `string` | `"../configs/pipeline.json"` | no | -| [pks\_path](#input\_pks\_path) | Path to the primary keys description path, relative to the root TF | `string` | `"../configs/pks.json"` | no | +| [lambda\_security\_group\_ids](#input\_lambda\_security\_group\_ids) | List of security group IDs assigned to the lambdas | `list(string)` | n/a | yes | +| [manifest\_path](#input\_manifest\_path) | Path to the manifest file | `string` | `"../../../configs/manifest.json"` | no | +| [mapping\_path](#input\_mapping\_path) | Path to the mapping description file, relative to the root TF | `string` | `"../../../configs/mapping.json"` | no | +| [pipeline\_config\_path](#input\_pipeline\_config\_path) | Path to the pipeline description file, relative to the root TF | `string` | `"../../../configs/pipeline.json"` | no | +| [pks\_path](#input\_pks\_path) | Path to the primary keys description file, relative to the root TF | `string` | `"../../../configs/pks.json"` | no | | [project](#input\_project) | Project name used to build fully qualified tags and resource's names | `string` | `"demo"` | no | +| [push\_report\_extra\_vars](#input\_push\_report\_extra\_vars) | Extra environment variables for push report lambda | `map(string)` | `{}` | no | | [push\_report\_image\_uri](#input\_push\_report\_image\_uri) | Push report image URI(ECR repository) | `string` | n/a | yes | | [redshift\_db\_name](#input\_redshift\_db\_name) | Database name for source redshift cluster | `string` | `null` | no | | [redshift\_secret](#input\_redshift\_secret) | Secret name from AWS SecretsManager for Redshift cluster | `string` | `null` | no | -| [slack\_settings](#input\_slack\_settings) | Slack notifications settings. If null - slack notifications will be disabled |
`object({ webhook_url = string, channel = string, username = string, image_uri = string, vpc_id = string })`
| `null` | no | -| [sns\_cloudwatch\_notifications\_topic\_arn](#input\_sns\_cloudwatch\_notifications\_topic\_arn) | SNS topic to send cloudwatch events | `string` | `null` | no | -| [sort\_keys\_path](#input\_sort\_keys\_path) | Path to the sort keys description path, relative to the root TF | `string` | `"../configs/sort_keys.json"` | no | -| [test\_coverage\_path](#input\_test\_coverage\_path) | Path to the tests description path, relative to the root TF | `string` | `"../configs/test_coverage.json"` | no | -| [vpc\_security\_group\_ids](#input\_vpc\_security\_group\_ids) | List of security group assigned to lambda. If null value, default subnet and vpc will be used | `list(string)` | `null` | no | -| [vpc\_subnet\_ids](#input\_vpc\_subnet\_ids) | List of subnet ids to place lambda in. If null value, default subnet and vpc will be used | `list(string)` | `null` | no | +| [reports\_subnet\_id](#input\_reports\_subnet\_id) | Subnet ID where the gateway instance will be placed | `string` | n/a | yes | +| [reports\_vpc\_id](#input\_reports\_vpc\_id) | VPC ID where the gateway instance will be placed | `string` | n/a | yes | +| [reports\_whitelist\_ips](#input\_reports\_whitelist\_ips) | List of IPs allowed to view reports | `list(string)` | n/a | yes | +| [s3\_source\_data\_bucket](#input\_s3\_source\_data\_bucket) | Name of the bucket containing the data on which tests will be executed | `string` | n/a | yes | +| [sort\_keys\_path](#input\_sort\_keys\_path) | Path to the sort keys description file, relative to the root TF | `string` | `"../../../configs/sort_keys.json"` | no | +| [test\_coverage\_path](#input\_test\_coverage\_path) | Path to the tests description file, relative to the root TF | `string` | `"../../../configs/test_coverage.json"` | no | ## Outputs | Name | Description | |------|-------------| -| [allure\_report\_role\_arn](#output\_allure\_report\_role\_arn) | n/a | -| [bucket](#output\_bucket) | Data quality gate bucket with settings and generated tests | -| 
[data\_test\_role\_arn](#output\_data\_test\_role\_arn) | n/a | -| [lambda\_allure\_arn](#output\_lambda\_allure\_arn) | n/a | -| [lambda\_data\_test\_arn](#output\_lambda\_data\_test\_arn) | n/a | -| [lambda\_report\_push\_arn](#output\_lambda\_report\_push\_arn) | n/a | -| [report\_push\_role\_arn](#output\_report\_push\_role\_arn) | n/a | -| [step\_function\_arn](#output\_step\_function\_arn) | n/a | +| [bucket](#output\_bucket) | DataQA bucket with settings and generated tests | +| [lambda\_allure\_arn](#output\_lambda\_allure\_arn) | ARN of the Allure report generation lambda | +| [lambda\_data\_test\_arn](#output\_lambda\_data\_test\_arn) | ARN of the data test generation/running lambda | +| [lambda\_report\_push\_arn](#output\_lambda\_report\_push\_arn) | ARN of the lambda that pushes reports to DynamoDB | +| [step\_function\_arn](#output\_step\_function\_arn) | ARN of the DataQA step function | diff --git a/terraform/modules/alerting/README.md b/terraform/modules/alerting/README.md new file mode 100644 index 0000000..06ece1b --- /dev/null +++ b/terraform/modules/alerting/README.md @@ -0,0 +1,58 @@ +Alerting +======================= + +The module in this folder is used for two purposes: +- Creating CloudWatch metric alarms for the main DataQA AWS Step Functions state machine and forwarding the alerts to a Slack channel +- Creating a data reports message bus that receives custom metrics from the `report_push` lambda and forwards them to a Slack channel + + +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | ~> 1.1 | +| [aws](#requirement\_aws) | ~> 4.64.0 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | 4.64.0 | + +## Modules + +| Name | Source | Version | +|------|--------|---------| +| [slack\_notification](#module\_slack\_notification) | terraform-aws-modules/notify-slack/aws | 6.0.0 | + +## Resources + +| Name | Type | +|------|------| +| 
[aws_cloudwatch_metric_alarm.alarm](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) | resource | +| [aws_kms_ciphertext.slack_url](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_ciphertext) | resource | +| [aws_kms_key.slack](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_key) | resource | +| [aws_sfn_state_machine.step_functions](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/sfn_state_machine) | data source | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [datapoints\_to\_alarm](#input\_datapoints\_to\_alarm) | The number of datapoints that must be breaching to trigger the alarm. | `number` | `1` | no | +| [evaluation\_periods](#input\_evaluation\_periods) | The number of periods over which data is compared to the specified threshold. | `number` | `1` | no | +| [lambda\_function\_vpc\_security\_group\_ids](#input\_lambda\_function\_vpc\_security\_group\_ids) | List of security group ids when Lambda Function should run in the VPC. | `list(string)` | `null` | no | +| [lambda\_function\_vpc\_subnet\_ids](#input\_lambda\_function\_vpc\_subnet\_ids) | List of subnet ids when Lambda Function should run in the VPC. Usually private or intra subnets. | `list(string)` | `null` | no | +| [period](#input\_period) | The period in seconds over which the specified statistic is applied. 
| `number` | `60` | no | +| [resource\_name\_prefix](#input\_resource\_name\_prefix) | Resource name prefix used to generate resources | `string` | n/a | yes | +| [slack\_channel](#input\_slack\_channel) | Slack channel to send notifications to | `string` | n/a | yes | +| [slack\_sns\_topic\_name](#input\_slack\_sns\_topic\_name) | SNS topic name to forward notifications to | `string` | n/a | yes | +| [slack\_username](#input\_slack\_username) | Slack username used as the author of notifications | `string` | n/a | yes | +| [slack\_webhook\_url](#input\_slack\_webhook\_url) | Slack webhook url in the form https://hooks.slack.com/services/........ | `string` | n/a | yes | +| [step\_functions\_to\_monitor](#input\_step\_functions\_to\_monitor) | List of step functions for which to create CloudWatch metric alarms | `set(string)` | `[]` | no | + +## Outputs + +| Name | Description | +|------|-------------| +| [sns\_topic\_arn](#output\_sns\_topic\_arn) | Notifications topic arn | + diff --git a/terraform/modules/athena-connector/README.md b/terraform/modules/athena-connector/README.md new file mode 100644 index 0000000..2b15bc1 --- /dev/null +++ b/terraform/modules/athena-connector/README.md @@ -0,0 +1,55 @@ +AWS Athena connector +======================= + +The module in this folder creates an AWS Athena DataCatalog and an AWS Lambda function that serves requests from AWS Athena to AWS DynamoDB. +It uses the official AWS DynamoDB [connector](https://docs.aws.amazon.com/athena/latest/ug/connectors-dynamodb.html). + + + +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | ~> 1.1 | +| [aws](#requirement\_aws) | ~> 4.64.0 | +| [null](#requirement\_null) | ~> 3.2.1 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | 5.5.0 | +| [null](#provider\_null) | 3.2.1 | + +## Modules + +No modules.
+ +## Resources + +| Name | Type | +|------|------| +| [aws_iam_policy.athena_connector_lambda_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_role.athena_connector_lambda_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | +| [aws_iam_role_policy_attachment.athena_connector_basic_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | +| [aws_lambda_function.athena_dynamodb_connector](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource | +| [aws_s3_bucket.athena_spill_bucket](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket) | resource | +| [aws_s3_bucket_public_access_block.public_access_block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_public_access_block) | resource | +| [aws_s3_bucket_versioning.athena_spill_bucket](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_versioning) | resource | +| [null_resource.athena_dynamodb_connector](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource | +| [null_resource.delete_athena_dynamodb_connector](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [data\_catalog\_name](#input\_data\_catalog\_name) | Name of athena data catalog | `string` | n/a | yes | +| [delete\_athena\_dynamodb\_connector](#input\_delete\_athena\_dynamodb\_connector) | Set to True to delete athena dynamodb connector | `bool` | `false` | no | +| [primary\_aws\_region](#input\_primary\_aws\_region) | AWS region | `string` | n/a | yes | +| 
[vpc\_security\_group\_ids](#input\_vpc\_security\_group\_ids) | List of security group IDs assigned to the lambda. If null, the default subnet and VPC will be used | `list(string)` | `null` | no | +| [vpc\_subnet\_ids](#input\_vpc\_subnet\_ids) | List of subnet IDs to place the lambda in. If null, the default subnet and VPC will be used | `list(string)` | `null` | no | + +## Outputs + +No outputs. + diff --git a/terraform/modules/s3-configs/README.md b/terraform/modules/s3-configs/README.md new file mode 100644 index 0000000..73cad82 --- /dev/null +++ b/terraform/modules/s3-configs/README.md @@ -0,0 +1,61 @@ +AWS S3 bucket and configs +======================= + +The Terraform module in this folder is responsible for creating the AWS S3 bucket that DataQA uses as its primary bucket for storing configs and generated tests. + + +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | ~> 1.1 | +| [aws](#requirement\_aws) | ~> 4.64.0 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | 5.5.0 | + +## Modules + +No modules.
+ +## Resources + +| Name | Type | +|------|------| +| [aws_s3_bucket.settings_bucket](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket) | resource | +| [aws_s3_bucket_lifecycle_configuration.delete_old_reports](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_lifecycle_configuration) | resource | +| [aws_s3_bucket_public_access_block.settings_bucket_public_access_block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_public_access_block) | resource | +| [aws_s3_bucket_versioning.settings_bucket_versioning](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_versioning) | resource | +| [aws_s3_object.expectations_store](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.great_expectations_yml](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.mapping_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.pipeline_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.pks_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.sort_keys_config](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.test_config_manifest](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | +| [aws_s3_object.test_configs](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| 
[data\_test\_storage\_bucket\_name](#input\_data\_test\_storage\_bucket\_name) | Bucket name which will be used to store data tests and the settings for their execution | `string` | n/a | yes | +| [environment](#input\_environment) | Environment name used to build fully qualified tags and resource names | `string` | n/a | yes | +| [expectations\_store](#input\_expectations\_store) | Path to the expectations\_store directory, relative to the root TF | `string` | n/a | yes | +| [great\_expectation\_path](#input\_great\_expectation\_path) | Path to the great expectations yaml | `string` | n/a | yes | +| [manifest\_path](#input\_manifest\_path) | Path to the manifests | `string` | n/a | yes | +| [mapping\_path](#input\_mapping\_path) | Path to the mapping description, relative to the root TF | `string` | n/a | yes | +| [pipeline\_config\_path](#input\_pipeline\_config\_path) | Path to the pipeline description, relative to the root TF | `string` | n/a | yes | +| [pks\_path](#input\_pks\_path) | Path to the primary keys description, relative to the root TF | `string` | n/a | yes | +| [sort\_keys\_path](#input\_sort\_keys\_path) | Path to the sort keys description, relative to the root TF | `string` | n/a | yes | +| [test\_coverage\_path](#input\_test\_coverage\_path) | Path to the tests description, relative to the root TF | `string` | n/a | yes | + +## Outputs + +| Name | Description | +|------|-------------| +| [bucket\_name](#output\_bucket\_name) | Name of s3 configs bucket | + diff --git a/terraform/modules/s3-gateway/README.md b/terraform/modules/s3-gateway/README.md new file mode 100644 index 0000000..09f20fc --- /dev/null +++ b/terraform/modules/s3-gateway/README.md @@ -0,0 +1,56 @@ +Nginx AWS S3 gateway +======================== + +The Terraform module in this folder is responsible for creating an Nginx AWS S3 gateway that allows serving static reports from AWS S3 over HTTP and applies IP restrictions.
+ +Underneath, it creates an AWS EC2 instance in a public subnet and installs Nginx with the s3-gateway module. IP restrictions are implemented as rules for security group ingress and set by the `whitelist_ips` variable. + +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | ~> 1.1 | +| [aws](#requirement\_aws) | ~> 4.64.0 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | 5.5.0 | + +## Modules + +No modules. + +## Resources + +| Name | Type | +|------|------| +| [aws_iam_instance_profile.web_instance_profile](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource | +| [aws_iam_policy.s3_read](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | +| [aws_iam_role.instance_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | +| [aws_iam_role_policy_attachment.push_report_dynamodb](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | +| [aws_instance.s3_gateway](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance) | resource | +| [aws_security_group.connectable](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource | +| [aws_ami.ubuntu](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source | +| [aws_iam_policy_document.assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | +| [aws_s3_bucket.data_bucket](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/s3_bucket) | data source | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [bucket\_name](#input\_bucket\_name) | Bucket 
name to serve through the gateway (read-only) | `string` | n/a | yes | +| [env](#input\_env) | Env tag used to tag resources | `string` | n/a | yes | +| [instance\_sg\_ids](#input\_instance\_sg\_ids) | Extra list of security groups for the instance | `list(string)` | `[]` | no | +| [instance\_subnet\_id](#input\_instance\_subnet\_id) | Instance subnet id | `string` | n/a | yes | +| [instance\_type](#input\_instance\_type) | Instance type for s3 gateway | `string` | `"t2.micro"` | no | +| [vpc\_id](#input\_vpc\_id) | VPC ID for the s3 gateway | `string` | n/a | yes | +| [whitelist\_ips](#input\_whitelist\_ips) | IPs allowed to reach the host over SSH/HTTP | `list(string)` | n/a | yes | + +## Outputs + +| Name | Description | +|------|-------------| +| [s3\_gateway\_address](#output\_s3\_gateway\_address) | DNS http address of s3 gateway | + diff --git a/terraform/outputs.tf b/terraform/outputs.tf index 99e4a7c..11dbdf6 100644 --- a/terraform/outputs.tf +++ b/terraform/outputs.tf @@ -19,6 +19,6 @@ output "lambda_report_push_arn" { } output "bucket" { - description = "Data quality gate bucket with settings and generated tests" + description = "DataQA bucket with settings and generated tests" value = module.s3_bucket.bucket_name }
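For the `s3-gateway` module described above, a minimal sketch of a call from the root configuration follows. It assumes the module is referenced by its relative path `./modules/s3-gateway` from the `terraform/` root, and that the bucket, VPC, subnet, and CIDR values are hypothetical placeholders. Only the inputs documented in the module's Inputs table are used. Because the module reads the bucket through an `aws_s3_bucket` data source, the bucket is assumed to exist already. `whitelist_ips` is written in CIDR notation, following the root module example.

```hcl
module "s3_gateway" {
  # Assumed relative path from the root module in terraform/
  source = "./modules/s3-gateway"

  # Existing bucket with the generated reports; served read-only (placeholder name)
  bucket_name = "my-data-settings-dev"
  env         = "example"

  # The EC2 instance is placed in a public subnet of this VPC (placeholder IDs)
  vpc_id             = "vpc-xxxxxxxx"
  instance_subnet_id = "subnet-xxxxxxxx"

  # Turned into security group ingress rules: only these ranges reach the host
  whitelist_ips = ["203.0.113.0/24"]

  # Optional inputs, shown with their documented defaults
  instance_type   = "t2.micro"
  instance_sg_ids = []
}

# Expose the gateway's DNS address so the reports URL is easy to find
output "reports_url" {
  value = module.s3_gateway.s3_gateway_address
}
```

In this sketch, restricting `whitelist_ips` to a company VPN range rather than `0.0.0.0/0` is what limits report access, because the module applies IP restrictions only through the security group ingress rules.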
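The alerting module's two roles can likewise be wired from the root configuration. The sketch below assumes the path `./modules/alerting` relative to the `terraform/` root, and all names (prefix, channel, username, topic, and state machine) are hypothetical placeholders. `step_functions_to_monitor` is assumed to take state machine names, since the module resolves them through the `aws_sfn_state_machine` data source.

```hcl
module "alerting" {
  # Assumed relative path from the root module in terraform/
  source = "./modules/alerting"

  # Hypothetical prefix used when naming the created resources
  resource_name_prefix = "my-project-example"

  # Slack destination for both alarms and data report messages (placeholders)
  slack_channel        = "data-qa-alerts"
  slack_username       = "DataQA"
  slack_webhook_url    = "https://hooks.slack.com/services/xxxxxxxxxxxxxxx"
  slack_sns_topic_name = "my-project-example-data-reports"

  # Hypothetical state machine name; each entry gets a CloudWatch metric alarm
  step_functions_to_monitor = ["my-project-example-data-qa"]

  # Alarm tuning, shown with the documented defaults
  period              = 60
  evaluation_periods  = 1
  datapoints_to_alarm = 1
}

# SNS topic that report_push could publish custom metrics to
output "alerts_topic_arn" {
  value = module.alerting.sns_topic_arn
}
```

With the defaults, one breaching 60-second datapoint is enough to trigger an alarm. Raising `evaluation_periods` and `datapoints_to_alarm` together trades alert latency for fewer noisy notifications.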