diff --git a/pages/architecture.md b/pages/architecture.md
index c56f1e431..564e6474b 100644
--- a/pages/architecture.md
+++ b/pages/architecture.md
@@ -122,14 +122,33 @@ the server hosting the documentation.
 ### VPC facing architecture
 In this architecture, data.all static sites are deployed on an AWS internal application load
-balancer (ALB) deployed on the VPC's private subnet.
-This ALB is reachable only from Amazon VPCs and not from the internet.
-Also, APIs are private and accessible only through VPC endpoints.
+balancer (ALB) in the VPC's private subnet. The static sites are hosted on Amazon ECS in Docker containers running an nginx server.
-Finally, data.all static sites are hosted on Amazon ECS using docker containers through nginx server.
+The ALB is reachable only from Amazon VPCs and not from the internet. The APIs are also private and accessible only through VPC endpoints.
+For this architecture, the following resources must be provisioned as prerequisites for the deployment:
+- Route 53 private hosted zone
+- ACM certificate
+- A VPC, which must be provided as input for the deployment. Check the Backend VPC section to review the VPC requirements.
+Although it is not a prerequisite per se, customers using this architecture need a way to connect to the data.all VPC. Typically,
+this is achieved by connecting a VPN to the data.all VPC.
+The following commands create the ACM certificate and the Route 53 private hosted zone:
+1. `cd` to an empty directory
+2. Create a self-signed certificate (`dataall.pem`) and a passphrase-protected private key (`dataallkey.pem`): `openssl req -x509 -newkey rsa:4096 -days 1825 -keyout dataallkey.pem -out dataall.pem`
+3. Write a copy of the private key without a passphrase, because ACM only imports unencrypted private keys: `openssl rsa -in dataallkey.pem -out dataallkeynopwd.pem`
+4. `aws route53 create-hosted-zone --name --vpc VPCRegion=,VPCId= --caller-reference 07:12:22 --query HostedZone.Id --output text`
+5. `aws acm import-certificate --region us-east-1 --certificate fileb:// --private-key fileb:// --query CertificateArn --output text`
+
+After deployment, how do you connect (or simulate a connection) between your VPN and the data.all VPC? The following
+resources might be helpful for testing and connecting the deployment:
+- [Support post](https://aws.amazon.com/premiumsupport/knowledge-center/route53-resolve-with-inbound-endpoint/)
+- [Workshop](https://catalog.workshops.aws/networking/en-US/intermediate/3-hybrid-dns/10-hybrid-dns-overview)
+- [Reference architecture](https://d1.awsstatic.com/architecture-diagrams/ArchitectureDiagrams/hybrid-dns_route53-resolver-endpoint-ra.pdf)
+
+
+![](img/architecture_frontend_vpc.drawio.png#zoom#shadow)
 - Third party libraries: data.all static sites libraries are stored on AWS CodeArtifact which ensures third party libraries availability, encryption using AWS KMS and
@@ -140,25 +159,37 @@ image, and does not rely on Dockerhub.
 Docker images are built with AWS CodePipeline and stored on Amazon ECR which ensures image availability, and vulnerabilities scanning.
-
-![](img/architecture_frontend_vpc.drawio.png#zoom#shadow)
-
-
 ## Backend Components
 ![Screenshot](img/architecture_backend.drawio.png#zoom#shadow)
-### VPC
+### Backend VPC
+#### Created by data.all
+If no VPC ID is provided for an infrastructure account in the deployment configuration (cdk.json), data.all creates its own VPC, with a standard configuration, in the account where it is set up.
-All compute is hosted in the **private subnets**, and communicates with AWS Services through a **NAT Gateway**.
+All backend compute is hosted in the **private subnets** and communicates with AWS services through a **NAT Gateway**.
 All data.all Lambda functions and ECS tasks are running inside this VPC and in private subnets.
+![Screenshot](img/architecture_vpc.drawio.png#zoom#shadow)
+#### Created outside of data.all
+There are 2 scenarios in which you might want to provide your own VPC:
+1) Organization guidelines: your organization has its own policies and mechanisms for creating VPCs.
+2) The frontend is hosted using the data.all VPC facing architecture.
+
+The VPC you provide should resemble the image above.
+
+1. Make sure that it is deployed in at least 2 Availability Zones (AZs)
+2. Make sure that it has at least 1 public subnet. data.all needs to download packages and therefore needs internet access.
+3. Make sure that the private subnets route to a NAT Gateway
+4. Make sure that the VPC does not have an S3 VPC endpoint
+
+Here is a screenshot of the creation of the VPC:
+![Screenshot](img/vpc_setup.png#zoom#shadow)
-![Screenshot](img/architecture_vpc.drawio.png#zoom#shadow)
 ### Backend AWS API Gateway
 data.all backend main entry point is an AWS API Gateway that exposes a
diff --git a/pages/deploy/deploy_aws.md b/pages/deploy/deploy_aws.md
index 5ee98b39e..c4c665249 100644
--- a/pages/deploy/deploy_aws.md
+++ b/pages/deploy/deploy_aws.md
@@ -111,7 +111,10 @@ of our repository. Open it, you should be seen something like:
             "prod_sizing": "boolean_SET_INFRA_SIZING_TO_PROD_VALUES_IF_TRUE|DEFAULT=true",
             "enable_cw_rum": "boolean_SET_CLOUDWATCH_RUM_APP_MONITOR|DEFAULT=false",
             "enable_cw_canaries": "boolean_SET_CLOUDWATCH_CANARIES_FOR_FRONTEND_TESTING|DEFAULT=false",
+            "enable_quicksight_monitoring": "boolean_ENABLE_CONNECTION_QUICKSIGHT_RDS|DEFAULT=false",
             "shared_dashboards_sessions": "string_TYPE_SESSION_SHARED_DASHBOARDS|(reader, anonymous) DEFAULT=anonymous",
+            "enable_pivot_role_auto_create": "boolean_ENABLE_PIVOT_ROLE_AUTO_CREATE_IN_ENVIRONMENT|DEFAULT=false",
+            "enable_update_dataall_stacks_in_cicd_pipeline": "boolean_ENABLE_UPDATE_DATAALL_STACKS_IN_CICD_PIPELINE|DEFAULT=false",
             "enable_opensearch_serverless": "boolean_USE_OPENSEARCH_SERVERLESS|DEFAULT=false"
         }
     ]
@@ -123,30 +126,33 @@ have listed and defined all the parameters of the cdk.json file. If you still ha
 and find 2 examples of cdk.json files.
-| **General Parameters** | **Optional/Required** | **Definition** |
-|---|---|---|
-| tooling_vpc_id | Optional | The VPC ID for the tooling account. If not provided, **a new VPC** will be created. |
-| tooling_region | Optional | The AWS region for the tooling account where the AWS CodePipeline pipeline will be created. (default: eu-west-1) |
-| git_branch | Optional | The git branch name can be leveraged to deploy multiple AWS CodePipeline pipelines to the same tooling account. (default: main) |
-| git_release | Optional | If set to **true**, CI/CD pipeline RELEASE stage is enabled. This stage releases a version out of the current branch. (default: false) |
-| quality_gate | Optional | If set to **true**, CI/CD pipeline quality gate stage is enabled. (default: true) |
-| resource_prefix | Optional | The prefix used for AWS created resources. It must be in lower case without any special character. (default: dataall) |
-| **Deployment environments Parameters** | **Optional/Required** | **Definition** |
-| --- | --- | --- |
-| envname | REQUIRED | The name of the deployment environment (e.g dev, qa, prod,...). It must be in lower case without any special character. |
-| account | REQUIRED | The AWS deployment account (deployment account N) |
-| region | REQUIRED | The AWS deployment region |
-| with_approval | Optional | If set to **true** an additional step on AWS CodePipeline to require user approval before proceeding with the deployment. (default: false) |
-| vpc_id | Optional | The VPC ID for the deployment account. If not provided, **a new VPC** will be created. |
-| vpc_endpoints_sg | Optional | The VPC endpoints security groups to be use by AWS services to connect to VPC endpoints. If not assigned, NAT outbound rule is used. |
-| internet_facing | Optional | If set to **true** CloudFront is used for hosting data.all UI and Docs and APIs are public. If false, ECS is used to host static sites and APIs are private. (default: true) |
-| custom_domain | Optional* | Custom domain configuration: hosted_zone_name, hosted_zone_id, and certificate_arn. If internet_facing parameter is **false** then custom_domain is REQUIRED for ECS ALB integration with ACM and HTTPS. It is optional when internet_facing is true. |
-| ip_ranges | Optional | Used only when internet_facing parameter is **false** to allow API Gateway resource policy to allow these IP ranges in addition to the VPC's CIDR block. |
-| apig_vpce | Optional | Used only when internet_facing parameter is **false**. If provided, it will be used for API Gateway otherwise a new VPCE will be created. |
-| prod_sizing | Optional | If set to **true**, infrastructure sizing is adapted to prod environments. Check additional resources section for more details. (default: true) |
-| enable_cw_rum | Optional | If set to **true** CloudWatch RUM monitor is created to monitor the user interface (default: false) |
-| enable_cw_canaries | Optional | If set to **true**, CloudWatch Synthetics Canaries are created to monitor the GUI workflow of principle features (default: false) |
-| shared_dashboard_sessions | Optional | Either 'anonymous' or 'reader'. It indicates the type of Quicksight session used for Shared Dashboards (default: 'anonymous') |
+| **General Parameters** | **Optional/Required** | **Definition** |
+|---|---|---|
+| tooling_vpc_id | Optional | The VPC ID for the tooling account. If not provided, **a new VPC** will be created. |
+| tooling_region | Optional | The AWS region for the tooling account where the AWS CodePipeline pipeline will be created. (default: eu-west-1) |
+| git_branch | Optional | The git branch name can be leveraged to deploy multiple AWS CodePipeline pipelines to the same tooling account. (default: main) |
+| git_release | Optional | If set to **true**, the CI/CD pipeline RELEASE stage is enabled. This stage releases a version out of the current branch. (default: false) |
+| quality_gate | Optional | If set to **true**, the CI/CD pipeline quality gate stage is enabled. (default: true) |
+| resource_prefix | Optional | The prefix used for AWS created resources. It must be in lower case without any special character. (default: dataall) |
+| **Deployment environments Parameters** | **Optional/Required** | **Definition** |
+| --- | --- | --- |
+| envname | REQUIRED | The name of the deployment environment (e.g. dev, qa, prod, ...). It must be in lower case without any special character. |
+| account | REQUIRED | The AWS deployment account (deployment account N) |
+| region | REQUIRED | The AWS deployment region |
+| with_approval | Optional | If set to **true**, an additional step is added to AWS CodePipeline to require user approval before proceeding with the deployment. (default: false) |
+| vpc_id | Optional | The VPC ID for the deployment account. If not provided, **a new VPC** will be created. |
+| vpc_endpoints_sg | Optional | The VPC endpoints security groups to be used by AWS services to connect to VPC endpoints. If not assigned, NAT outbound rule is used. |
+| internet_facing | Optional | If set to **true**, CloudFront is used for hosting data.all UI and Docs and APIs are public. If false, ECS is used to host static sites and APIs are private. (default: true) |
+| custom_domain | Optional* | Custom domain configuration: hosted_zone_name, hosted_zone_id, and certificate_arn. If internet_facing parameter is **false** then custom_domain is REQUIRED for ECS ALB integration with ACM and HTTPS. It is optional when internet_facing is true. |
+| ip_ranges | Optional | Used only when internet_facing parameter is **false**. The API Gateway resource policy allows these IP ranges in addition to the VPC's CIDR block. |
+| apig_vpce | Optional | Used only when internet_facing parameter is **false**. If provided, it will be used for API Gateway; otherwise a new VPCE will be created. |
+| prod_sizing | Optional | If set to **true**, infrastructure sizing is adapted to prod environments. Check additional resources section for more details. (default: true) |
+| enable_cw_rum | Optional | If set to **true**, a CloudWatch RUM monitor is created to monitor the user interface (default: false) |
+| enable_cw_canaries | Optional | If set to **true**, CloudWatch Synthetics Canaries are created to monitor the GUI workflow of principal features (default: false) |
+| enable_quicksight_monitoring | Optional | If set to **true**, RDS security groups and VPC NACL rules are modified to allow connection of the RDS metadata database with QuickSight in the infrastructure account (default: false) |
+| shared_dashboard_sessions | Optional | Either 'anonymous' or 'reader'. It indicates the type of QuickSight session used for Shared Dashboards (default: 'anonymous') |
+| enable_pivot_role_auto_create | Optional | If set to **true**, data.all creates the pivot IAM role as part of the environment stack. If false, a CloudFormation template is provided in the UI and AWS account admins need to deploy this stack as a prerequisite to linking a data.all environment (default: false) |
+| enable_update_dataall_stacks_in_cicd_pipeline | Optional | If set to **true**, the CI/CD pipeline update stacks stage is enabled for the deployment environment. This stage triggers the update of all environment and dataset stacks (default: false) |
 | enable_opensearch_serverless | Optional | If set to **true** Amazon OpenSearch Serverless collection is created and used instead of Amazon OpenSearch Service domain (default: false) |
 **Example 1**: Basic deployment: this is an example of a minimum configured cdk.json file.
@@ -199,6 +205,7 @@ deploy to 2 deployments accounts.
             "prod_sizing": false,
             "enable_cw_rum": true,
             "enable_cw_canaries": true
+
         },
         {
             "envname": "prod",
@@ -214,7 +221,9 @@ deploy to 2 deployments accounts.
                 "certificate_arn":"arn:aws:acm:AWS_REGION:AWS_ACCOUNT_ID:certificate/CERTIFICATE_ID"
             },
             "ip_ranges": ["IP_RANGE1", "IP_RANGE2"],
-            "apig_vpce": "vpc-xxxxxxxxxxxxxx"
+            "apig_vpce": "vpc-xxxxxxxxxxxxxx",
+            "enable_pivot_role_auto_create": true,
+            "enable_update_dataall_stacks_in_cicd_pipeline": true
         }
     ]
 }
diff --git a/pages/img/vpc_setup.png b/pages/img/vpc_setup.png
new file mode 100644
index 000000000..c015d98ee
Binary files /dev/null and b/pages/img/vpc_setup.png differ
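
The parameter table in this patch states that `envname`, `account` and `region` are required, and that `custom_domain` (with `hosted_zone_name`, `hosted_zone_id` and `certificate_arn`) becomes required when `internet_facing` is false. A minimal sketch of how one deployment environment entry could be checked before deploying is shown here. The function name `check_environment` and the wording of the messages are hypothetical and not part of data.all. The sketch takes a single environment object as a Python dict and makes no assumption about where that object sits inside cdk.json.

```python
from typing import Any, Dict, List

# Parameter names taken from the cdk.json parameter table in this patch.
REQUIRED_KEYS = ("envname", "account", "region")
CUSTOM_DOMAIN_KEYS = ("hosted_zone_name", "hosted_zone_id", "certificate_arn")
VPC_FACING_ONLY_KEYS = ("ip_ranges", "apig_vpce")


def check_environment(env: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one deployment environment entry.

    Hypothetical helper for illustration only; data.all does not ship it.
    """
    problems: List[str] = []

    # envname, account and region are marked REQUIRED in the table.
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"missing required parameter: {key}")

    # internet_facing defaults to true according to the table.
    internet_facing = env.get("internet_facing", True)

    if not internet_facing:
        # VPC facing architecture: custom_domain is required for the ALB.
        domain = env.get("custom_domain") or {}
        for key in CUSTOM_DOMAIN_KEYS:
            if not domain.get(key):
                problems.append(
                    f"custom_domain.{key} is required when internet_facing is false"
                )
    else:
        # These parameters are documented as used only when internet_facing is false.
        for key in VPC_FACING_ONLY_KEYS:
            if key in env:
                problems.append(f"{key} is only used when internet_facing is false")

    return problems


if __name__ == "__main__":
    example = {
        "envname": "prod",
        "account": "000000000000",  # placeholder account ID
        "region": "eu-west-1",
        "internet_facing": False,
    }
    for problem in check_environment(example):
        print(problem)
```

Returning a list of messages rather than raising on the first problem lets all issues in an entry be reported in one pass.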