V1.5.0 Features #409
Merged
### Feature or Bugfix
- Feature

### Detail
- Add OpenSearch Serverless stack & corresponding feature flag
- Update client request signature ("aoss"/"es") and create service SSM parameter
- Update domain/collection references
- Add ECS task role to OpenSearch Serverless principals
- Update CDK library to the latest version (2.61.1)
- Update pipeline CodeBuild image to `AMAZON_LINUX_2_4` with the latest Node.js version, to use the latest CDK and remove unnecessary packages
- Pass the VPC endpoints security group to the OpenSearch Serverless stack
- Update template_cdk.json
- Add monitoring

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
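The request-signature change can be illustrated with a small sketch. This is not data.all's actual code; the helper name is illustrative, and only the "aoss"/"es" mapping comes from the description above:

```python
def opensearch_signing_service(use_serverless: bool) -> str:
    """Return the SigV4 service name used to sign OpenSearch requests.

    OpenSearch Serverless collections sign requests against the "aoss"
    service, while provisioned OpenSearch domains use "es". The PR stores
    the active choice in an SSM parameter; this helper only illustrates
    the mapping between the feature flag and the service name.
    """
    return "aoss" if use_serverless else "es"


# With the OpenSearch Serverless feature flag enabled, requests are
# signed for "aoss"; otherwise the classic "es" service name is used.
print(opensearch_signing_service(True))
print(opensearch_signing_service(False))
```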
### Feature or Bugfix
- Feature

### Detail
- New cdk.json configuration parameter `tooling_vpc_restricted_nacl`. If set to `true`, we create a custom NACL for the tooling VPC created by data.all, with the inbound rules shown below; outbound allows all traffic. ![image](https://user-images.githubusercontent.com/71252798/222077893-69329834-e8a0-4b6e-97e5-2ea2c7833d72.png)
- Enabled private DNS names for CodeArtifact endpoints, so they resolve correctly inside the VPC and traffic stays within the VPC.
- Modified some CodeBuild steps to install pip and npm packages from CodeArtifact instead of from the internet.
- In #323, CodeBuild Linux images were updated, eliminating the need to install yum packages and Node.js.

### Relates
- #307

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
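As a rough sketch, the new flag would sit alongside the other deployment parameters in `cdk.json`; the surrounding structure here is abbreviated and assumed, only the parameter name comes from the description above:

```json
{
  "context": {
    "tooling_vpc_restricted_nacl": "true"
  }
}
```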
### Feature or Bugfix
- Feature

### Detail

#### ⚠️ IMPORTANT ⚠️
IF YOU ARE MIGRATING FROM A MANUAL TO A CDK-CREATED ROLE, YOU WILL EXPERIENCE DOWNTIME WHILE THE PIPELINE UPDATES THE BACKEND CODE, THE ENVIRONMENT/DATASET STACKS, AND THE FRONTEND CODE. PLAN THE MIGRATION (AROUND 1-2 HOURS) IN A TIME WINDOW THAT DOES NOT IMPACT YOUR USERS.

#### Configuration of manual or cdk-created pivotRole
With this PR, customers can use data.all and link environments using either a manually created pivotRole or a cdk-created pivotRole deployed as part of the environment stack. Which one to use is configurable in the `cdk.json` of each infrastructure account. For example, customers can have manually created pivotRoles in "dev" and cdk-created ones in "prod". The configuration affects all environments of the deployment: in this example, environments linked to "prod" will always deploy a cdk-created pivotRole. It is not possible to mix manual and cdk roles in the same infrastructure deployment.

- Added a configuration parameter in `cdk.json` to enable or disable pivotRole creation as part of the environment in each development environment: `"enable_pivot_role_auto_create": "boolean_ENABLE_PIVOT_ROLE_AUTO_CREATE_IN_ENVIRONMENT|DEFAULT=false"`
- Created an SSM parameter that stores this flag in each of the development environments. (Screenshot outdated: the parameter is now called `enablePivotRoleAutoCreate` to match the cdk.json parameter name.) ![image](https://user-images.githubusercontent.com/71252798/225842050-36c24de8-4a13-4378-91f8-4a7eb10a4281.png)
- Modified the "dataall-pivotrole-name" Secret in the central account. For a manually created pivotRole it stays "dataallPivotRole"; for cdk-created ones it is "dataallPivotRole-cdk", to avoid conflicting IAM roles.
- Added a CDK nested stack for the pivotRole IAM role and added it to the environment stack. It is deployed depending on the value of the created SSM parameter. Permissions that reference the pivotRole are adapted to use the pivotRole-cdk when it is enabled.
- Conditional prerequisites box in the CreateEnvironment UI view, depending on the value of the SSM parameter. The value of the SSM parameter is written to the `.env` file of the React application and is used in `frontend/src/views/Environments/EnvironmentCreateForm.js`. (With `enable_pivot_role_auto_create: false`) <img width="1407" alt="image" src="https://user-images.githubusercontent.com/71252798/225843691-1fa8a99f-1936-4061-bac9-80030e6f4232.png"> (With `enable_pivot_role_auto_create: true`) ![image](https://user-images.githubusercontent.com/71252798/226277248-97c8550e-053d-4d1d-8313-bfd7650a71f5.png)

#### Refactor AWS account checks
In the resolvers and in the environment stack deployment, we originally performed some checks in the account with boto3 calls that assumed the pivotRole was already created. In this PR, some of these checks are now performed with the CDK look-up role, and others have been moved elsewhere in the code. In addition, more checks were added to cover all scenarios (e.g. checking that the pivotRole exists in manual-pivotRole deployments).

- Modified the check of the CDKToolkit CloudFormation stack to avoid using the pivotRole. <img width="335" alt="image" src="https://user-images.githubusercontent.com/71252798/225841263-5991165d-d1ed-4aeb-9ec8-017357485434.png">
- Added a check of pivotRole creation (for manually created pivotRoles) using the CDK look-up role. <img width="1260" alt="image" src="https://user-images.githubusercontent.com/71252798/223967385-450d9018-43e5-45b6-9736-225c627af989.png">
- Added check_environment methods to the update_environment API call.
- The previous check for an existing SageMaker Studio domain when ML Studio is enabled is now executed with the CDK look-up role.
- The previous check of the Quicksight subscription when Dashboards is enabled now happens after environment creation, when a dataset is created or when a Dashboard-Quicksight session is started.
- The previous creation of the data.all Quicksight default group when Dashboards is enabled now happens after environment creation, when a dataset is created or when a Dashboard-Quicksight session is started.

#### Migrating from manual to cdk-created pivotRole
This PR considers the scenario in which customers already have a deployment of data.all with environments linked using a manually created pivotRole and want to move to cdk-created pivotRoles. To do that, customers just need to add the `enable_pivot_role_auto_create` parameter to their `cdk.json` configuration and set it to `true`. Once the CI/CD pipeline has completed, new linked environments will contain the nested cdk-pivotRole stack (no action needed), and existing environments can be updated:

a) manually, by clicking on "update stack" in the environment's Stack tab;
b) automatically, by waiting for the `stack-updater` ECS task that runs daily overnight;
c) automatically, by setting the added `update_dataall_stacks_in_cicd_pipeline` parameter to `true` in the `cdk.json` config file, so that the `stack-updater` ECS task is triggered from the CI/CD pipeline.

- Added a configuration parameter in `cdk.json` that adds a CodeBuild stage to the CI/CD pipeline to trigger the update of environment and dataset stacks: `"enable_update_dataall_stacks_in_cicd_pipeline": "boolean_ENABLE_UPDATE_DATAALL_STACKS_IN_CICD_PIPELINE|DEFAULT=false"`. This parameter can be set back to `false` when customers do not foresee many changes in the dataset and environment stacks, and back to `true` to force the update of stacks. This is useful to ensure changes in the stacks are applied immediately instead of waiting for the stack-updater to run overnight. <img width="1328" alt="image" src="https://user-images.githubusercontent.com/71252798/226288484-3109a685-4bda-41f8-ab49-84aaa69796bf.png">
- Modified the `stacks-updater` ECS task to run environment updates first, wait until completion, and then update dataset stacks. The reason for this order is that the datasets of an environment use Lambda functions created in the environment stack as custom resources (glue-db-handler). The original Lambda function needs to be updated to use the new pivotRole before being used in the dataset stack. In addition, Quicksight group creation, which happens in the dataset stack, uses the pivotRole because the CDK look-up role has no Quicksight permissions.
- Replaced the CloudFormation `AWS::LakeFormation::Resource` with a custom resource and added the necessary `lakeformation:RegisterResource` and `lakeformation:DeregisterResource` permissions to the pivotRole. Using the CloudFormation resource directly caused "Internal Failure" errors. To have more control over the update, we now execute it as a custom resource (in the same way as the glue-db-handler). ![image](https://user-images.githubusercontent.com/71252798/226554050-e9d2faa0-96fc-4677-946d-47c6d4c04005.png)

#### ⚠️ Migration risks ⚠️
Removing the `AWS::LakeFormation::Resource` and replacing it with a custom resource within a single update of the dataset stack causes a race condition, since:

1) deleting `AWS::LakeFormation::Resource` triggers deletion of the data location resource from Lake Formation;
2) adding a custom resource (or rather, replacing `AWS::LakeFormation::Resource` with a custom resource) creates or updates the data location resource.

The outcome of this dataset stack update depends on the order in which 1 and 2 occur. In the worst case, 2 happens before 1, which results in deletion of the Lake Formation data location resource. This happens because both `AWS::LakeFormation::Resource` and the custom resource point to the same DataLakeLocation, which can be registered only once. The error is resolved in a second update of the stacks, in which the `on_update` event of the datalakelocation handler checks whether the location is registered in Lake Formation and creates the corresponding data location resource if it is missing. A second update can be triggered manually, by going to the UI, selecting the dataset and clicking on "update stack" in the Stack tab. If it is not triggered manually, a scheduled task runs this update daily.

#### Documentation
- User guide changes in the environment section
- GitHub Pages changes (in a different PR)

#### Additional enhancements
- For the dataset custom resources, the provider creation has been moved to the environment stack, avoiding [Lambda policy size limitations](https://repost.aws/knowledge-center/lambda-resource-based-policy-size-error); the SSM parameters for the Lambda ARN and name are left for reference if needed.
- Modified the default Lake Formation settings custom resource in the environment stack: added validation of IAM roles when adding Lake Formation data lake admins. If the IAM roles do not exist, data.all removes them from the data lake admins to avoid errors. It now also removes the pivotRole (manual or cdk) from the Lake Formation data lake admins on environment stack deletion.
- Made team IAM role policies unique by adding the environmentUri to their policy names.
- bugfix: [Fix local imports of shareItemSM for Docker and quicksight properties](527f561)
- Isolated the SageMaker Studio domain stack in a separate nested stack within the environment stack, for readability.

### Relates
- #251

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Co-authored-by: Dariusz Osiennik <osiend@amazon.com>
Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: Dennis Goldner <107395339+degoldner@users.noreply.github.com>
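The self-healing `on_update` logic described for the datalakelocation handler could look roughly like the sketch below. This is a hedged illustration with an injected client and made-up names, not the actual data.all handler; the stub class stands in for a boto3 Lake Formation client so the flow can be exercised without AWS:

```python
class _LFExceptions:
    """Stand-in for botocore's generated exception classes."""

    class EntityNotFoundException(Exception):
        pass


class StubLakeFormationClient:
    """Minimal test double mimicking a boto3 Lake Formation client."""

    exceptions = _LFExceptions

    def __init__(self, registered: bool):
        self.registered = registered

    def describe_resource(self, ResourceArn):
        # Lake Formation raises EntityNotFoundException for
        # unregistered data locations.
        if not self.registered:
            raise self.exceptions.EntityNotFoundException(ResourceArn)
        return {"ResourceInfo": {"ResourceArn": ResourceArn}}

    def register_resource(self, ResourceArn, RoleArn):
        self.registered = True


def on_update(lf_client, resource_arn: str, role_arn: str) -> str:
    """Idempotent on_update: re-register the data location if missing.

    If the race condition described above deleted the registration,
    a second stack update repairs it; if the location is still
    registered, the handler is a no-op.
    """
    try:
        lf_client.describe_resource(ResourceArn=resource_arn)
        return "already-registered"
    except lf_client.exceptions.EntityNotFoundException:
        lf_client.register_resource(ResourceArn=resource_arn, RoleArn=role_arn)
        return "registered"
```

Running `on_update` twice against a client whose registration was lost first re-creates the data location and then becomes a no-op, which is why a second stack update (manual or scheduled) resolves the race condition.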
# Conflicts:
# backend/dataall/cdkproxy/stacks/dataset.py
# backend/dataall/db/api/dataset.py
# deploy/pivot_role/pivotRole.yaml
…for SageMaker domain (#420)

### Feature or Bugfix
- Feature
- Bugfix

### Detail
- Instead of creating the SageMaker Studio domain as a nested stack, we create it as part of the environment stack. To clearly show that the resources created for SageMaker are part of the ML Studio functionality, `check_existing_sagemaker_studio_domain` and `create_sagemaker_domain_resources` are class methods of `SageMakerDomain`, placed in `backend/dataall/cdkproxy/stacks/sagemakerstudio.py`.
- As reported in #352, data.all uses the default VPC of the account, which does not fulfil the requirements for SageMaker Studio. This results in long start times. This PR also adds the creation of a dedicated VPC, which solves the issue of slow starts.
- It is not possible to modify the networking configuration of an existing SageMaker Studio domain. In CloudFormation this deletes and re-creates the domain (replacement = true), and if the domain has Studio users the CloudFormation stack fails. For this reason I kept the previous implementation using the default VPC. If customers opt to use dedicated networking, they need to delete the default VPC. This is an interim solution; we will look for better ways to migrate to a dedicated SageMaker VPC once we get more information on how customers are using data.all ML Studio.

### Relates
- #409
- #352

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix
- Enhancement

### Detail
- Added a method to check whether the default VPC exists, instead of relying on the CloudFormation stack failing. Since it uses the cdk-look-up role, we do not need to add any ec2 permissions to the pivotRole.

### Relates
- #352 #409

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
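A check of this kind could be sketched as below. This is an assumed illustration, not the PR's actual code; it uses the EC2 `DescribeVpcs` call with the `isDefault` filter, and the stub client lets the logic run without AWS credentials:

```python
def default_vpc_exists(ec2_client) -> bool:
    """Return True if the account/region has a default VPC.

    Sketch of the check described above: DescribeVpcs with the
    isDefault filter requires only a read permission the CDK look-up
    role already has, so no pivotRole changes are needed.
    `ec2_client` is expected to behave like a boto3 EC2 client.
    """
    resp = ec2_client.describe_vpcs(
        Filters=[{"Name": "isDefault", "Values": ["true"]}]
    )
    return len(resp.get("Vpcs", [])) > 0


class StubEC2Client:
    """Minimal test double mimicking boto3's describe_vpcs response."""

    def __init__(self, vpc_ids):
        self._vpc_ids = vpc_ids

    def describe_vpcs(self, Filters):
        return {"Vpcs": [{"VpcId": v} for v in self._vpc_ids]}
```

Calling the check up front lets the resolver fail fast with a clear error message, rather than surfacing the problem later as an opaque CloudFormation stack failure.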
@dlpzx is there guidance around migrating to OpenSearch serverless?
…ed deletions (#429)

### Feature or Bugfix
- Bugfix

### Detail
Giving a more descriptive name to the logical ID of the glue-handler custom resource results in CloudFormation recreating it. That means that CloudFormation:

1) creates the new custom resource and tries to create the database and grant permissions (the database already exists, so it skips this);
2) deletes the old custom resource, and in this action it deleted the database. We cannot allow this, as it removes the database and tables from Glue, which would break our sharing and Lake Formation permissions.

### Relates
- #409

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix
- Bugfix
- Refactoring

### Detail
- Reverted Lake Formation location registration to a CloudFormation resource instead of a custom resource. There was a bug in the CloudFormation Lake Formation resource that prevented a storage location from being updated. The bug has been fixed, and we can now update Lake Formation resources (which was needed to update from pivotRole to pivotRole-cdk).
- Re-added the original custom resource that creates the dataset Glue database and grants permissions in Lake Formation. We wanted to replace the custom resource service token because it was causing Lambda policies to reach their maximum size, but it is not possible to update the service token of a custom resource; it has to be recreated. The issue is that directly replacing the custom resource deletes the database (on_delete) when the CloudFormation stack cleans up resources. Deleting the Glue database does not delete the S3 data, but it deletes all Lake Formation permissions, which is critical for production use cases. We cannot introduce this risk, so we have updated the old custom resource Lambda to not delete anything on the on_delete event, and we will leave the deprecated custom resource until the next minor release, when we will safely remove it from the dataset stack. In V1.6.0 we will make sure to communicate that upgrading to V1.5.0 is a requirement before upgrading to V1.6.0.
- Added documentation for ML Studio networking requirements and new features.
- Added the user guide as a PDF for V1.5.0.

### Relates
- #409

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
NickCorbett approved these changes · Apr 25, 2023