Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PivotRole stack as CDK stack #251

Closed
dlpzx opened this issue Dec 12, 2022 · 0 comments · Fixed by #253
Closed

PivotRole stack as CDK stack #251

dlpzx opened this issue Dec 12, 2022 · 0 comments · Fixed by #253
Assignees
Labels
type: enhancement Feature enhacement

Comments

@dlpzx
Copy link
Contributor

dlpzx commented Dec 12, 2022

Is your idea related to a problem? Please describe.
I'm always frustrated when new permissions are needed in the PivotRole and I have to update all my CloudFormation stacks (one in each environment account)

Describe the solution you'd like
I would like to implement some sort of automation outside of data.all to update this IAM role in a more effective way. For that I would like to have the CDK version of the PivotRole available as part of the repository as a reference

P.S. Don't attach files. Please, prefer add code snippets directly in the message body.

@dlpzx dlpzx added the type: enhancement Feature enhacement label Dec 12, 2022
@dlpzx dlpzx self-assigned this Dec 12, 2022
@dlpzx dlpzx linked a pull request Dec 13, 2022 that will close this issue
dlpzx added a commit that referenced this issue Dec 16, 2022
### Feature or Bugfix
- Feature

### Detail
- Added the CDK version of the pivot role template. It is a reference
for customers to automate the creation of the pivot role as part of
baseline insfratructure automated deployments in the environment
accounts.

### Relates
#251 
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue Apr 5, 2023
### Feature or Bugfix
- Feature

### Detail
#### ⚠️ ⚠️ ⚠️ IMPORTANT ⚠️ ⚠️ ⚠️ 
IF YOU ARE MIGRATING FROM MANUAL TO CDK-CREATED ROLE, WHILE THE PIPELINE
IS UPDATING THE BACKEND CODE, UPDATING ENVIRONMENT/DATASET STACKS AND
UPDATING THE FRONTEND CODE YOU WILL EXPERIENCE DOWN TIMES. PLAN THE
MIGRATION (AROUND 1-2H) IN A TIME WINDOW THAT DOES NOT IMPACT YOUR
USERS.

#### Configuration of manual or cdk-created pivotRole
With this PR customers will be able to use data.all and link
environments using a manually created pivotRole or a cdk-pivotRole
deployed as part of the environment stack. What to use is configurable
in the `cdk.json` for each infra account. For example, customers can
have manually created pivotRoles in "dev" and cdk-created ones in
"prod". The configuration affects all environments of the deployment. In
this example, environments linked to "prod" will always deploy a
cdk-PivotRole, it is not possible to have a mix of manual and cdk roles
in the same infra deployment.


- Added configuration parameter in `cdk.json` to enable or disable
pivotRole creation as part of environment in each development
environment:
`"enable_pivot_role_auto_create":
"boolean_ENABLE_PIVOT_ROLE_AUTO_CREATE_IN_ENVIRONMENT|DEFAULT=false",`

- Creation of SSM Parameter that stores this flag in each of the
development environments. (Screenshot outdated, not the parameter is
called `enablePivotRoleAutoCreate` to match the cdk.json parameter name

![image](https://user-images.githubusercontent.com/71252798/225842050-36c24de8-4a13-4378-91f8-4a7eb10a4281.png)

- Modification of the "dataall-pivotrole-name" Secret in central
account. For pivotRole created manually it stays in "dataallPivotRole"
for cdk-created ones "dataallPivotRole-cdk" to avoid issues with
conflicting IAM roles.

- Added CDK nested stack for PivotRole IAM role and add it to the
Environment stack. It is deployed depending on the value of the created
SSM parameter. Permissions that reference the pivotRole are adapted to
use the pivotRole-cdk when it is enabled.

- Conditional pre-requisites box in CreateEnvironment UI view depending
on the value of the SSM Parameter. The value of the SSM parameter is
written to the `.env` file of the React application and is used in the
`frontend/src/views/Environments/EnvironmentCreateForm.js`.
(With `enable_pivot_role_auto_create: false`)
<img width="1407" alt="image"
src="https://user-images.githubusercontent.com/71252798/225843691-1fa8a99f-1936-4061-bac9-80030e6f4232.png">

(With `enable_pivot_role_auto_create: true`)

![image](https://user-images.githubusercontent.com/71252798/226277248-97c8550e-053d-4d1d-8313-bfd7650a71f5.png)

#### Refactor AWS account checks
In the resolvers and in the environment stack deployment originally we
performed some checks in the account with boto3 calls assuming that the
pivotRole was created. In this PR some of this checks are now
effectuated with the `cdk look up role` and other tests have been moved
to other places in the code. In addition, more tests are added to
consider all scenarios (e.g. check that pivotRole exists in manual-pivot
role deployments).

- Modified check of CDKToolkit CFN stack to avoid using pivotRole
<img width="335" alt="image"
src="https://user-images.githubusercontent.com/71252798/225841263-5991165d-d1ed-4aeb-9ec8-017357485434.png">

- Added check of pivotRole creation (for manual created pivotRole) with
CDK look-up role.
<img width="1260" alt="image"
src="https://user-images.githubusercontent.com/71252798/223967385-450d9018-43e5-45b6-9736-225c627af989.png">

- Added check_environment methods to the update_environment API call.

- Previous check of existing SageMaker Studio domain when MLStudio is
enabled is now executed with cdk look up role
- Previous check of Quicksight subscription when Dashboards is enabled
now happens after environment creation, when a Dataset is created or
when a Dashboard-Quicksight session is started.
- Previous creation of data.all Quicksight default group when Dashboards
is enabled now happens after environment creation, when a Dataset is
created or when a Dashboard-Quicksight session is started.


#### Migrating from manual to cdk-created pivotRole
This PR takes into consideration the scenario in which customers already
have a deployment of data.all with environments linked using a manually
created pivotRole and want to leverage cdk-created pivotRoles.

To do that, customers just need to add the
`enable_pivot_role_auto_create` parameter in their `cdk.json`
configuration and set it to `true`. Once the CICD pipeline has
completed: new linked environments will contain the nested cdk-pivotRole
stack (no actions needed) and existing environments can be updated by:
a) manually, by clicking on "update stack" in the environment>stack tab
b) automatically, wait for the `stack-updater` ECS task that runs daily
overnight
c) automatically, set the added `update_dataall_stacks_in_cicd_pipeline`
parameter to `true` in the `cdk.json` config file. The `stack-updater`
ECS task will be triggered from the CICD pipeline

- Added configuration parameter in `cdk.json` to add a CodeBuild stage
in CICD pipeline to trigger update of environments and datasets stacks:
`"enable_update_dataall_stacks_in_cicd_pipeline":
"boolean_ENABLE_UPDATE_DATAALL_STACKS_IN_CICD_PIPELINE|DEFAULT=false"`
This parameter can be set back to `false` whenever customers do not
foresee many changes in the dataset and environment stacks and back to
true when we want to force the update of stacks. This can be useful to
ensure changes in the stacks are applied immediately instead of waiting
for the stack-updater to run overnight.

<img width="1328" alt="image"
src="https://user-images.githubusercontent.com/71252798/226288484-3109a685-4bda-41f8-ab49-84aaa69796bf.png">

- Modified `stacks-updater` ECS task to run environment updates first,
wait until completion and then update dataset stacks. The reason of
keeping this order is that the datasets of an environment use Lambda
functions created in the environment stack as Custom resources
(glue-db-handler). The original Lambda function needs to be updated to
use the new pivotRole before being used in the Dataset stack. Plus,
Quicksight group creation which happens in the dataset stack uses the
pivotRole because the cdk-look-up role has no Quicksight permissions.

- Replaced the CFN `AWS::LakeFormation::Resource` by a Custom Resource
and added the necessary 'lakeformation:De/registerResource` permissions
to Pivot Role. Using the Cfn resource directly caused internal Failure
errors. To have more control over the update we now execute it as a
custom resource (in the same way as we do with the glue-db-handler).


![image](https://user-images.githubusercontent.com/71252798/226554050-e9d2faa0-96fc-4677-946d-47c6d4c04005.png)

#### ⚠️ ⚠️ ⚠️ Migration risks ⚠️ ⚠️ ⚠️ 

Removing the AWS::LakeFormation::Resource resource and replacing it with
a Custom Resource within the same/single update of the dataset stack
causes a race condition since:
1) deleting AWS::LakeFormation::Resource triggers deletion of the data
location resource from Lake Formation.
2) adding a custom resource or actually replacing
AWS::LakeFormation::Resource with a custom resource creates or updates
the data location resource.

Then the output of this update of the dataset stack depends on the order
in which 1 & 2 occurs, in the worst case 2 happens before 1, which
results in deletion of the Lake Formation data location resource. This
is explained because both AWS::LakeFormation::Resource and Custom
Resource point to the same DataLakeLocation which can be registered just
once.

This error will be solved in a second update of stacks, in which the
'on_update' event of the datalakelocation handler checks if the location
was registered in Lake Formation and creates the corresponding data
location resource if it is missing. A second update can be triggered
manually, by going to the UI, selecting the dataset and clicking on
update stack in the stack tab. If it is not triggered manually, an
scheduled task runs this update daily.

#### Documentation
- Userguide changes in environment section
- GitHub pages changes (in a different PR)

#### Additional enhancements
- For the dataset custom resources, now the provider creation has been
moved to the environment stack avoiding [Lambda policy size
limitations](https://repost.aws/knowledge-center/lambda-resource-based-policy-size-error),
the SSM parameters for the Lambda arn and name are left for reference if
needed.
- Modified Default Lake Formation settings custom resource in
environment stack: added validation of IAM roles when adding
LakeFormation data lake admins, if the IAM roles do not exist, data.all
removes them from data lake admins to avoid errors. It now removes
pivotRole (manual or cdk) from LakeFormation data lake admins on
environment stack deletion.
- Make team IAM roles policies unique by adding environmentUri to their
policy names
- bugfix: [Fix local imports of shareItemSM for Docker and quicksight
properties](527f561)
- Isolated SageMaker Studio domain stack in a different nested stack in
the Environment stack for readability


### Relates
- #251 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Dariusz Osiennik <osiend@amazon.com>
Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: Dennis Goldner <107395339+degoldner@users.noreply.github.com>
dlpzx added a commit that referenced this issue Apr 11, 2023
### Feature or Bugfix
- Feature

### Detail
#### ⚠️ ⚠️ ⚠️ IMPORTANT ⚠️ ⚠️ ⚠️
IF YOU ARE MIGRATING FROM MANUAL TO CDK-CREATED ROLE, WHILE THE PIPELINE IS UPDATING THE BACKEND CODE, UPDATING ENVIRONMENT/DATASET STACKS AND UPDATING THE FRONTEND CODE YOU WILL EXPERIENCE DOWN TIMES. PLAN THE MIGRATION (AROUND 1-2H) IN A TIME WINDOW THAT DOES NOT IMPACT YOUR USERS.

#### Configuration of manual or cdk-created pivotRole
With this PR customers will be able to use data.all and link environments using a manually created pivotRole or a cdk-pivotRole deployed as part of the environment stack. What to use is configurable in the `cdk.json` for each infra account. For example, customers can have manually created pivotRoles in "dev" and cdk-created ones in "prod". The configuration affects all environments of the deployment. In this example, environmentslinked to "prod" will always deploy a cdk-PivotRole, it is not possible to have a mix of manual and cdk roles in the same infra deployment.

- Added configuration parameter in `cdk.json` to enable or disable pivotRole creation as part of environment in each development environment:
 `"enable_pivot_role_auto_create": "boolean_ENABLE_PIVOT_ROLE_AUTO_CREATE_IN_ENVIRONMENT|DEFAULT=false",`

- Creation of SSM Parameter that stores this flag in each of the development environments. (Screenshot outdated, not the parameter is called `enablePivotRoleAutoCreate` to match the cdk.json parameter name
![image](https://user-images.githubusercontent.com/71252798/225842050-36c24de8-4a13-4378-91f8-4a7eb10a4281.png)

- Modification of the "dataall-pivotrole-name" Secret in central account. For pivotRole created manually it stays in "dataallPivotRole" for cdk-created ones "dataallPivotRole-cdk" to avoid issues with conflicting IAM roles.

- Added CDK nested stack for PivotRole IAM role and add it to the Environment stack. It is deployed depending on the value of the created SSM parameter. Permissions that reference the pivotRole are adapted to use the pivotRole-cdk when it is enabled.

- Conditional pre-requisites box in CreateEnvironment UI view depending on the value of the SSM Parameter. The value of the SSM parameter is written to the `.env` file of the React application and is used in the `frontend/src/views/Environments/EnvironmentCreateForm.js`.
(With `enable_pivot_role_auto_create: false`)
<img width="1407" alt="image" src="https://user-images.githubusercontent.com/71252798/225843691-1fa8a99f-1936-4061-bac9-80030e6f4232.png">

(With `enable_pivot_role_auto_create: true`)
![image](https://user-images.githubusercontent.com/71252798/226277248-97c8550e-053d-4d1d-8313-bfd7650a71f5.png)

#### Refactor AWS account checks
In the resolvers and in the environment stack deployment originally we performed some checks in the account with boto3 calls assuming that the pivotRole was created. In this PR some of this checks are now effectuated with the `cdk look up role` and other tests have been moved to other places in the code. In addition, more tests are added to consider all scenarios (e.g. check that pivotRole exists in manual-pivot role deployments).

- Modified check of CDKToolkit CFN stack to avoid using pivotRole
<img width="335" alt="image" src="https://user-images.githubusercontent.com/71252798/225841263-5991165d-d1ed-4aeb-9ec8-017357485434.png">

- Added check of pivotRole creation (for manual created pivotRole) with CDK look-up role.
<img width="1260" alt="image" src="https://user-images.githubusercontent.com/71252798/223967385-450d9018-43e5-45b6-9736-225c627af989.png">

- Added check_environment methods to the update_environment API call.

- Previous check of existing SageMaker Studio domain when MLStudio is enabled is now executed with cdk look up role
- Previous check of Quicksight subscription when Dashboards is enabled now happens after environment creation, when a Dataset is created or when a Dashboard-Quicksight session is started.
- Previous creation of data.all Quicksight default group when Dashboards is enabled now happens after environment creation, when a Dataset is created or when a Dashboard-Quicksight session is started.

#### Migrating from manual to cdk-created pivotRole
This PR takes into consideration the scenario in which customers already have a deployment of data.all with environments linked using a manually created pivotRole and want to leverage cdk-created pivotRoles.

To do that, customers just need to add the `enable_pivot_role_auto_create` parameter in their `cdk.json` configuration and set it to `true`. Once the CICD pipeline has completed: new linked environments will contain the nested cdk-pivotRole stack (no actions needed) and existing environments can be updated by:
a) manually, by clicking on "update stack" in the environment>stack tab
b) automatically, wait for the `stack-updater` ECS task that runs daily overnight
c) automatically, set the added `update_dataall_stacks_in_cicd_pipeline` parameter to `true` in the `cdk.json` config file. The `stack-updater` ECS task will be triggered from the CICD pipeline

- Added configuration parameter in `cdk.json` to add a CodeBuild stage in CICD pipeline to trigger update of environments and datasets stacks:
`"enable_update_dataall_stacks_in_cicd_pipeline": "boolean_ENABLE_UPDATE_DATAALL_STACKS_IN_CICD_PIPELINE|DEFAULT=false"`
This parameter can be set back to `false` whenever customers do not foresee many changes in the dataset and environment stacks and back to true when we want to force the update of stacks. This can be useful to ensure changes in the stacks are applied immediately instead of waiting for the stack-updater to run overnight.

<img width="1328" alt="image" src="https://user-images.githubusercontent.com/71252798/226288484-3109a685-4bda-41f8-ab49-84aaa69796bf.png">

- Modified `stacks-updater` ECS task to run environment updates first, wait until completion and then update dataset stacks. The reason of keeping this order is that the datasets of an environment use Lambda functions created in the environment stack as Custom resources (glue-db-handler). The original Lambda function needs to be updated to use the new pivotRole before being used in the Dataset stack. Plus, Quicksight group creation which happens in the dataset stack uses the pivotRole because the cdk-look-up role has no Quicksight permissions.

- Replaced the CFN `AWS::LakeFormation::Resource` by a Custom Resource and added the necessary 'lakeformation:De/registerResource` permissions to Pivot Role. Using the Cfn resource directly caused internal Failure errors. To have more control over the update we now execute it as a custom resource (in the same way as we do with the glue-db-handler).

![image](https://user-images.githubusercontent.com/71252798/226554050-e9d2faa0-96fc-4677-946d-47c6d4c04005.png)

#### ⚠️ ⚠️ ⚠️ Migration risks ⚠️ ⚠️ ⚠️

Removing the AWS::LakeFormation::Resource resource and replacing it with a Custom Resource within the same/single update of the dataset stack causes a race condition since:
1) deleting AWS::LakeFormation::Resource triggers deletion of the data location resource from Lake Formation.
2) adding a custom resource or actually replacing AWS::LakeFormation::Resource with a custom resource creates or updates the data location resource.

Then the output of this update of the dataset stack depends on the order in which 1 & 2 occurs, in the worst case 2 happens before 1, which results in deletion of the Lake Formation data location resource. This is explained because both AWS::LakeFormation::Resource and Custom Resource point to the same DataLakeLocation which can be registered just once.

This error will be solved in a second update of stacks, in which the 'on_update' event of the datalakelocation handler checks if the location was registered in Lake Formation and creates the corresponding data location resource if it is missing. A second update can be triggered manually, by going to the UI, selecting the dataset and clicking on update stack in the stack tab. If it is not triggered manually, an scheduled task runs this update daily.

#### Documentation
- Userguide changes in environment section
- GitHub pages changes (in a different PR)

#### Additional enhancements
- For the dataset custom resources, now the provider creation has been moved to the environment stack avoiding [Lambda policy size limitations](https://repost.aws/knowledge-center/lambda-resource-based-policy-size-error), the SSM parameters for the Lambda arn and name are left for reference if needed.
- Modified Default Lake Formation settings custom resource in environment stack: added validation of IAM roles when adding LakeFormation data lake admins, if the IAM roles do not exist, data.all removes them from data lake admins to avoid errors. It now removes pivotRole (manual or cdk) from LakeFormation data lake admins on environment stack deletion.
- Make team IAM roles policies unique by adding environmentUri to their policy names
- bugfix: [Fix local imports of shareItemSM for Docker and quicksight properties](527f561)
- Isolated SageMaker Studio domain stack in a different nested stack in the Environment stack for readability

### Relates
- #251

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Feature enhacement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant