Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: limit dataset role permissions #497

Merged
merged 13 commits into from
Jun 9, 2023

Conversation

dlpzx
Copy link
Contributor

@dlpzx dlpzx commented Jun 6, 2023

Feature or Bugfix

  • Refactoring

Detail

The resulting IAM policy can:

  • list all buckets
  • read and write objects to the dataset Bucket which is encrypted
  • read S3 access points in the dataset Bucket
  • putLogs in dataset Glue crawler log group
  • read dataset Glue database, read and write tables in the dataset Glue database. This is not strictly necessary as in data.all permission to data is handled using Lake Formation. But restricting the IAM-based data permissions we ensure that any Glue resource that is not protected using Lake Formation is not accessible by this role
  • WIP - read objects to the /profiling/code prefix in the environment bucket
  • WIP - read and write objects to the /profiling/code/results/datasetUri/ prefix in the environment bucket

IMPORTANT: I found a bug related to profiling jobs that prevented me to test the profiling jobs. A separate issue has been opened for it. For this reason the profiling permissions are a work in progress and might require changes. e.g. additional KMS permissions.

It cannot:

  • read or write to any other S3 Bucket
  • use any KMS key different from the dataset KMS key
  • read or write any other Glue database/tables

In addition, the Glue crawler and the profiling Job of the dataset have been modified to always use the dataset role and not the PivotRole to break down the "super permissions" of the pivot role and distribute responsibilities. As a result, the dataset role can be assumed:

  • by the pivotRole -> used whenever users are assuming the role from data.all UI
  • by Glue -> to run Glue profiling jobs and Glue crawler

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dlpzx and others added 12 commits May 26, 2023 11:37
### Feature or Bugfix
- Bugfix

### Detail
The constant to define the dataallPivotRole missed an "a" and as a
consequence the storage location for the Dataset was not registered

### Relates

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
### Feature or Bugfix
- Bugfix
- Refactoring

### Detail
- The AWS Cloud Development Kit (CDK) Team recently identified an issue
with the CDK Pipelines construct library that may result in unintended
permissions being granted to authenticated users within your account. As
of April 4, 2023, we have fixed the issue in version 1.200.0 [1] for CDK
v1, and version 2.77.0 [2] for CDK v2. We strongly recommend you upgrade
to one of these versions as soon as possible. Please refer to the
Managing Dependencies documentation [3] in the CDK Developer Guide for
instructions on how to perform the upgrade.
Starting with versions 1.158.0 and 2.26.0, released May 30, 2022, the
library creates a role that allows every identity in the same account
with sts:AssumeRole permissions on Resource: * to assume it. This may
result in granting privileges to authenticated users in your account
allowing them to take pipeline actions beyond what was intended.

### Relates
- N.A

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
### Feature or Bugfix
- Bugfix

### Detail
- Added check and exception if there are open share requests on a
consumption role or on a group that we are removing from an environment

### Relates
- #450 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
…ssions' into feature/limit-dataset-role-permissions

# Conflicts:
#	backend/dataall/cdkproxy/stacks/dataset.py
…anges in read environment bucket - missing:errors on profiling jobs and KMS key of environment bucket
@dlpzx dlpzx marked this pull request as ready for review June 7, 2023 11:18
@dlpzx dlpzx requested a review from noah-paige June 7, 2023 11:18
Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me - just the 1 question on logs permissions

@dlpzx
Copy link
Contributor Author

dlpzx commented Jun 9, 2023

@noah-paige one last comment. I added a clarification on the Glue permissions on the PR description.

read dataset Glue database, read and write tables in the dataset Glue database. This is not strictly necessary as in data.all permission to data is handled using Lake Formation. But restricting the IAM-based data permissions we ensure that any Glue resource that is not protected using Lake Formation is not accessible by this role

Do you agree on this? or am I going to far?

Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@dlpzx dlpzx merged commit ddc4360 into v1m6m0 Jun 9, 2023
noah-paige

This comment was marked as off-topic.

@dlpzx dlpzx mentioned this pull request Jul 11, 2023
@dlpzx dlpzx deleted the feature/limit-dataset-role-permissions branch July 14, 2023 10:19
@dlpzx dlpzx mentioned this pull request Jul 18, 2023
7 tasks
dlpzx added a commit that referenced this pull request Jul 19, 2023
### Feature or Bugfix
Release PR with the following list of features. Refer to each PR for the
details

### Detail
- #498 
- #482 
- #543
- #524 (which also solves #531)
- #532 
- #535 
- #497 
- #515
- #529 
- #562 
- #455 
- #572 
- #567 
- #573 
- #579 
- #578 
- #582 

### Breaking changes - release notes
- ⚠️ IMPORTANT: upgrade to a version >V1.5.0 before upgrading to V1.6 to
avoid deletion of resources in custom resource deletion
- ⚠️ IMPORTANT: requires an update of environments and then datasets
after upgrading. Either using cdk.json parameter
`enable_update_dataall_stacks_in_cicd_pipeline`, waiting for overnight
update stack task, or manually updating first environments and then
datasets
- CloudFront distribution replace for #529 
- Additional EC2 permissions in CDK Synth CodeBuild stage for #543 -->
this can be avoided by upgrading to v1.5.6 before upgrading to v1.6.0
- local development affected by more restrictive pivotRole trust policy


### Relates
V1.6.0 release

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Gezim Musliaj <102723839+gmuslia@users.noreply.github.com>
Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: nikpodsh <124577300+nikpodsh@users.noreply.github.com>
Co-authored-by: chamcca <40579012+chamcca@users.noreply.github.com>
Co-authored-by: Nikita Podshivalov <nikpodsh@amazon.com>
Co-authored-by: dbalintx <132444646+dbalintx@users.noreply.github.com>
Co-authored-by: mourya-33 <134511711+mourya-33@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants