Limit Pivot Role S3 permissions #580
Comments
@dlpzx my concerns:
Can you document as well:
|
Hi @zsaltys, I added the missing info to the issue description, except for the last point, for which I need some more time. I suspect that you also want to remove the pivot role from the trust policy of the dataset role, right? |
@dlpzx one thing that needs to be addressed is that the pivot role should not have the `PutBucketPolicy` permission, as it can potentially be used to obtain other permissions. One way to address this would be to allow preregistration of the buckets to be managed by data.all; this way not all buckets would be controlled by data.all. Let me know your thoughts. |
Hi @manjulaK, thanks for the comment! I see your point, let's see. Our target state would be a pivot role with a policy that allows access and sensitive S3 actions only on the buckets created by data.all plus the imported buckets.
What do you think about option a)? |
Hi @dlpzx, thank you very much for looking into this. I think your option a) looks good. Can you kindly confirm the following assumptions:
|
|
### Feature or Bugfix
- Feature

### Detail
The guiding principle is that:
1. dataset IAM role is the role accessing data
2. pivot role is the role used by the central account to perform SDK calls in the environment account

In this PR we
- Replace pivot role by dataset role in dataset Lake Formation registration
- Use pivot role to trigger upload files feature and create folder feature, but use the dataset IAM role to perform the putObject operations -> removes the need for read and `putObject` permissions for the pivot role
- Redefine pivot role CDK stack to manage S3 buckets (bucket policies) for only the datasets S3 buckets that have been created or imported in the environment.
- implement IAM policy utils to handle the new dynamic policies. We need to verify that the created policy statements do not exceed the maximum policy size. In addition we replace the previous "divide in chunks of 10 statements" by a function that divides in chunks based on the size of the policy statements. This way we optimize the policy size, which helps us in reducing the number of managed policies attached to the pivot role. --> it can be re-used in other "chunkenization" of policies
- We did not implement force update of environments (pivot role nested stack) with new datasets added because it is already forced in `backend/dataall/modules/datasets/services/dataset_service.py`

### Backwards compatibility Testing
Pre-update setup:
- 1 environment (auto-created pivot role)
- 2 datasets in environment, 1 created, 1 imported: with tables and folders
- Run profiling jobs in tables

Update with the branch changes:
- [X] CICD pipeline runs successfully
- [X] Click update environment on environment -> successfully updated policy of pivot role with imported datasets in policy. Reduction of policies
- [X] Click update datasets --> registration in Lake formation updated to dataset role
- [X] Update files works
- [X] Create folder works
- [X] Crawler and profiling jobs work

### Relates
- #580

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Are you introducing any new policies/roles/users? `Yes`
  - Have you used the least-privilege principle? How? `In this PR we restrict the permissions of the pivot role, a super role that handles SDK calls in the environment accounts. Instead of granting permissions to all S3 buckets, we restrict it to data.all handled S3 buckets only`

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
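The size-based chunking described in the PR above could look roughly like the sketch below. It is not the code from the PR: the function names, the greedy packing strategy and the way the size is measured are assumptions made for illustration. The only external fact relied on is the IAM quota of 6,144 characters (whitespace excluded) for a managed policy.

```python
import json

# IAM quota for a managed policy: 6,144 characters, whitespace not counted.
MAX_POLICY_CHARS = 6144
# Empty policy document, used to measure the fixed overhead of the wrapper.
WRAPPER = {"Version": "2012-10-17", "Statement": []}


def serialized_size(obj) -> int:
    """Length of obj serialized as JSON without any whitespace."""
    return len(json.dumps(obj, separators=(",", ":")))


def split_statements_by_size(statements, max_chars=MAX_POLICY_CHARS):
    """Greedily pack statements into chunks so that each chunk fits one policy.

    Hypothetical helper: keeps the original order and starts a new chunk
    only when adding the next statement would exceed max_chars.
    """
    base = serialized_size(WRAPPER)
    chunks, current, current_size = [], [], base
    for statement in statements:
        size = serialized_size(statement) + 1  # +1 for the separating comma
        if base + size > max_chars:
            raise ValueError("a single statement exceeds the policy size limit")
        if current and current_size + size > max_chars:
            chunks.append(current)
            current, current_size = [], base
        current.append(statement)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then become the `Statement` list of one managed policy. Compared with a fixed "10 statements per policy" split, short statements get packed more densely, so fewer managed policies are needed.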
@dlpzx are you also updating this section in the pivotRole to make sure that data.all only has access to imported KMS keys?
|
### Feature or Bugfix
- Feature

### Detail
- read KMS keys with an alias prefixed by the environment resource prefix
- read KMS keys imported in imported datasets
- restrict pivot role policies to the KMS keys created by data.all and those imported in the imported datasets
- move kms client from data_sharing to base as it is used in environments and datasets

### Relates
- #580

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

This PR restricts the IAM policies of the pivot role, following the least privilege permissions principle

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
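As an illustration of the KMS restriction described in this PR, the sketch below builds two IAM statements: one scoped by alias prefix and one scoped to explicit imported key ARNs. The function name, the `Sid` values and the list of actions are hypothetical, and using the `kms:ResourceAliases` condition key for the alias prefix is an assumption about one possible approach, not necessarily what the PR does.

```python
def kms_statements(resource_prefix, imported_key_arns, region, account_id):
    """Build KMS statements limited to prefixed aliases and imported keys.

    Hypothetical helper; the action list is an assumption for illustration.
    """
    actions = ["kms:DescribeKey", "kms:GetKeyPolicy", "kms:PutKeyPolicy"]
    statements = [
        {
            "Sid": "KmsKeysByAliasPrefix",
            "Effect": "Allow",
            "Action": actions,
            "Resource": [f"arn:aws:kms:{region}:{account_id}:key/*"],
            # Only keys that have an alias starting with the environment prefix.
            "Condition": {
                "ForAnyValue:StringLike": {
                    "kms:ResourceAliases": [f"alias/{resource_prefix}*"]
                }
            },
        }
    ]
    if imported_key_arns:
        # Keys of imported datasets are listed explicitly by ARN.
        statements.append(
            {
                "Sid": "KmsImportedKeys",
                "Effect": "Allow",
                "Action": actions,
                "Resource": sorted(set(imported_key_arns)),
            }
        )
    return statements
```

With no imported keys the function returns only the alias-scoped statement, so the pivot role never gets a wildcard grant on every key in the account.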
#830 Also needed for this feature |
Merged and released with v2.1.0 🚀 |
🆕 [UPDATED WITH THE FEEDBACK FROM COMMENTS]
Is your idea related to a problem? Please describe.
Currently the data.all pivotRole requires permissions on all S3 buckets and KMS keys in the AWS account.
Describe the solution you'd like
A solution in which access to the data on S3 buckets is restricted to specific roles only. We would like to prevent any data access from other accounts, especially if the pivot role gets compromised.
Analysis of roles in data.all
In data.all central account
In the environment accounts
In addition there is also the cdk execution role used for CDK deployments, but it is not relevant for this issue.
`graphql-role`, `worker-role` and `ecs-role` from the central account
`pivot-Role` in the account
`pivot-Role` in the account
In this diagram we can see all roles and the SDK calls that they need to perform. It includes the changes that we want to implement.
Implementation
To remove the S3 and KMS permissions from the pivotRole policies, we first need to remove the need for these permissions and then modify the pivot role permissions. At the moment the pivot role is used to access data in the S3 buckets in the following features:
Because imported dataset buckets can have any name, the S3 permissions granted apply to ALL buckets. We need to restrict the S3 permissions of the pivot role and the resources for those permissions.
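A minimal sketch of what restricting the resources could mean in practice: instead of `"Resource": "*"`, the statement lists only the buckets that data.all created or imported. The function name, the `Sid` and the action set are assumptions made for illustration, and the `aws` partition is hard-coded.

```python
def bucket_policy_management_statement(bucket_names, sid="ManagedBucketPolicies"):
    """Build one IAM statement scoped to the given buckets instead of '*'.

    Hypothetical helper; the action list is an assumption for illustration.
    """
    if not bucket_names:
        raise ValueError("at least one bucket is required")
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Action": [
            "s3:GetBucketPolicy",
            "s3:PutBucketPolicy",
            "s3:DeleteBucketPolicy",
        ],
        # Deduplicate and sort so the generated policy is stable across runs.
        "Resource": [f"arn:aws:s3:::{name}" for name in sorted(set(bucket_names))],
    }
```

Because imported buckets can have any name, the list has to be built from the datasets registered in the environment rather than from a naming pattern.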
1) As registration role in Lake Formation for the datasets
Because in V1.6 the dataset role was modified, we just need to:
Est. time ~ 2 days
2) In the Glue crawlers and profiling jobs as execution role
3) 4) In the upload and the create folder functionalities (see diagram above)
Option 1: Pivot role in all SDK calls
Option 2: Dataset role in SDK calls with access to S3
At first I was more inclined to go for option 1, as the pivot role would only have the restricted access to managed buckets that is needed to implement point 5). But after having a look at the code, implementing option 2 is quite simple and avoids `PutObject` permissions for the pivotRole.
Est. time ~ 2 days
5) Manage bucket policies
In this case, the pivot role is the one that needs to execute the updateBucketPolicy API calls. The task is to:
Est. time ---> not included in the initial estimations
More that we are not aware of
There might be other functionalities that access data from the backend of data.all and that in theory seem unaffected but, as this is a big change, could break.
E.g. preview data in data.all
Est. time ~ 2 days
In total, adding the changes to the pivotRole policies [~2h] and additional testing time, the estimated time to implement this enhancement is ~8 days 🆕 + manage bucket policies