-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IAM Policy enhancements - Split policy statements in chunks #884
Comments
Thanks for opening the issue for visibility @anushka-singh. We will work together on introducing this feature because as you noticed it includes some scenarios out of the ones considedred in the iam_policy_utils |
@dlpzx @SofiaSazonova @anmolsgandhi we ran into this issue on IAM policies when a role accumulated 38+ shares. Would be nice to prioritize this one. |
I started an investigation into it and here are my findings so far... When using data.all generated IAM roles we already use all the policies (max 10 per role) for services hence we cannot add anymore. We can assume that not all deployment will have all the features enabled and those 10 might be fewer and as such have some space but I don't consider this a good solution. When using consumption roles we don't have those service policies hence in theory we have space for adding more managed policies but keeping in mind that those roles are not data.all managed and hence users might have already added their own policies. Overall to tackle this issue we need to come up with a smarter (which probably means dynamic) policy manager which will maximise the capacity based on the following limitations
|
Hi @petrkalos , We are seeing IAM policy limit getting exceeded on a managed roles. I think an initial version where if the policy gets exceeded, another policy is created and attached would be helpful. I think data.all would somehow ( maybe ) have to remember the policy name it created and try to modify the policy which still has space left. FYI , if needed the limit can be pushed to 20 managed policies per role- https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html#reference_iam-quotas-entity-length:~:text=Managed%20policies%20per%20role . Please correct me if I am wrong |
[WIP] Current IAM Roles and Policy management in data.allData.all creates managed roles for data.all constructs like Environment, Dataset. These managed roles contain managed and inline policies.
Problem with managing Customer Managed Service policies:For environments and teams where roles are created by data.all and customer managed service policies are attached, these service policies are split into chunks ( splitting 10 statements into one managed policy ). Usually these service policies are split into managed policies as show below, Now if there is some modification in the 1-indexed policy then that leaves the managed policy some empty space in which it could fill in some statements. Data.all will have to rearrange all those policies so that there is a tight fit ( i.e. minimal empty space in all the policies ). Thus this problem is for statements = [s1, s2, s3, .... ] and their sizes ( weights ) = [size0, size1, size2, ... ]. Put statements in policies ( service-policy-0, service-policy-1, service-policy-2 ... ) such that Min(Empty Space left ). This problems exists because the service policies are diverse and dynamically added. This doesn't allow us to efficiently split it into chunks like how it is done in functions like ProposalStage 1:Since the problem of managed policies getting full and resulting in "limit exceeded error" most likely happens for roles which are involved in share purposes. This update can be first targeted to all the consumption roles, environment team roles ( which are involved in sharing ). Currently there exists just one managed policy with data.all with the naming convention - Policy updates with bucket sharing in data.all -Following resources are added in requestors IAM role SID for S3 - BucketStatementS3 S3 permissionss3_target_resources = [f'arn:aws:s3:::{self.bucket_name}', f'arn:aws:s3:::{self.bucket_name}/*'] SID for KMS - BucketStatementKMS Kms permissions ( if KMS is used to encrypt bucket )kms_target_resources = [f'arn:aws:kms:{self.bucket_region}:{self.source_account_id}:key/{kms_key_id}'] Policy updates with folder sharing in data.all -Following resources are added in requestors IAM role SID - AccessPointsStatementS3 S3 permissionss3_target_resources = [
f'arn:aws:s3:::{self.bucket_name}',
f'arn:aws:s3:::{self.bucket_name}/*',
f'arn:aws:s3:{self.dataset_region}:{self.dataset_account_id}:accesspoint/{self.access_point_name}',
f'arn:aws:s3:{self.dataset_region}:{self.dataset_account_id}:accesspoint/{self.access_point_name}/*',
] SID - AccessPointsStatementKMS Kms permissions ( if KMS is used to encrypt bucket )kms_target_resources = [f'arn:aws:kms:{self.dataset_region}:{self.dataset_account_id}:key/{kms_key_id}'] Policy updates with table sharing in data.all -Here the requestor IAM role is not updated Design to handle splitting of policiesWhen a share is gettting processed ( either for bucket or folder share )
for index, chunk in enumerate(statements_chunks):
policies.append(
iam.ManagedPolicy(
self.stack,
f'PivotRolePolicy-{index + 1}',
managed_policy_name=f'{self.env_resource_prefix}-pivot-role-cdk-policy-{self.region}-{index + 1}',
statements=chunk,
)
) When a share is revoked,
Migrating to managed policies with indexes.Since we already have a managed policy, this will be converted to an index based managed policy with the name format as - While attaching all the policies which are of the above format will be picked and attached to the role. QuestionsQ1. What happens when a role is not data.all managed ( in the case of consumption roles ) ?In case if the role is not managed by data.all, then all managed policies which are created as a part of the share will be added on the copy clipboard button. The policies will be presented as array of policies when copied from the clip board icon. Q2. Can we search for all the managed policies belonging to that consumption role ( any role which is involved in dataset share ) with boto3 calls ?Seems like the policies can be filtered based on Q3. What if we make separate managed policies for access point and bucket share ?This will involve migrating the existing policy into two managed policies like " Pros:
Cons
Q4. What are the policies which are added by Redshift sharing ?@dlpzx , any insights on this one ? Previous GH issues related to policy - |
Thanks for the detailed analysis @TejasRGitHub, I agree with the proposed solution, let's tackle ConsuptionRoles (and groups since it's very similar code) and think later about the rest. Also the indexing solution is already utilised and proven to work fine. |
Hi @TejasRGitHub, very clear design. Answering to Q4 ---> luckily for us we do not rely much on Redshift IAM permissions in the data.all integration. The only role that gets additional Redshift IAM permissions is the PivotRole, the rest will get access to Redshift using directly Redshift connections and Redshift permissions in the cluster (outside of IAM, managed by database admins) |
# Conflicts: # backend/dataall/modules/s3_datasets_shares/services/share_managers/s3_access_point_share_manager.py # backend/requirements.txt # docker-compose.yaml
# Conflicts: # deploy/stacks/container.py
Is your idea related to a problem? Please describe.
The limit for managed IAM policies is 6144 characters. Based on a dirty approximation, each team can request access to around 35 buckets based on the number of characters in policy statements.
Describe the solution you'd like
In v2.1.0 we added some utilities in dataall/base/utils/iam_policy_utils.py. We can reuse them to limit the resources per policy and splitting policies.
The text was updated successfully, but these errors were encountered: