Feat: pivot role limit kms (#830)

### Feature or Bugfix - Feature ### Detail - read KMS keys with an alias prefixed by the environment resource prefix - read KMS keys imported in imported datasets - restrict pivot role policies to the KMS keys created by data.all and those imported in the imported datasets - move kms client from data_sharing to base as it is used in environments and datasets ### Relates - #580 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). This PR restricts the IAM policies of the pivot role, following the least privilege permissions principle - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
data-dot-all · Oct 27, 2023 · 3f100b4 · 3f100b4
1 parent 8ad760b
commit 3f100b4
Show file tree

Hide file tree

Showing 7 changed files with 191 additions and 74 deletions.
diff --git a/backend/dataall/base/utils/iam_policy_utils.py b/backend/dataall/base/utils/iam_policy_utils.py
@@ -1,4 +1,4 @@
-from typing import List
+from typing import List, Callable
 import logging
 from aws_cdk import aws_iam as iam
 
@@ -43,61 +43,101 @@ def split_policy_statements_in_chunks(statements: List):
     return chunks
 
 
-def split_policy_with_resources_in_statements(base_sid, effect, actions, resources):
+def split_policy_with_resources_in_statements(base_sid: str, effect: iam.Effect, actions: List[str], resources: List[str]):
     """
-    Splitter used for IAM policy statements with an undefined number of resources.
-    - Ensures that the size of the IAM statement is below the POLICY LIMIT
-    - If it exceeds the POLICY LIMIT, it breaks the statement in multiple statements with a subset of resources
-    - Note the POLICY_HEADERS_BUFFER to account for the headers of the policy which usually take around ~60chars
+    The variable part of the policy is in the resources parameter of the PolicyStatement
     """
-    statement_without_resources = iam.PolicyStatement(
-        sid=base_sid,
-        effect=effect,
-        actions=actions,
-        resources=["*"]
-    )
-    resources_str = '" ," '.join(r for r in resources)
-    number_resources = len(resources)
-    max_length = len(max(resources, key=len))
-    base_length = len(str(statement_without_resources.to_json()))
-    total_length = base_length + len(resources_str)
-    logger.info(f"Policy base length = {base_length}")
-    logger.info(f"Number of resources = {number_resources}, resource maximum length = {max_length}")
-    logger.info(f"Resources as string length = {len(resources_str)}")
-    logger.info(f"Total length approximated as base length + resources string length = {total_length}")
+    def _build_statement(split, subset):
+        return iam.PolicyStatement(
+            sid=base_sid + str(split),
+            effect=effect,
+            actions=actions,
+            resources=subset
+        )
+
+    total_length, base_length = _policy_analyzer(resources, _build_statement)
+    extra_chars = len('" ," ')
 
     if total_length < POLICY_LIMIT - POLICY_HEADERS_BUFFER:
         logger.info("Not exceeding policy limit, returning statement ...")
-        resulting_statement = iam.PolicyStatement(
-            sid=base_sid,
+        resulting_statement = _build_statement(1, resources)
+        return [resulting_statement]
+    else:
+        logger.info("Exceeding policy limit, splitting statement ...")
+        resulting_statements = _policy_splitter(base_length=base_length, resources=resources, extra_chars=extra_chars, statement_builder=_build_statement)
+    return resulting_statements
+
+
+def split_policy_with_mutiple_value_condition_in_statements(base_sid: str, effect: iam.Effect, actions: List[str], resources: List[str], condition_dict: dict):
+    """
+    The variable part of the policy is in the conditions parameter of the PolicyStatement
+    conditions_dict passes the different components of the condition mapping
+    """
+    def _build_statement(split, subset):
+        return iam.PolicyStatement(
+            sid=base_sid + str(split),
             effect=effect,
             actions=actions,
-            resources=resources
+            resources=resources,
+            conditions={
+                condition_dict.get('key'): {
+                    condition_dict.get('resource'): subset
+                }
+            }
         )
+
+    total_length, base_length = _policy_analyzer(condition_dict.get('values'), _build_statement)
+    extra_chars = len(str(f'"Condition":  {{ "{condition_dict.get("key")}": {{"{condition_dict.get("resource")}": }} }}'))
+
+    if total_length < POLICY_LIMIT - POLICY_HEADERS_BUFFER:
+        logger.info("Not exceeding policy limit, returning statement ...")
+        resulting_statement = _build_statement(1, condition_dict.get("values"))
         return [resulting_statement]
     else:
-        logger.info("Exceeding policy limit, splitting statement ...")
-        index = 0
-        split = 0
-        resulting_statements = []
-        while index < len(resources):
-            #  Iterating until all resources are defined in a policy statement.
-            #  "index" represents the position of the resource in the resources list
-            size = 0
-            res = []
-            while index < len(resources) and (size + len(resources[index]) + 5) < POLICY_LIMIT - POLICY_HEADERS_BUFFER - base_length:
-                #  Appending a resource to the "res" list until we reach the maximum size for the resources section
-                #  It compares: current size of resources versus the allowed size of the resource section in a statement
-                res.append(resources[index])
-                size += (len(resources[index]) + 5)  # +5 for the 4 extra characters (", ") around each resource, plus additional ones []
-                index += 1
-            resulting_statement = iam.PolicyStatement(
-                sid=base_sid + str(split),
-                effect=effect,
-                actions=actions,
-                resources=res
-            )
-            split += 1
-            resulting_statements.append(resulting_statement)
-        logger.info(f"Statement divided into {split+1} smaller statements")
+        logger.info("Exceeding policy limit, splitting values ...")
+        resulting_statements = _policy_splitter(base_length=base_length, resources=condition_dict.get("values"), extra_chars=extra_chars, statement_builder=_build_statement)
+
+    return resulting_statements
+
+
+def _policy_analyzer(resources: List[str], statement_builder: Callable[[int, List[str]], iam.PolicyStatement]):
+    """
+    Calculates the policy size with the resources (total_length) and without resources (base_length)
+    """
+    statement_without_resources = statement_builder(1, ["*"])
+    resources_str = '" ," '.join(r for r in resources)
+    base_length = len(str(statement_without_resources.to_json()))
+    total_length = base_length + len(resources_str)
+    logger.info(f"Policy base length = {base_length}")
+    logger.info(f"Resources as string length = {len(resources_str)}")
+    logger.info(f"Total length approximated as base length + resources string length = {total_length}")
+
+    return total_length, base_length
+
+
+def _policy_splitter(base_length: int, resources: List[str], extra_chars: int, statement_builder: Callable[[int, List[str]], iam.PolicyStatement]):
+    """
+    Splitter used for IAM policy statements with an undefined number of resources one of the parameters of the policy.
+    - Ensures that the size of the IAM statement is below the POLICY LIMIT
+    - If it exceeds the POLICY LIMIT, it breaks the statement in multiple statements with a subset of resources
+    - Note the POLICY_HEADERS_BUFFER to account for the headers of the policy which usually take around ~60chars
+    """
+    index = 0
+    split = 0
+    resulting_statements = []
+    while index < len(resources):
+        #  Iterating until all values are defined in a policy statement.
+        #  "index" represents the position of the value in the values list
+        size = 0
+        subset = []
+        while index < len(resources) and (size + len(resources[index]) + extra_chars) < POLICY_LIMIT - POLICY_HEADERS_BUFFER - base_length:
+            #  Appending a resource to the subset list until we reach the maximum size for the condition section
+            #  It compares: current size of subset versus the allowed size of the condition section in a statement
+            subset.append(resources[index])
+            size += (len(resources[index]) + extra_chars)
+            index += 1
+        resulting_statement = statement_builder(split=split, subset=subset)
+        split += 1
+        resulting_statements.append(resulting_statement)
+    logger.info(f"Statement divided into {split+1} smaller statements")
     return resulting_statements
diff --git a/backend/dataall/core/environment/cdk/pivot_role_core_policies/kms.py b/backend/dataall/core/environment/cdk/pivot_role_core_policies/kms.py
@@ -6,32 +6,40 @@ class KMSPivotRole(PivotRoleStatementSet):
     """
     Class including all permissions needed  by the pivot role to work with AWS KMS.
     It allows pivot role to:
+    list and Describe KMS keys
+    manage data.all alias KMS keys
     - ....
     """
     def get_statements(self):
         statements = [
             iam.PolicyStatement(
-                sid='KMS',
+                sid='KMSList',
+                effect=iam.Effect.ALLOW,
+                actions=[
+                    'kms:List*',
+                    'kms:DescribeKey',
+                ],
+                resources=['*'],
+            ),
+            iam.PolicyStatement(
+                sid='KMSDataAllAlias',
                 effect=iam.Effect.ALLOW,
                 actions=[
                     'kms:Decrypt',
                     'kms:Encrypt',
                     'kms:GenerateDataKey*',
+                    'kms:GetKeyPolicy',
                     'kms:PutKeyPolicy',
                     'kms:ReEncrypt*',
                     'kms:TagResource',
                     'kms:UntagResource',
                 ],
-                resources=['*'],
-            ),
-            iam.PolicyStatement(
-                sid='KMSList',
-                effect=iam.Effect.ALLOW,
-                actions=[
-                    'kms:List*',
-                    'kms:DescribeKey',
-                ],
-                resources=['*'],
-            ),
+                resources=[f"arn:aws:kms:{self.region}:{self.account}:key/*"],
+                conditions={
+                    'ForAnyValue:StringLike': {
+                        'kms:ResourceAliases': [f"alias/{self.env_resource_prefix}*"]
+                    }
+                },
+            )
         ]
         return statements
diff --git a/backend/dataall/modules/dataset_sharing/services/share_managers/s3_share_manager.py b/backend/dataall/modules/dataset_sharing/services/share_managers/s3_share_manager.py
@@ -281,7 +281,7 @@ def update_dataset_bucket_key_policy(self):
             'Updating dataset Bucket KMS key policy...'
         )
         key_alias = f"alias/{self.dataset.KmsAlias}"
-        kms_client = KmsClient(self.source_account_id, self.source_environment.region)
+        kms_client = KmsClient(account_id=self.source_account_id, region=self.source_environment.region)
         kms_key_id = kms_client.get_key_id(key_alias)
         existing_policy = kms_client.get_key_policy(kms_key_id)
         target_requester_id = SessionHelper.get_role_id(self.target_account_id, self.target_requester_IAMRoleName)
@@ -392,7 +392,7 @@ def delete_dataset_bucket_key_policy(
             'Deleting dataset bucket KMS key policy...'
         )
         key_alias = f"alias/{dataset.KmsAlias}"
-        kms_client = KmsClient(dataset.AwsAccountId, dataset.region)
+        kms_client = KmsClient(account_id=dataset.AwsAccountId, region=dataset.region)
         kms_key_id = kms_client.get_key_id(key_alias)
         existing_policy = kms_client.get_key_policy(kms_key_id)
         target_requester_id = SessionHelper.get_role_id(target_environment.AwsAccountId, share.principalIAMRoleName)

diff --git a/backend/dataall/modules/datasets/cdk/env_role_dataset_s3_policy.py b/backend/dataall/modules/datasets/cdk/env_role_dataset_s3_policy.py
@@ -75,7 +75,7 @@ def _set_allowed_kms_keys_statements(datasets):
             dataset: Dataset
             for dataset in datasets:
                 if dataset.imported and dataset.importedKmsKey:
-                    key_id = KmsClient(dataset.AwsAccountId, dataset.region).get_key_id(
+                    key_id = KmsClient(account_id=dataset.AwsAccountId, region=dataset.region).get_key_id(
                         key_alias=f"alias/{dataset.KmsAlias}"
                     )
                     if key_id:

diff --git a/backend/dataall/modules/datasets/cdk/pivot_role_datasets_policy.py b/backend/dataall/modules/datasets/cdk/pivot_role_datasets_policy.py
@@ -1,6 +1,6 @@
 import os
 from dataall.base import db
-from dataall.base.utils.iam_policy_utils import split_policy_with_resources_in_statements
+from dataall.base.utils.iam_policy_utils import split_policy_with_resources_in_statements, split_policy_with_mutiple_value_condition_in_statements
 from dataall.core.environment.cdk.pivot_role_stack import PivotRoleStatementSet
 from dataall.modules.datasets_base.db.dataset_repositories import DatasetRepository
 from dataall.modules.datasets_base.db.dataset_models import Dataset
@@ -10,8 +10,13 @@
 class DatasetsPivotRole(PivotRoleStatementSet):
     """
     Class including all permissions needed  by the pivot role to work with Datasets based in S3 and Glue databases
-    It allows pivot role to:
-    - ....
+    It allows pivot role access to:
+    - Athena workgroups for the environment teams
+    - All Glue catalog resources (governed by Lake Formation)
+    - Lake Formation
+    - Glue ETL for environment resources
+    - Imported Datasets' buckets
+    - Imported KMS keys alias
     """
     def get_statements(self):
         statements = [
@@ -126,7 +131,11 @@ def get_statements(self):
                 }
             )
         ]
-        allowed_buckets = []
+        # Adding permissions for Imported Dataset S3 Buckets, created bucket permissions are added in core S3 permissions
+        # Adding permissions for Imported KMS keys
+        imported_buckets = []
+        imported_kms_alias = []
+
         engine = db.get_engine(envname=os.environ.get('envname', 'local'))
         with engine.scoped_session() as session:
             datasets = DatasetRepository.query_environment_imported_datasets(
@@ -135,10 +144,11 @@ def get_statements(self):
             if datasets:
                 dataset: Dataset
                 for dataset in datasets:
-                    allowed_buckets.append(f'arn:aws:s3:::{dataset.S3BucketName}')
+                    imported_buckets.append(f'arn:aws:s3:::{dataset.S3BucketName}')
+                    if dataset.importedKmsKey:
+                        imported_kms_alias.append(f'alias/{dataset.KmsAlias}')
 
-        if allowed_buckets:
-            # Imported Dataset S3 Buckets, created bucket permissions are added in core S3 permissions
+        if imported_buckets:
             dataset_statement = split_policy_with_resources_in_statements(
                 base_sid='ImportedDatasetBuckets',
                 effect=iam.Effect.ALLOW,
@@ -152,7 +162,30 @@ def get_statements(self):
                     's3:PutObjectAcl',
                     's3:PutBucketOwnershipControls',
                 ],
-                resources=allowed_buckets
+                resources=imported_buckets
             )
             statements.extend(dataset_statement)
+        if imported_kms_alias:
+            kms_statement = split_policy_with_mutiple_value_condition_in_statements(
+                base_sid='KMSImportedDataset',
+                effect=iam.Effect.ALLOW,
+                actions=[
+                    'kms:Decrypt',
+                    'kms:Encrypt',
+                    'kms:GenerateDataKey*',
+                    'kms:GetKeyPolicy',
+                    'kms:PutKeyPolicy',
+                    'kms:ReEncrypt*',
+                    'kms:TagResource',
+                    'kms:UntagResource',
+                ],
+                resources=[f"arn:aws:kms:{self.region}:{self.account}:key/*"],
+                condition_dict={
+                    "key": 'ForAnyValue:StringLike',
+                    "resource": 'kms:ResourceAliases',
+                    "values": imported_kms_alias
+                },
+            )
+            statements.extend(kms_statement)
+
         return statements
diff --git a/backend/dataall/modules/datasets/services/dataset_service.py b/backend/dataall/modules/datasets/services/dataset_service.py
@@ -5,6 +5,7 @@
 from dataall.base.db import exceptions
 from dataall.core.tasks.service_handlers import Worker
 from dataall.base.aws.sts import SessionHelper
+from dataall.modules.dataset_sharing.aws.kms_client import KmsClient
 from dataall.base.context import get_context
 from dataall.core.environment.env_permission_checker import has_group_permission
 from dataall.core.environment.services.environment_service import EnvironmentService
@@ -17,7 +18,6 @@
 from dataall.modules.catalog.db.glossary_repositories import GlossaryRepository
 from dataall.modules.vote.db.vote_repositories import VoteRepository
 from dataall.base.db.exceptions import AWSResourceNotFound, UnauthorizedOperation
-from dataall.modules.dataset_sharing.aws.kms_client import KmsClient
 from dataall.modules.dataset_sharing.db.share_object_models import ShareObject
 from dataall.modules.dataset_sharing.db.share_object_repositories import ShareObjectRepository
 from dataall.modules.dataset_sharing.services.share_permissions import SHARE_OBJECT_APPROVER
@@ -54,7 +54,7 @@ def check_dataset_account(session, environment):
     def check_imported_resources(environment, data):
         kms_alias = data.get('KmsKeyAlias')
         if kms_alias not in [None, "Undefined", "", "SSE-S3"]:
-            key_exists = KmsClient(environment.AwsAccountId, environment.region).check_key_exists(
+            key_exists = KmsClient(account_id=environment.AwsAccountId, region=environment.region).check_key_exists(
                 key_alias=f"alias/{kms_alias}"
             )
             if not key_exists:
@@ -63,7 +63,7 @@ def check_imported_resources(environment, data):
                     message=f'KMS key with alias={kms_alias} cannot be found - Please check if KMS Key Alias exists in account {environment.AwsAccountId}',
                 )
 
-            key_id = KmsClient(environment.AwsAccountId, environment.region).get_key_id(
+            key_id = KmsClient(account_id=environment.AwsAccountId, region=environment.region).get_key_id(
                 key_alias=f"alias/{kms_alias}"
             )
             if not key_id:

diff --git a/documentation/userguide/docs/datasets.md b/documentation/userguide/docs/datasets.md
@@ -101,6 +101,42 @@ the fields of a newly created dataset you have to specify the S3 bucket and opti
 is left empty, data.all will create a Glue database pointing at the S3 Bucket. As for the KMS key Alias, data.all assumes that if nothing is specified
 the S3 Bucket is encrypted with SSE-S3 encryption.
 
+!!! danger "Imported KMS key and S3 Bucket policies requirements"
+    Data.all pivot role will handle data sharing on the imported Bucket and KMS key (if imported). Make sure that
+    the resource policies allow the pivot role to manage them. For the KMS key policy, explicit permissions are needed. See an example below.
+
+
+### KMS key policy
+In the KMS key policy we need to grant explicit permission to the pivot role. Note that this block is needed even if
+permissions for the principal `"AWS": "arn:aws:iam::111122223333:root"` are given.
+
+```
+{
+  "Sid": "Enable Pivot Role Permissions",
+  "Effect": "Allow",
+  "Principal": {
+    "AWS": "arn:aws:iam::111122223333:role/dataallPivotRole-cdk"
+   },
+  "Action": [
+    "kms:Decrypt",
+    "kms:Encrypt",
+    "kms:GenerateDataKey*",
+    "kms:PutKeyPolicy",
+    "kms:GetKeyPolicy",
+    "kms:ReEncrypt*",
+    "kms:TagResource",
+    "kms:UntagResource"
+   ],
+  "Resource": "*"
+}
+
+```
+
+
+
+
+
+
 | Field                  | Description                                                                                     | Required | Editable |Example
 |------------------------|-------------------------------------------------------------------------------------------------|----------|----------|-------------
 | Amazon S3 bucket name  | Name of the S3 bucket you want to import                                                        | Yes      | No    |DOC-EXAMPLE-BUCKET