Skip to content

Commit

Permalink
Feat: pivot role limit kms (#830)
Browse files Browse the repository at this point in the history
### Feature or Bugfix
- Feature

### Detail
- read KMS keys with an alias prefixed by the environment resource
prefix
- read KMS keys imported in imported datasets
- restrict pivot role policies to the KMS keys created by data.all and
those imported in the imported datasets
- move kms client from data_sharing to base as it is used in
environments and datasets

### Relates
- #580

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

This PR restricts the IAM policies of the pivot role, following the
least privilege permissions principle

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
  • Loading branch information
dlpzx authored Oct 27, 2023
1 parent 8ad760b commit 3f100b4
Show file tree
Hide file tree
Showing 7 changed files with 191 additions and 74 deletions.
136 changes: 88 additions & 48 deletions backend/dataall/base/utils/iam_policy_utils.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from typing import List
from typing import List, Callable
import logging
from aws_cdk import aws_iam as iam

Expand Down Expand Up @@ -43,61 +43,101 @@ def split_policy_statements_in_chunks(statements: List):
return chunks


def split_policy_with_resources_in_statements(base_sid, effect, actions, resources):
def split_policy_with_resources_in_statements(base_sid: str, effect: iam.Effect, actions: List[str], resources: List[str]):
"""
Splitter used for IAM policy statements with an undefined number of resources.
- Ensures that the size of the IAM statement is below the POLICY LIMIT
- If it exceeds the POLICY LIMIT, it breaks the statement in multiple statements with a subset of resources
- Note the POLICY_HEADERS_BUFFER to account for the headers of the policy which usually take around ~60chars
The variable part of the policy is in the resources parameter of the PolicyStatement
"""
statement_without_resources = iam.PolicyStatement(
sid=base_sid,
effect=effect,
actions=actions,
resources=["*"]
)
resources_str = '" ," '.join(r for r in resources)
number_resources = len(resources)
max_length = len(max(resources, key=len))
base_length = len(str(statement_without_resources.to_json()))
total_length = base_length + len(resources_str)
logger.info(f"Policy base length = {base_length}")
logger.info(f"Number of resources = {number_resources}, resource maximum length = {max_length}")
logger.info(f"Resources as string length = {len(resources_str)}")
logger.info(f"Total length approximated as base length + resources string length = {total_length}")
def _build_statement(split, subset):
return iam.PolicyStatement(
sid=base_sid + str(split),
effect=effect,
actions=actions,
resources=subset
)

total_length, base_length = _policy_analyzer(resources, _build_statement)
extra_chars = len('" ," ')

if total_length < POLICY_LIMIT - POLICY_HEADERS_BUFFER:
logger.info("Not exceeding policy limit, returning statement ...")
resulting_statement = iam.PolicyStatement(
sid=base_sid,
resulting_statement = _build_statement(1, resources)
return [resulting_statement]
else:
logger.info("Exceeding policy limit, splitting statement ...")
resulting_statements = _policy_splitter(base_length=base_length, resources=resources, extra_chars=extra_chars, statement_builder=_build_statement)
return resulting_statements


def split_policy_with_mutiple_value_condition_in_statements(base_sid: str, effect: iam.Effect, actions: List[str], resources: List[str], condition_dict: dict):
"""
The variable part of the policy is in the conditions parameter of the PolicyStatement
conditions_dict passes the different components of the condition mapping
"""
def _build_statement(split, subset):
return iam.PolicyStatement(
sid=base_sid + str(split),
effect=effect,
actions=actions,
resources=resources
resources=resources,
conditions={
condition_dict.get('key'): {
condition_dict.get('resource'): subset
}
}
)

total_length, base_length = _policy_analyzer(condition_dict.get('values'), _build_statement)
extra_chars = len(str(f'"Condition": {{ "{condition_dict.get("key")}": {{"{condition_dict.get("resource")}": }} }}'))

if total_length < POLICY_LIMIT - POLICY_HEADERS_BUFFER:
logger.info("Not exceeding policy limit, returning statement ...")
resulting_statement = _build_statement(1, condition_dict.get("values"))
return [resulting_statement]
else:
logger.info("Exceeding policy limit, splitting statement ...")
index = 0
split = 0
resulting_statements = []
while index < len(resources):
# Iterating until all resources are defined in a policy statement.
# "index" represents the position of the resource in the resources list
size = 0
res = []
while index < len(resources) and (size + len(resources[index]) + 5) < POLICY_LIMIT - POLICY_HEADERS_BUFFER - base_length:
# Appending a resource to the "res" list until we reach the maximum size for the resources section
# It compares: current size of resources versus the allowed size of the resource section in a statement
res.append(resources[index])
size += (len(resources[index]) + 5) # +5 for the 4 extra characters (", ") around each resource, plus additional ones []
index += 1
resulting_statement = iam.PolicyStatement(
sid=base_sid + str(split),
effect=effect,
actions=actions,
resources=res
)
split += 1
resulting_statements.append(resulting_statement)
logger.info(f"Statement divided into {split+1} smaller statements")
logger.info("Exceeding policy limit, splitting values ...")
resulting_statements = _policy_splitter(base_length=base_length, resources=condition_dict.get("values"), extra_chars=extra_chars, statement_builder=_build_statement)

return resulting_statements


def _policy_analyzer(resources: List[str], statement_builder: Callable[[int, List[str]], iam.PolicyStatement]):
"""
Calculates the policy size with the resources (total_length) and without resources (base_length)
"""
statement_without_resources = statement_builder(1, ["*"])
resources_str = '" ," '.join(r for r in resources)
base_length = len(str(statement_without_resources.to_json()))
total_length = base_length + len(resources_str)
logger.info(f"Policy base length = {base_length}")
logger.info(f"Resources as string length = {len(resources_str)}")
logger.info(f"Total length approximated as base length + resources string length = {total_length}")

return total_length, base_length


def _policy_splitter(base_length: int, resources: List[str], extra_chars: int, statement_builder: Callable[[int, List[str]], iam.PolicyStatement]):
"""
Splitter used for IAM policy statements with an undefined number of resources one of the parameters of the policy.
- Ensures that the size of the IAM statement is below the POLICY LIMIT
- If it exceeds the POLICY LIMIT, it breaks the statement in multiple statements with a subset of resources
- Note the POLICY_HEADERS_BUFFER to account for the headers of the policy which usually take around ~60chars
"""
index = 0
split = 0
resulting_statements = []
while index < len(resources):
# Iterating until all values are defined in a policy statement.
# "index" represents the position of the value in the values list
size = 0
subset = []
while index < len(resources) and (size + len(resources[index]) + extra_chars) < POLICY_LIMIT - POLICY_HEADERS_BUFFER - base_length:
# Appending a resource to the subset list until we reach the maximum size for the condition section
# It compares: current size of subset versus the allowed size of the condition section in a statement
subset.append(resources[index])
size += (len(resources[index]) + extra_chars)
index += 1
resulting_statement = statement_builder(split=split, subset=subset)
split += 1
resulting_statements.append(resulting_statement)
logger.info(f"Statement divided into {split+1} smaller statements")
return resulting_statements
Original file line number Diff line number Diff line change
Expand Up @@ -6,32 +6,40 @@ class KMSPivotRole(PivotRoleStatementSet):
"""
Class including all permissions needed by the pivot role to work with AWS KMS.
It allows pivot role to:
list and Describe KMS keys
manage data.all alias KMS keys
- ....
"""
def get_statements(self):
statements = [
iam.PolicyStatement(
sid='KMS',
sid='KMSList',
effect=iam.Effect.ALLOW,
actions=[
'kms:List*',
'kms:DescribeKey',
],
resources=['*'],
),
iam.PolicyStatement(
sid='KMSDataAllAlias',
effect=iam.Effect.ALLOW,
actions=[
'kms:Decrypt',
'kms:Encrypt',
'kms:GenerateDataKey*',
'kms:GetKeyPolicy',
'kms:PutKeyPolicy',
'kms:ReEncrypt*',
'kms:TagResource',
'kms:UntagResource',
],
resources=['*'],
),
iam.PolicyStatement(
sid='KMSList',
effect=iam.Effect.ALLOW,
actions=[
'kms:List*',
'kms:DescribeKey',
],
resources=['*'],
),
resources=[f"arn:aws:kms:{self.region}:{self.account}:key/*"],
conditions={
'ForAnyValue:StringLike': {
'kms:ResourceAliases': [f"alias/{self.env_resource_prefix}*"]
}
},
)
]
return statements
Original file line number Diff line number Diff line change
Expand Up @@ -281,7 +281,7 @@ def update_dataset_bucket_key_policy(self):
'Updating dataset Bucket KMS key policy...'
)
key_alias = f"alias/{self.dataset.KmsAlias}"
kms_client = KmsClient(self.source_account_id, self.source_environment.region)
kms_client = KmsClient(account_id=self.source_account_id, region=self.source_environment.region)
kms_key_id = kms_client.get_key_id(key_alias)
existing_policy = kms_client.get_key_policy(kms_key_id)
target_requester_id = SessionHelper.get_role_id(self.target_account_id, self.target_requester_IAMRoleName)
Expand Down Expand Up @@ -392,7 +392,7 @@ def delete_dataset_bucket_key_policy(
'Deleting dataset bucket KMS key policy...'
)
key_alias = f"alias/{dataset.KmsAlias}"
kms_client = KmsClient(dataset.AwsAccountId, dataset.region)
kms_client = KmsClient(account_id=dataset.AwsAccountId, region=dataset.region)
kms_key_id = kms_client.get_key_id(key_alias)
existing_policy = kms_client.get_key_policy(kms_key_id)
target_requester_id = SessionHelper.get_role_id(target_environment.AwsAccountId, share.principalIAMRoleName)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ def _set_allowed_kms_keys_statements(datasets):
dataset: Dataset
for dataset in datasets:
if dataset.imported and dataset.importedKmsKey:
key_id = KmsClient(dataset.AwsAccountId, dataset.region).get_key_id(
key_id = KmsClient(account_id=dataset.AwsAccountId, region=dataset.region).get_key_id(
key_alias=f"alias/{dataset.KmsAlias}"
)
if key_id:
Expand Down
49 changes: 41 additions & 8 deletions backend/dataall/modules/datasets/cdk/pivot_role_datasets_policy.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import os
from dataall.base import db
from dataall.base.utils.iam_policy_utils import split_policy_with_resources_in_statements
from dataall.base.utils.iam_policy_utils import split_policy_with_resources_in_statements, split_policy_with_mutiple_value_condition_in_statements
from dataall.core.environment.cdk.pivot_role_stack import PivotRoleStatementSet
from dataall.modules.datasets_base.db.dataset_repositories import DatasetRepository
from dataall.modules.datasets_base.db.dataset_models import Dataset
Expand All @@ -10,8 +10,13 @@
class DatasetsPivotRole(PivotRoleStatementSet):
"""
Class including all permissions needed by the pivot role to work with Datasets based in S3 and Glue databases
It allows pivot role to:
- ....
It allows pivot role access to:
- Athena workgroups for the environment teams
- All Glue catalog resources (governed by Lake Formation)
- Lake Formation
- Glue ETL for environment resources
- Imported Datasets' buckets
- Imported KMS keys alias
"""
def get_statements(self):
statements = [
Expand Down Expand Up @@ -126,7 +131,11 @@ def get_statements(self):
}
)
]
allowed_buckets = []
# Adding permissions for Imported Dataset S3 Buckets, created bucket permissions are added in core S3 permissions
# Adding permissions for Imported KMS keys
imported_buckets = []
imported_kms_alias = []

engine = db.get_engine(envname=os.environ.get('envname', 'local'))
with engine.scoped_session() as session:
datasets = DatasetRepository.query_environment_imported_datasets(
Expand All @@ -135,10 +144,11 @@ def get_statements(self):
if datasets:
dataset: Dataset
for dataset in datasets:
allowed_buckets.append(f'arn:aws:s3:::{dataset.S3BucketName}')
imported_buckets.append(f'arn:aws:s3:::{dataset.S3BucketName}')
if dataset.importedKmsKey:
imported_kms_alias.append(f'alias/{dataset.KmsAlias}')

if allowed_buckets:
# Imported Dataset S3 Buckets, created bucket permissions are added in core S3 permissions
if imported_buckets:
dataset_statement = split_policy_with_resources_in_statements(
base_sid='ImportedDatasetBuckets',
effect=iam.Effect.ALLOW,
Expand All @@ -152,7 +162,30 @@ def get_statements(self):
's3:PutObjectAcl',
's3:PutBucketOwnershipControls',
],
resources=allowed_buckets
resources=imported_buckets
)
statements.extend(dataset_statement)
if imported_kms_alias:
kms_statement = split_policy_with_mutiple_value_condition_in_statements(
base_sid='KMSImportedDataset',
effect=iam.Effect.ALLOW,
actions=[
'kms:Decrypt',
'kms:Encrypt',
'kms:GenerateDataKey*',
'kms:GetKeyPolicy',
'kms:PutKeyPolicy',
'kms:ReEncrypt*',
'kms:TagResource',
'kms:UntagResource',
],
resources=[f"arn:aws:kms:{self.region}:{self.account}:key/*"],
condition_dict={
"key": 'ForAnyValue:StringLike',
"resource": 'kms:ResourceAliases',
"values": imported_kms_alias
},
)
statements.extend(kms_statement)

return statements
6 changes: 3 additions & 3 deletions backend/dataall/modules/datasets/services/dataset_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from dataall.base.db import exceptions
from dataall.core.tasks.service_handlers import Worker
from dataall.base.aws.sts import SessionHelper
from dataall.modules.dataset_sharing.aws.kms_client import KmsClient
from dataall.base.context import get_context
from dataall.core.environment.env_permission_checker import has_group_permission
from dataall.core.environment.services.environment_service import EnvironmentService
Expand All @@ -17,7 +18,6 @@
from dataall.modules.catalog.db.glossary_repositories import GlossaryRepository
from dataall.modules.vote.db.vote_repositories import VoteRepository
from dataall.base.db.exceptions import AWSResourceNotFound, UnauthorizedOperation
from dataall.modules.dataset_sharing.aws.kms_client import KmsClient
from dataall.modules.dataset_sharing.db.share_object_models import ShareObject
from dataall.modules.dataset_sharing.db.share_object_repositories import ShareObjectRepository
from dataall.modules.dataset_sharing.services.share_permissions import SHARE_OBJECT_APPROVER
Expand Down Expand Up @@ -54,7 +54,7 @@ def check_dataset_account(session, environment):
def check_imported_resources(environment, data):
kms_alias = data.get('KmsKeyAlias')
if kms_alias not in [None, "Undefined", "", "SSE-S3"]:
key_exists = KmsClient(environment.AwsAccountId, environment.region).check_key_exists(
key_exists = KmsClient(account_id=environment.AwsAccountId, region=environment.region).check_key_exists(
key_alias=f"alias/{kms_alias}"
)
if not key_exists:
Expand All @@ -63,7 +63,7 @@ def check_imported_resources(environment, data):
message=f'KMS key with alias={kms_alias} cannot be found - Please check if KMS Key Alias exists in account {environment.AwsAccountId}',
)

key_id = KmsClient(environment.AwsAccountId, environment.region).get_key_id(
key_id = KmsClient(account_id=environment.AwsAccountId, region=environment.region).get_key_id(
key_alias=f"alias/{kms_alias}"
)
if not key_id:
Expand Down
36 changes: 36 additions & 0 deletions documentation/userguide/docs/datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,42 @@ the fields of a newly created dataset you have to specify the S3 bucket and opti
is left empty, data.all will create a Glue database pointing at the S3 Bucket. As for the KMS key Alias, data.all assumes that if nothing is specified
the S3 Bucket is encrypted with SSE-S3 encryption.

!!! danger "Imported KMS key and S3 Bucket policies requirements"
Data.all pivot role will handle data sharing on the imported Bucket and KMS key (if imported). Make sure that
the resource policies allow the pivot role to manage them. For the KMS key policy, explicit permissions are needed. See an example below.


### KMS key policy
In the KMS key policy we need to grant explicit permission to the pivot role. Note that this block is needed even if
permissions for the principal `"AWS": "arn:aws:iam::111122223333:root"` are given.

```
{
"Sid": "Enable Pivot Role Permissions",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::111122223333:role/dataallPivotRole-cdk"
},
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:GenerateDataKey*",
"kms:PutKeyPolicy",
"kms:GetKeyPolicy",
"kms:ReEncrypt*",
"kms:TagResource",
"kms:UntagResource"
],
"Resource": "*"
}
```






| Field | Description | Required | Editable |Example
|------------------------|-------------------------------------------------------------------------------------------------|----------|----------|-------------
| Amazon S3 bucket name | Name of the S3 bucket you want to import | Yes | No |DOC-EXAMPLE-BUCKET
Expand Down

0 comments on commit 3f100b4

Please sign in to comment.