Squashed commit of the following
commit df87bb5a 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Wed Aug 09 2023 13:50:41 GMT-0400 (Eastern Daylight Time) 

    Merge branch 'test2' into origin/open-source


commit 554d74e 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Wed Aug 09 2023 12:42:19 GMT-0400 (Eastern Daylight Time) 

    Cosmetic Changes to Linking Env Frontend Steps


commit b91b157 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Wed Aug 09 2023 13:40:45 GMT-0400 (Eastern Daylight Time) 

    Linting


commit 9b2a85b 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Wed Aug 09 2023 11:10:12 GMT-0400 (Eastern Daylight Time) 

    Resolve S3 Permissions Nested Stack CDK Exec Role


commit e567eab 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Wed Aug 09 2023 13:37:05 GMT-0400 (Eastern Daylight Time) 

    Glue Profiling Job Fixes


commit c678e67 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Fri Aug 04 2023 13:27:53 GMT-0400 (Eastern Daylight Time) 

    Allow restricted nacls backend VPC (#626)

### Feature or Bugfix
- Feature


### Detail
- Extend the restricted NACLs parameter to allow for both the tooling
VPC and the backend VPC


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit f235c19 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Tue Aug 08 2023 11:04:05 GMT-0400 (Eastern Daylight Time) 

    Handle External ID SSM v1.6.1 (#630)

### Feature or Bugfix
- Bugfix


### Detail
- As part of v1.6, Data.All moved away from storing the external ID as a
rotated secret in Secrets Manager and instead placed the external ID in
SSM Parameter Store.
- In the current v1.6.1 implementation we check whether the secret exists
and the SSM parameter does not; if both conditions are met, the secret
value is retrieved and a new SSM parameter is set with the same
external ID.
- The problem with the above is that CDK uses dynamic references to
resolve the secret value: in the first upgrade deployment we set the SSM
parameter as a reference to the secret value and delete the secret, so
the second and subsequent deployments fail with `Secrets Manager can't
find the specified secret.`

- Alternatively, we could have retrieved the secret value during `synth`
with boto3 SDK calls to get around the dynamic-reference issue above,
but the CDK bootstrap roles (such as the lookup role) do not allow those
actions with their out-of-the-box IAM permissions.

- This PR reverts to a more straightforward approach: we create a new
SSM parameter for the external ID if one does not already exist, and
never reference the previously created external ID secret.
- NOTE: To keep the same external ID and avoid manually updating the
pivot roles that use this value, one would have to:
    - Retain the current external ID in Secrets Manager (named
`dataall-externalId-{envname}`) from version <= 1.5.x
    - Run the upgrade to v1.6.1
    - Replace the newly created SSM parameter (named
`/dataall/{envname}/pivotRole/externalId`) with the original value for
the external ID
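
The "create a new SSM parameter only if one does not already exist" flow described above can be sketched as follows. This is a minimal illustration, not data.all's actual implementation: the helper names and the in-memory `FakeSSM` stand-in are invented for demonstration, and real code would use a boto3 SSM client and botocore's `ParameterNotFound` error.

```python
class ParameterNotFound(Exception):
    """Stands in for botocore's ParameterNotFound error in this sketch."""


def ensure_external_id(ssm, envname, generate):
    """Return the external ID parameter value, creating it only when absent."""
    name = f"/dataall/{envname}/pivotRole/externalId"
    try:
        return ssm.get_parameter(Name=name)["Parameter"]["Value"]
    except ParameterNotFound:
        value = generate()
        ssm.put_parameter(Name=name, Value=value, Type="String")
        return value


class FakeSSM:
    """Tiny in-memory stand-in for a boto3 SSM client (illustration only)."""

    def __init__(self):
        self.store = {}

    def get_parameter(self, Name):
        if Name not in self.store:
            raise ParameterNotFound(Name)
        return {"Parameter": {"Value": self.store[Name]}}

    def put_parameter(self, Name, Value, Type):
        self.store[Name] = Value


ssm = FakeSSM()
first = ensure_external_id(ssm, "prod", lambda: "abc123")
# A second deployment reuses the stored value instead of generating a new one
second = ensure_external_id(ssm, "prod", lambda: "zzz999")
```

Because the parameter is read before it is ever (re)created, repeated deployments are idempotent and never depend on a deleted secret.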


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit f0a932f 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Tue Aug 08 2023 03:30:40 GMT-0400 (Eastern Daylight Time) 

    get prefix list ids for dbmigration for infra region (#624)

### Feature or Bugfix
- Bugfix

### Detail
- Get the prefix list ID for S3 from the infra region. We need the
prefix list ID to connect the dbmigration stage with the S3 bucket
containing the migration scripts (it is added to the security groups).

### Relates
- #618 
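
The lookup described above amounts to one `describe_prefix_lists` call filtered on `com.amazonaws.<region>.s3`. A minimal sketch, with an invented in-memory client and a made-up `pl-…` ID standing in for boto3 and real AWS data:

```python
def s3_prefix_list_id(ec2, region):
    """Return the managed prefix list ID for the S3 gateway endpoint in `region`."""
    response = ec2.describe_prefix_lists(
        Filters=[{"Name": "prefix-list-name", "Values": [f"com.amazonaws.{region}.s3"]}]
    )
    prefix_lists = response["PrefixLists"]
    return prefix_lists[0]["PrefixListId"] if prefix_lists else None


class FakeEC2:
    """In-memory stand-in for a boto3 EC2 client; the ID below is made up."""

    _data = {"com.amazonaws.eu-west-1.s3": "pl-6da54004"}

    def describe_prefix_lists(self, Filters):
        wanted = Filters[0]["Values"][0]
        if wanted in self._data:
            return {"PrefixLists": [{"PrefixListId": self._data[wanted], "PrefixListName": wanted}]}
        return {"PrefixLists": []}


infra_pl = s3_prefix_list_id(FakeEC2(), "eu-west-1")
```

The resolved ID can then be referenced in the dbmigration stage's security group rules.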

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 8900ebf 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Tue Aug 08 2023 03:30:06 GMT-0400 (Eastern Daylight Time) 

    resolve unnecessary dependency in git_release role (#623)

### Feature or Bugfix
- Bugfix

### Detail
- Fix a small bug in the way we define the git release role: managed
policies are now attached after role creation.
- NOTE: The fix is already included in the `modularization-main` branch

### Relates
-  #617 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
noah-paige committed Aug 9, 2023
1 parent 63e3d4f commit 53fc84f
Showing 10 changed files with 132 additions and 97 deletions.
@@ -49,6 +49,13 @@ def on_create(event):
except ClientError as e:
pass

default_db_exists = False
try:
glue_client.get_database(Name="default")
default_db_exists = True
except ClientError as e:
pass

if not exists:
try:
db_input = props.get('DatabaseInput').copy()
@@ -63,7 +70,7 @@ def on_create(event):
raise Exception(f"Could not create Glue Database {props['DatabaseInput']['Name']} in aws://{AWS_ACCOUNT}/{AWS_REGION}, received {str(e)}")

Entries = []
for i, role_arn in enumerate(props.get('DatabaseAdministrators')):
for i, role_arn in enumerate(props.get('DatabaseAdministrators', [])):
Entries.append(
{
'Id': str(uuid.uuid4()),
@@ -103,6 +110,20 @@ def on_create(event):
'PermissionsWithGrantOption': ['SELECT', 'ALTER', 'DESCRIBE'],
}
)
if default_db_exists:
Entries.append(
{
'Id': str(uuid.uuid4()),
'Principal': {'DataLakePrincipalIdentifier': role_arn},
'Resource': {
'Database': {
'Name': 'default'
}
},
'Permissions': ['Describe'.upper()],
}
)

lf_client.batch_grant_permissions(CatalogId=props['CatalogId'], Entries=Entries)
physical_id = props['DatabaseInput']['Imported'] + props['DatabaseInput']['Name']

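The hunk above appends an extra Lake Formation grant only when the `default` Glue database exists. The entry it builds can be sketched in isolation (the role ARN is a placeholder, and the helper name is invented for illustration):

```python
import uuid


def default_db_describe_entry(role_arn):
    """Build a batch_grant_permissions entry granting DESCRIBE on the default database."""
    return {
        "Id": str(uuid.uuid4()),
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": "default"}},
        "Permissions": ["DESCRIBE"],
    }


entry = default_db_describe_entry("arn:aws:iam::111122223333:role/example-admin")
```

One such entry per database administrator is added to the `Entries` list before the single `batch_grant_permissions` call.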
@@ -1,4 +1,5 @@
import json
import os
import logging
import pprint
import sys
@@ -8,7 +9,6 @@
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pydeequ.profiles import *

sc = SparkContext.getOrCreate()
sc._jsc.hadoopConfiguration().set('fs.s3.canned.acl', 'BucketOwnerFullControl')
@@ -32,6 +32,7 @@
'environmentBucket',
'dataallRegion',
'table',
"SPARK_VERSION"
]
try:
args = getResolvedOptions(sys.argv, list_args)
@@ -43,6 +44,10 @@
list_args.remove('table')
args = getResolvedOptions(sys.argv, list_args)

os.environ["SPARK_VERSION"] = args.get("SPARK_VERSION", "3.1")

from pydeequ.profiles import *

logger.info('Parsed Retrieved parameters')

logger.info('Parsed Args = %s', pprint.pformat(args))
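pydeequ reads the `SPARK_VERSION` environment variable at import time, which is why the hunk above defers `from pydeequ.profiles import *` until after the variable is set. The ordering can be sketched as follows (the `configure_spark_env` helper is invented for illustration; the `3.1` default mirrors the job argument added in the dataset stack):

```python
import os


def configure_spark_env(args, default="3.1"):
    """Set SPARK_VERSION before pydeequ is imported; pydeequ checks it at import time."""
    os.environ["SPARK_VERSION"] = args.get("SPARK_VERSION", default)
    # Only after this point is it safe to run: from pydeequ.profiles import *
    return os.environ["SPARK_VERSION"]


v_default = configure_spark_env({})
v_explicit = configure_spark_env({"SPARK_VERSION": "3.3"})
```

Importing pydeequ at module top level, before `getResolvedOptions` has run, would fail because the environment variable is not yet set.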
7 changes: 5 additions & 2 deletions backend/dataall/cdkproxy/stacks/dataset.py
@@ -295,24 +295,26 @@ def __init__(self, scope, id, target_uri: str = None, **kwargs):
]
),
iam.PolicyStatement(
sid="CreateLoggingGlueCrawler",
sid="CreateLoggingGlue",
actions=[
'logs:CreateLogGroup',
'logs:CreateLogStream',
],
effect=iam.Effect.ALLOW,
resources=[
f'arn:aws:logs:{dataset.region}:{dataset.AwsAccountId}:log-group:/aws-glue/crawlers*',
f'arn:aws:logs:{dataset.region}:{dataset.AwsAccountId}:log-group:/aws-glue/jobs/*',
],
),
iam.PolicyStatement(
sid="LoggingGlueCrawler",
sid="LoggingGlue",
actions=[
'logs:PutLogEvents',
],
effect=iam.Effect.ALLOW,
resources=[
f'arn:aws:logs:{dataset.region}:{dataset.AwsAccountId}:log-group:/aws-glue/crawlers:log-stream:{dataset.GlueCrawlerName}',
f'arn:aws:logs:{dataset.region}:{dataset.AwsAccountId}:log-group:/aws-glue/jobs/*',
],
),
iam.PolicyStatement(
@@ -484,6 +486,7 @@ def __init__(self, scope, id, target_uri: str = None, **kwargs):
'--enable-metrics': 'true',
'--enable-continuous-cloudwatch-log': 'true',
'--enable-glue-datacatalog': 'true',
'--SPARK_VERSION': '3.1',
}

job = glue.CfnJob(
37 changes: 17 additions & 20 deletions deploy/cdk_exec_policy/cdkExecPolicy.yaml
@@ -1,9 +1,6 @@
AWSTemplateFormatVersion: 2010-09-09
Description: Custom least privilege IAM policy for linking environments to dataall
Parameters:
AwsAccountId:
Description: AWS AccountId of the account that we wish to link.
Type: String
PolicyName:
Description: IAM policy name (The same name must be used during CDK bootstrapping. Default is DataAllCustomCDKPolicy.)
Type: String
@@ -48,14 +45,14 @@ Resources:
Effect: Allow
Action: 'athena:CreateWorkGroup'
Resource:
- !Sub 'arn:aws:athena:*:${AWS::AccountId}:workgroup/*'
- !Sub 'arn:${AWS::Partition}:athena:*:${AWS::AccountId}:workgroup/*'
- Sid: IAM
Action:
- 'iam:CreatePolicy'
- 'iam:GetPolicy'
Effect: Allow
Resource:
- !Sub 'arn:aws:iam::${AWS::AccountId}:policy/*'
- !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/*'
- Sid: IAMRole
Action:
- 'iam:AttachRolePolicy'
@@ -82,7 +79,7 @@ Resources:
- 'iam:CreatePolicyVersion'
- 'iam:DeletePolicyVersion'
Resource:
- !Sub 'arn:aws:iam::${AWS::AccountId}:policy/service-role/AWSQuickSight*'
- !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/service-role/AWSQuickSight*'
- Sid: QuickSight
Effect: Allow
Action:
@@ -114,14 +111,14 @@ Resources:
- 'kms:CreateAlias'
Effect: Allow
Resource:
- !Sub 'arn:aws:kms:*:${AWS::AccountId}:alias/*'
- !Sub 'arn:${AWS::Partition}:kms:*:${AWS::AccountId}:alias/*'
- Sid: KMSKey
Action:
- 's3:PutBucketAcl'
- 's3:PutBucketNotification'
Effect: Allow
Resource:
- !Sub 'arn:aws:s3:::${EnvironmentResourcePrefix}-logging-*'
- !Sub 'arn:${AWS::Partition}:s3:::${EnvironmentResourcePrefix}-logging-*'
- Sid: ReadBuckets
Action:
- 'kms:CreateAlias'
@@ -136,7 +133,7 @@ Resources:
- 'kms:PutKeyPolicy'
- 'kms:TagResource'
Effect: Allow
Resource: !Sub 'arn:aws:kms:*:${AWS::AccountId}:key/*'
Resource: !Sub 'arn:${AWS::Partition}:kms:*:${AWS::AccountId}:key/*'
- Sid: Lambda
Action:
- 'lambda:AddPermission'
@@ -154,7 +151,7 @@
Action:
- 'lambda:PublishLayerVersion'
Resource:
- !Sub 'arn:aws:lambda:*:${AWS::AccountId}:layer:*'
- !Sub 'arn:${AWS::Partition}:lambda:*:${AWS::AccountId}:layer:*'
- Sid: S3
Action:
- 's3:CreateBucket'
@@ -170,13 +167,13 @@
- 's3:DeleteBucketPolicy'
- 's3:DeleteBucket'
Effect: Allow
Resource: 'arn:aws:s3:::*'
Resource: !Sub 'arn:${AWS::Partition}:s3:::*'
- Sid: SQS
Effect: Allow
Action:
- 'sqs:CreateQueue'
- 'sqs:SetQueueAttributes'
Resource: !Sub 'arn:aws:sqs:*:${AWS::AccountId}:*'
Resource: !Sub 'arn:${AWS::Partition}:sqs:*:${AWS::AccountId}:*'
- Sid: SSM
Effect: Allow
Action:
@@ -190,18 +187,18 @@
- 'logs:CreateLogStream'
- 'logs:PutLogEvents'
- 'logs:DescribeLogStreams'
Resource: 'arn:aws:logs:*:*:*'
Resource: !Sub 'arn:${AWS::Partition}:logs:*:*:*'
- Sid: STS
Effect: Allow
Action:
- 'sts:AssumeRole'
- 'iam:*Role*'
Resource: !Sub 'arn:aws:iam::${AWS::AccountId}:role/cdk-*'
Resource: !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-*'
- Sid: CloudFormation
Effect: Allow
Action:
- 'cloudformation:*'
Resource: !Sub 'arn:aws:cloudformation:*:${AWS::AccountId}:stack/CDKToolkit/*'
Resource: !Sub 'arn:${AWS::Partition}:cloudformation:*:${AWS::AccountId}:stack/CDKToolkit/*'
- Sid: ECR
Effect: Allow
Action:
@@ -211,14 +208,14 @@
- 'ecr:DescribeRepositories'
- 'ecr:CreateRepository'
- 'ecr:DeleteRepository'
Resource: !Sub 'arn:aws:ecr:*:${AWS::AccountId}:repository/cdk-*'
Resource: !Sub 'arn:${AWS::Partition}:ecr:*:${AWS::AccountId}:repository/cdk-*'
- Sid: SSMTwo
Effect: Allow
Action:
- 'ssm:GetParameter'
- 'ssm:PutParameter'
- 'ssm:DeleteParameter'
Resource: !Sub 'arn:aws:ssm:*:${AWS::AccountId}:parameter/cdk-bootstrap/*'
Resource: !Sub 'arn:${AWS::Partition}:ssm:*:${AWS::AccountId}:parameter/cdk-bootstrap/*'
- Sid: CloudformationTwo
Effect: Allow
Action:
@@ -232,7 +229,7 @@
Action:
- 's3:*'
Resource:
- !Sub 'arn:aws:s3:::cdktoolkit-stagingbucket-*'
- !Sub 'arn:${AWS::Partition}:s3:::cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}*'
- Sid: Pipelines
Effect: Allow
Action:
@@ -261,15 +258,15 @@ Resources:
- 's3:ListBucket'
- 's3:GetBucketPolicy'
Resource:
- 'arn:aws:s3::*:codepipeline-*'
- !Sub 'arn:${AWS::Partition}:s3::*:codepipeline-*'
- Sid: CodeStarNotificationsReadOnly
Effect: Allow
Action:
- 'codestar-notifications:DescribeNotificationRule'
Resource: '*'
Condition:
'StringLike':
'codestar-notifications:NotificationsForResource': 'arn:aws:codepipeline:*'
'codestar-notifications:NotificationsForResource': !Sub 'arn:${AWS::Partition}:codepipeline:*'
- Sid: Eventrules
Effect: Allow
Action:
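The recurring change in this template swaps hardcoded `arn:aws:` prefixes for the `AWS::Partition` pseudo parameter, so the same policy deploys unchanged in the standard, GovCloud, and China partitions. A hypothetical statement showing the pattern (the Sid and resource name are invented):

```yaml
# Illustrative only: a partition-agnostic ARN via the AWS::Partition pseudo parameter
- Sid: ExampleBucketAccess
  Effect: Allow
  Action: 's3:ListBucket'
  Resource: !Sub 'arn:${AWS::Partition}:s3:::${EnvironmentResourcePrefix}-example-*'
```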
9 changes: 4 additions & 5 deletions deploy/stacks/backend_stack.py
@@ -30,7 +30,6 @@ def __init__(
id,
envname: str = 'dev',
resource_prefix='dataall',
tooling_region=None,
tooling_account_id=None,
ecr_repository=None,
image_tag=None,
@@ -72,7 +71,7 @@ def __init__(
vpc = self.vpc_stack.vpc
vpc_endpoints_sg = self.vpc_stack.vpce_security_group
vpce_connection = ec2.Connections(security_groups=[vpc_endpoints_sg])
self.s3_prefix_list = self.get_s3_prefix_list(tooling_region)
self.s3_prefix_list = self.get_s3_prefix_list()

self.pivot_role_name = f"dataallPivotRole{'-cdk' if enable_pivot_role_auto_create else ''}"

@@ -362,13 +361,13 @@ def create_opensearch_serverless_stack(self):
collection_name=aoss_stack.collection_name,
)

def get_s3_prefix_list(self, tooling_region):
ec2_client = boto3.client("ec2", region_name=tooling_region)
def get_s3_prefix_list(self):
ec2_client = boto3.client("ec2", region_name=self.region)
response = ec2_client.describe_prefix_lists(
Filters=[
{
'Name': 'prefix-list-name',
'Values': [f'com.amazonaws.{tooling_region}.s3']
'Values': [f'com.amazonaws.{self.region}.s3']
},
]
)
2 changes: 0 additions & 2 deletions deploy/stacks/backend_stage.py
@@ -14,7 +14,6 @@ def __init__(
resource_prefix='dataall',
ecr_repository=None,
commit_id=None,
tooling_region=None,
tooling_account_id=None,
pipeline_bucket=None,
vpc_id=None,
@@ -42,7 +41,6 @@ def __init__(
f'backend-stack',
envname=envname,
resource_prefix=resource_prefix,
tooling_region=tooling_region,
tooling_account_id=tooling_account_id,
ecr_repository=ecr_repository,
pipeline_bucket=pipeline_bucket,
32 changes: 12 additions & 20 deletions deploy/stacks/param_store_stack.py
@@ -115,9 +115,9 @@ def __init__(
)

def _get_external_id_value(envname, account_id, region):
"""For first deployments it returns False,
for existing deployments it returns the ssm parameter value generated in the first deployment
for prior to V1.5.1 upgrades it returns the secret from secrets manager
"""
For first deployments and upgrades from <=v1.5.6 to >=v1.6 - returns False and a new SSM parameter is created,
For existing >=v1.6 deployments - returns the ssm parameter value generated in the first deployment
"""
cdk_look_up_role = 'arn:aws:iam::{}:role/cdk-hnb659fds-lookup-role-{}-{}'.format(account_id, account_id, region)
base_session = boto3.Session()
@@ -130,29 +130,21 @@ def _get_external_id_value(envname, account_id, region):
region_name=region,
endpoint_url=f"https://sts.{region}.amazonaws.com"
)
response = sts.assume_role(**assume_role_dict)
session = boto3.Session(
aws_access_key_id=response['Credentials']['AccessKeyId'],
aws_secret_access_key=response['Credentials']['SecretAccessKey'],
aws_session_token=response['Credentials']['SessionToken'],
)

secret_id = f"dataall-externalId-{envname}"
parameter_path = f"/dataall/{envname}/pivotRole/externalId"

try:
response = sts.assume_role(**assume_role_dict)
session = boto3.Session(
aws_access_key_id=response['Credentials']['AccessKeyId'],
aws_secret_access_key=response['Credentials']['SecretAccessKey'],
aws_session_token=response['Credentials']['SessionToken'],
)
ssm_client = session.client('ssm', region_name=region)
parameter_value = ssm_client.get_parameter(Name=parameter_path)['Parameter']['Value']
return parameter_value
except:
try:
secrets_client = session.client('secretsmanager', region_name=region)
if secrets_client.describe_secret(SecretId=secret_id):
secret_value = SecretValue.secrets_manager(secret_id).unsafe_unwrap()
else:
raise Exception
return secret_value
except:
return False
return False


def _generate_external_id():
allowed_chars = string.ascii_uppercase + string.ascii_lowercase + string.digits
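The diff truncates `_generate_external_id` after the `allowed_chars` line. A hedged reconstruction of what such a generator typically looks like — the 32-character length and the use of the `secrets` module are assumptions for illustration, not the confirmed original body:

```python
import secrets
import string


def generate_external_id(length=32):
    """Generate a random alphanumeric external ID (length is an assumed default)."""
    allowed_chars = string.ascii_uppercase + string.ascii_lowercase + string.digits
    return "".join(secrets.choice(allowed_chars) for _ in range(length))


external_id = generate_external_id()
```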
2 changes: 0 additions & 2 deletions deploy/stacks/pipeline.py
@@ -331,7 +331,6 @@ def set_codebuild_iam_roles(self):
iam.ServicePrincipal('codebuild.amazonaws.com'),
iam.AccountPrincipal(self.account),
),
managed_policies=[self.baseline_codebuild_policy, self.git_release_policy, self.expanded_codebuild_policy]
)
self.expanded_codebuild_policy.attach_to_role(self.git_project_role)
self.baseline_codebuild_policy.attach_to_role(self.git_project_role)
@@ -597,7 +596,6 @@ def set_backend_stage(self, target_env, repository_name):
},
envname=target_env['envname'],
resource_prefix=self.resource_prefix,
tooling_region=self.region,
tooling_account_id=self.account,
pipeline_bucket=self.pipeline_bucket_name,
ecr_repository=f'arn:aws:ecr:{target_env.get("region", self.region)}:{self.account}:repository/{repository_name}',