Skip to content

Commit

Permalink
Fix external forks for CDK nag (#767)
Browse files Browse the repository at this point in the history
### Feature or Bugfix
- Bugfix

### Detail
See issue #766 
Alternatives considered:
1. use [ok-to-test](https://github.com/imjohnbo/ok-to-test) --> requires
authentication, preferred is GitHub App (not possible for data.all),
other options: personal access token or OAuth app token
2. run cdk-nag action only on minor release branches (v2m2m0 to main)
branch
3. run cdk-nag on `pull_request_target` after the PR is merged
4. run cdk-nag on schedule
5. Use other than OIDC but I think the issue could still be there as it
has to due with permissions on the repo
6. Avoid the need for credentials

This last one is the cleanest and safest. We need to mock the context of
the cdk app either:
- passing context as part of the CLI command `cdk synth --context
key=value` --> not possible as we need to pass more complex params
- creating a json object in the CLI --> cumbersome
- CHOSEN: pass the context directly in the declaration of the `App`

In addition other changes had to be made:
- remove need for SSM calls in app.py if GithubActions are running
- try/except in getting the S3 prefixes

⚠️ STILL NEEDS TESTING

### Relates
#766 

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
  • Loading branch information
dlpzx authored Sep 19, 2023
1 parent 9885da2 commit 9348535
Show file tree
Hide file tree
Showing 4 changed files with 46 additions and 41 deletions.
14 changes: 5 additions & 9 deletions .github/workflows/cdk-nag.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@ on:
- main

permissions:
id-token: write
contents: read

jobs:
Expand All @@ -23,17 +22,13 @@ jobs:
matrix:
python-version: [3.8]
env:
AWS_DEFAULT_REGION: us-east-1
CDK_DEFAULT_REGION: eu-west-1
CDK_DEFAULT_ACCOUNT: 111111111111
GITHUB_ACTIONS: true
runs-on: ubuntu-latest
steps:
- name: Git clone
uses: actions/checkout@v3
- name: configure aws credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::109658928441:role/aws-dataall-github-actions
role-session-name: dataall-gh-act
aws-region: ${{ env.AWS_DEFAULT_REGION }}
- name: Set up Node.js
uses: actions/setup-node@v3
- name: Install CDK
Expand All @@ -49,4 +44,5 @@ jobs:
- name: Install Requirements
run: python -m pip install -r deploy/requirements.txt
- name: CDK Synth
run: npx cdk synth
run: |
npx cdk synth
1 change: 0 additions & 1 deletion .github/workflows/semgrep-schedule.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ on:
workflow_dispatch:
branches:
- main
- main-v2
schedule:
- cron: '0 1 * * 2'

Expand Down
51 changes: 29 additions & 22 deletions deploy/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import logging
import os
import subprocess
import re

import boto3
import botocore
Expand All @@ -17,11 +18,6 @@
logging.basicConfig(level=logging.INFO, format=LOGGING_FORMAT)
logger = logging.getLogger(__name__)

ssmc = boto3.client('ssm')

account_id = boto3.client('sts').get_caller_identity().get('Account') or os.getenv(
'CDK_DEFAULT_ACCOUNT'
)

if not os.environ.get("DATAALL_REPO_BRANCH", None):
# Configuration of the branch in first deployment
Expand All @@ -33,25 +29,36 @@
# Configuration of the branch in subsequent deployments
git_branch = os.environ.get("DATAALL_REPO_BRANCH")

git_branch = git_branch if git_branch != "" else "main"

git_branch = re.sub('[^a-zA-Z0-9-_]', '', git_branch)[:12] if git_branch != "" else "main"

# Configuration of the cdk.json SSM or in Repository
try:
logger.info("Trying to get cdkjson parameter from SSM")
response = ssmc.get_parameter(Name=f"/dataall/{git_branch}/cdkjson")
cdkjson = json.loads(response['Parameter']['Value']).get('context')

app = App(context=cdkjson)
logger.info("Loaded context from SSM")

except (ssmc.exceptions.ParameterNotFound, botocore.exceptions.ClientError) as err:
if isinstance(err, ssmc.exceptions.ParameterNotFound):
logger.warning("SSM parameter not found - Proceeding with cdk.json and cdk.context.json in code")
else:
logger.error(err)

app = App()
logger.info("Loaded context from cdk.json file in repository")
if os.getenv("GITHUB_ACTIONS"):
logger.info("Running GitHub Actions")
account_id = os.getenv('CDK_DEFAULT_ACCOUNT')
app = App(context={
"availability-zones:account=111111111111:region=eu-west-1": ["eu-west-1a","eu-west-1b","eu-west-1c"],
"availability-zones:account=111111111111:region=us-east-1": ["us-east-1a","us-east-1b","us-east-1c"]
})
else:
account_id = boto3.client('sts').get_caller_identity().get('Account') or os.getenv('CDK_DEFAULT_ACCOUNT')
try:
logger.info("Trying to get cdkjson parameter from SSM")
ssmc = boto3.client('ssm', os.getenv('CDK_DEFAULT_REGION'))
response = ssmc.get_parameter(Name=f"/dataall/{git_branch}/cdkjson")
cdkjson = json.loads(response['Parameter']['Value']).get('context')

app = App(context=cdkjson)
logger.info("Loaded context from SSM")

except (ssmc.exceptions.ParameterNotFound, botocore.exceptions.ClientError) as err:
if isinstance(err, ssmc.exceptions.ParameterNotFound):
logger.warning("SSM parameter not found - Proceeding with cdk.json and cdk.context.json in code")
else:
logger.error(err)

app = App()
logger.info("Loaded context from cdk.json file in repository")

cdk_pipeline_region = app.node.try_get_context('tooling_region') or os.getenv('CDK_DEFAULT_REGION')

Expand Down
21 changes: 12 additions & 9 deletions deploy/stacks/backend_stack.py
Original file line number Diff line number Diff line change
Expand Up @@ -362,13 +362,16 @@ def create_opensearch_serverless_stack(self):
)

def get_s3_prefix_list(self):
ec2_client = boto3.client("ec2", region_name=self.region)
response = ec2_client.describe_prefix_lists(
Filters=[
{
'Name': 'prefix-list-name',
'Values': [f'com.amazonaws.{self.region}.s3']
},
]
)
try:
ec2_client = boto3.client("ec2", region_name=self.region)
response = ec2_client.describe_prefix_lists(
Filters=[
{
'Name': 'prefix-list-name',
'Values': [f'com.amazonaws.{self.region}.s3']
},
]
)
except:
return ""
return response['PrefixLists'][0].get("PrefixListId")

0 comments on commit 9348535

Please sign in to comment.