-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ETL-384] Add production s3 ingestion bucket #26
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple of comments
# (Optional) Allow accounts, groups, and users to access bucket (default is no access). | ||
GrantAccess: | ||
- 'arn:aws:iam::325565585839:root' # Required ARN for a synapse bucket | ||
- 'arn:aws:sts::526515999252:assumed-role/AWSReservedSSO_S3ExternalCollab_40c062f682e7f3f5/meghasyam@sagebase.org' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question, does having this section in the external template just guarantee that the bucket is automatically public access? We had to do something like this for recover buckets to update to non-public access: https://github.com/Sage-Bionetworks/recover/pull/15/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great question @rxu17, could you help me ask IT this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per this discussion here it doesn't seem like the template would block public access automatically.
We can either:
- wait until April 2023 (not sure what date) when the bucket created initially will have the public access automatically blocked
- Just create the bucket now, and then modify the public bucket policy on the AWS console (and this should be permanent)
Then later we could do one of the below:
- Request an update to the external j2 template to add the configurations needed to block public access
- Update this to use our recover s3 template yaml instead, and update our yaml to do all the things we want it to do such as grant access, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxu17 Is this true: https://sagebionetworks.slack.com/archives/CEFQD0KU1/p1679437439776059?thread_ts=1679430261.246839&cid=CEFQD0KU1. It seems we don't need to specify it.
I think the quickest path forward is to modify the bucket policy on the AWS console.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my experience, it may only disables the ACL specific settings (I don't remember) under the public access portion. We still had to do this: https://github.com/Sage-Bionetworks/recover/pull/15/files even after our bucket had the ObjectOwnership: BucketOwnerEnforced set to block all of the public access points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the quickest way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve this.
Purpose: This creates the ingestion bucket for production data. (SYNAPSE_INPUT_SHARED_BUCKET)
This is the first bucket in the ingress pipeline for RECOVER project. We expect to receive data from Care Evolution in this bucket. This would be the raw version of the data, no QC, no QA - just all the data dump from Care Evolution. Both Care Evolution and Sage have access to this bucket