PXP-10361 Allow specifying the upload bucket #1051
@@ -73,6 +73,7 @@ def get_signed_url_for_file(
     requested_protocol=None,
     ga4gh_passports=None,
     db_session=None,
+    bucket=None,
 ):
     requested_protocol = requested_protocol or flask.request.args.get("protocol", None)
     r_pays_project = flask.request.args.get("userProject", None)
@@ -144,6 +145,7 @@ def get_signed_url_for_file(
         r_pays_project=r_pays_project,
         file_name=file_name,
         users_from_passports=users_from_passports,
+        bucket=bucket,
     )

     # a single user from the list was authorized so update the audit log to reflect that
@@ -258,7 +260,7 @@ def index_document(self):
         )
         return document

-    def make_signed_url(self, file_name, protocol=None, expires_in=None):
+    def make_signed_url(self, file_name, protocol=None, expires_in=None, bucket=None):
         """
         Works for upload only; S3 or Azure Blob Storage only
         (only supported case for data upload flow currently).
@@ -286,12 +288,15 @@ def make_signed_url(self, file_name, protocol=None, expires_in=None):
                 "upload", expires_in
             )
         else:
-            try:
-                bucket = flask.current_app.config["DATA_UPLOAD_BUCKET"]
-            except KeyError:
-                raise InternalError(
-                    "fence not configured with data upload bucket; can't create signed URL"
-                )
+            if not bucket:
+                try:
+                    bucket = flask.current_app.config["DATA_UPLOAD_BUCKET"]
+                except KeyError:
+                    raise InternalError(
+                        "fence not configured with data upload bucket; can't create signed URL"
+                    )

             self.logger.debug("Attemping to upload to bucket '{}'".format(bucket))
             s3_url = "s3://{}/{}/{}".format(bucket, self.guid, file_name)
             url = S3IndexedFileLocation(s3_url).get_signed_url("upload", expires_in)
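The bucket-resolution change in make_signed_url boils down to a simple precedence rule: an explicitly supplied bucket wins, otherwise the configured DATA_UPLOAD_BUCKET is used. A minimal standalone sketch (a plain dict stands in for the Flask config, and RuntimeError stands in for Fence's InternalError):

```python
def resolve_upload_bucket(config, bucket=None):
    # An explicitly supplied bucket takes precedence; otherwise fall
    # back to the DATA_UPLOAD_BUCKET config value, as in the diff.
    if bucket:
        return bucket
    try:
        return config["DATA_UPLOAD_BUCKET"]
    except KeyError:
        # RuntimeError stands in for Fence's InternalError here
        raise RuntimeError(
            "fence not configured with data upload bucket; can't create signed URL"
        )
```

This keeps the pre-existing behavior (and error message) for callers that do not pass a bucket, which matters because the config lookup is only attempted when no bucket was given.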
@@ -450,6 +455,7 @@ def get_signed_url(
         r_pays_project=None,
         file_name=None,
         users_from_passports=None,
+        bucket=None,
     ):
         users_from_passports = users_from_passports or {}
         authorized_user = None
@@ -497,6 +503,7 @@ def get_signed_url(
                 r_pays_project,
                 file_name,
                 authorized_user,
+                bucket,
             ),
             authorized_user,
         )
@@ -510,14 +517,18 @@ def _get_signed_url(
         r_pays_project,
         file_name,
         authorized_user=None,
+        bucket=None,
     ):
         if action == "upload":
             # NOTE: self.index_document ensures the GUID exists in indexd and raises
             # an error if not (which is expected to be caught upstream in the
             # app)
             blank_record = BlankIndex(uploader="", guid=self.index_document.get("did"))
             return blank_record.make_signed_url(
-                protocol=protocol, file_name=file_name, expires_in=expires_in
+                protocol=protocol,
+                file_name=file_name,
+                expires_in=expires_in,
+                bucket=bucket,
             )

         if not protocol:
@@ -876,7 +887,7 @@ def bucket_name(self):
         s3_buckets = get_value(
             flask.current_app.config,
             "S3_BUCKETS",
-            InternalError("buckets not configured"),
+            InternalError("S3_BUCKETS not configured"),
         )
         for bucket in s3_buckets:
             if re.match("^" + bucket + "$", self.parsed_url.netloc):
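For context on the bucket_name() hunk: the keys of the S3_BUCKETS config are treated as regular expressions anchored at both ends, so a single config entry can cover a whole family of bucket names. A minimal sketch of that matching (a plain dict stands in for the Flask config):

```python
import re

def match_bucket(s3_buckets, netloc):
    # Each S3_BUCKETS key is used as a regex anchored with ^ and $,
    # mirroring the re.match("^" + bucket + "$", ...) call in the diff.
    for bucket in s3_buckets:
        if re.match("^" + bucket + "$", netloc):
            return bucket
    return None
```

Note this returns the matching config key (possibly a pattern), not the literal bucket name from the URL.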
@@ -892,7 +903,7 @@ def get_credential_to_access_bucket(
         cls, bucket_name, aws_creds, expires_in, boto=None
     ):
         s3_buckets = get_value(
-            config, "S3_BUCKETS", InternalError("buckets not configured")
+            config, "S3_BUCKETS", InternalError("S3_BUCKETS not configured")
         )
         if len(aws_creds) == 0 and len(s3_buckets) == 0:
             raise InternalError("no bucket is configured")
@@ -901,7 +912,8 @@ def get_credential_to_access_bucket(

         bucket_cred = s3_buckets.get(bucket_name)
         if bucket_cred is None:
-            raise Unauthorized("permission denied for bucket")
+            logger.debug(f"Bucket '{bucket_name}' not found in S3_BUCKETS config")
+            raise InternalError("permission denied for bucket")

         cred_key = get_value(
             bucket_cred, "cred", InternalError("credential of that bucket is missing")

Review comment: we'll never get here for data upload buckets, right? Because now we're erroring on startup. I think we should try to keep the old behavior and defer the error, because a lot of commons don't actually use the data upload feature but might have a bucket configured (and even though Fence defaults it to '' I think cloud automation might automatically add one).

Reply: i'll change this to a warning log

Reply: although i think it might be better to check the config at startup in general, and then assume the config is correct, but that's a discussion for later :-)
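The revised credential lookup above can be sketched standalone (stdlib logging stands in for Fence's logger, and PermissionError for Fence's exception type): an unknown bucket now logs the failing name at debug level before raising, which makes a misconfigured S3_BUCKETS entry easier to diagnose.

```python
import logging

logger = logging.getLogger(__name__)

def lookup_bucket_cred(s3_buckets, bucket_name):
    # Log which name missed the S3_BUCKETS lookup before raising, so
    # operators can see the failing bucket name in the logs.
    bucket_cred = s3_buckets.get(bucket_name)
    if bucket_cred is None:
        logger.debug(f"Bucket '{bucket_name}' not found in S3_BUCKETS config")
        # PermissionError stands in for Fence's InternalError here
        raise PermissionError("permission denied for bucket")
    return bucket_cred
```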
@@ -930,7 +942,7 @@ def get_credential_to_access_bucket(

     def get_bucket_region(self):
         s3_buckets = get_value(
-            config, "S3_BUCKETS", InternalError("buckets not configured")
+            config, "S3_BUCKETS", InternalError("S3_BUCKETS not configured")
         )
         if len(s3_buckets) == 0:
             return None
@@ -957,7 +969,7 @@ def get_signed_url(
             config, "AWS_CREDENTIALS", InternalError("credentials not configured")
         )
         s3_buckets = get_value(
-            config, "S3_BUCKETS", InternalError("buckets not configured")
+            config, "S3_BUCKETS", InternalError("S3_BUCKETS not configured")
         )

         bucket_name = self.bucket_name()
Review comment: I don't think file_upload is enough of an authorization check here. We don't want to allow upload to any bucket, especially existing buckets we may not own (but might have more permissive access like write). Just because all devs have file upload doesn't mean that someone should be able to create a signed URL to upload to a bucket we don't actually own and are only supposed to be reading data from. Sure, the IAM on the bucket should disallow that, but we can't count on security outside our control. I would suggest we consider having a separate configuration for data upload buckets, ALLOWED_DATA_UPLOAD_BUCKETS or something, that we check against first. We may want to also consider only allowing this new feature when providing authz, as that will be required for passports going forward (so we'll want to deprecate acl even more than we are today).

Review comment: and this should error if the provided bucket isn't in the configured list of allowed buckets. This also protects us from malicious user input without the need to do actual sanitization.

Reply: That sounds fair.

Reply: I'm not sure that's a good idea, because the "data upload" flow does not use authz and is still supported. Plus some people in the community have written their own code to mimic that flow (such as when they're not using S3 buckets) and i think we should allow them to use this feature. wdyt?

Reply: that's fair, we'll still have the restriction of requiring authz for passports, but I think that's fine and people will just have to adjust upload flows if they want to use that.
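The reviewer's allow-list suggestion could look roughly like the sketch below. Note that ALLOWED_DATA_UPLOAD_BUCKETS is only a config key proposed in the discussion, not an existing Fence setting, and the function name is hypothetical:

```python
def check_upload_bucket_allowed(config, bucket):
    # Reject any user-supplied bucket that is not in the (proposed)
    # allow-list, so untrusted input never reaches URL signing.
    allowed = config.get("ALLOWED_DATA_UPLOAD_BUCKETS", [])
    if bucket not in allowed:
        raise ValueError(f"bucket '{bucket}' is not configured for data upload")
    return bucket
```

Checking membership against a fixed list, rather than sanitizing the input, sidesteps the injection concern raised in the review: unknown values are rejected outright.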