PXP-10361 Allow specifying the upload bucket #1051
Conversation
Pull Request Test Coverage Report for Build 13011 (Coveralls)
```python
@@ -175,11 +176,15 @@ def upload_data_file():
        )

        protocol = params["protocol"] if "protocol" in params else None
        bucket = params.get("bucket")
```
I don't think `file_upload` is enough of an authorization check here. We don't want to allow upload to any bucket, especially existing buckets we may not own (but might have more permissive access to, like write). Just because all devs have file upload doesn't mean someone should be able to create a signed URL to upload to a bucket we don't actually own and are only supposed to be reading data from. Sure, the IAM on the bucket should disallow that, but we can't count on security outside our control.
I would suggest we consider having a separate configuration for data upload buckets, `ALLOWED_DATA_UPLOAD_BUCKETS` or something, that we check against first.

We may want to also consider only allowing this new feature when providing `authz`, as that will be required for passports going forward (so we'll want to deprecate `acl` even more than we are today).
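The suggested allow-list check could look roughly like this. This is a minimal sketch: the `ALLOWED_DATA_UPLOAD_BUCKETS` key is the name proposed in this thread, while the function and exception names are hypothetical, not Fence's actual code.

```python
class UploadBucketNotAllowed(Exception):
    """Raised when the requested bucket is outside the upload allow-list."""


def validate_upload_bucket(config, bucket):
    """Reject uploads to buckets missing from ALLOWED_DATA_UPLOAD_BUCKETS."""
    if bucket is None:
        return  # no explicit bucket: caller falls back to DATA_UPLOAD_BUCKET
    allowed = config.get("ALLOWED_DATA_UPLOAD_BUCKETS") or []
    if bucket not in allowed:
        raise UploadBucketNotAllowed(
            f"bucket '{bucket}' is not configured for data upload"
        )
```

Checking membership in an explicit allow-list first means malicious bucket names are rejected before any signed-URL logic runs.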
And this should error if the provided bucket isn't in the configured list of allowed buckets. This also protects us from malicious user input, without the need to do actual sanitization.
That sounds fair.
> We may want to also consider only allowing this new feature when providing authz, as that will be required for passports going forward (so we'll want to deprecate acl even more than we are today).

I'm not sure that's a good idea, because the "data upload" flow does not use `authz` and is still supported. Plus, some people in the community have written their own code to mimic that flow (such as when they're not using S3 buckets), and I think we should allow them to use this feature. Wdyt?
That's fair. We'll still have the restriction of requiring `authz` for passports, but I think that's fine; people will just have to adjust their upload flows if they want to use that.
fence/blueprints/data/blueprint.py (Outdated)
```python
if bucket:
    s3_buckets = get_value(
        flask.current_app.config,
        "S3_BUCKETS",
```
Similar to the above, I think we should have another layer of security by explicitly requiring identification of the buckets that are allowed for upload.
Force-pushed from 636f570 to cd36995 (Compare)
```python
@@ -901,7 +912,8 @@ def get_credential_to_access_bucket(

    bucket_cred = s3_buckets.get(bucket_name)
    if bucket_cred is None:
        raise Unauthorized("permission denied for bucket")
        logger.debug(f"Bucket '{bucket_name}' not found in S3_BUCKETS config")
```
We'll never get here for data upload buckets, right? Because now we're erroring on startup. I think we should try to keep the old behavior and defer the error: a lot of commons don't actually use the data upload feature but might have a bucket configured (and even though Fence defaults it to `''`, I think cloud-automation might automatically add one).
I'll change this to a warning log.
Although I think it might be better to check the config at startup in general, and then assume the config is correct; but that's a discussion for later :-)
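The deferred behavior discussed above might be sketched like this: look the bucket up at request time and warn instead of failing, so commons that never use the upload feature aren't broken by a stale entry. The function shape here is illustrative, not Fence's actual code.

```python
import logging

logger = logging.getLogger("fence")


def get_bucket_cred(s3_buckets, bucket_name):
    """Look up a bucket's credentials, warning (not raising) when missing."""
    bucket_cred = s3_buckets.get(bucket_name)
    if bucket_cred is None:
        # deferred check: log a warning rather than erroring at startup
        logger.warning("Bucket '%s' not found in S3_BUCKETS config", bucket_name)
    return bucket_cred
```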
tests/data/test_blank_index.py (Outdated)
```python
    assert bucket_in_url in response.json["url"]
else:
    # "permission denied for bucket"
    assert response.status_code == 500, response
```
If the bucket is not in the allowed-bucket configuration, I think this should really be a 403. The 500 for the default bucket makes sense (because it should be configured to work), but for buckets that are in `S3_BUCKETS` yet not in the allowed list, that's not a server error; it's an intentional configuration to deny access, i.e. a 403.
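The status-code split argued for here can be summarized in a small decision function. This is a sketch of the reviewer's reasoning only; the function name and return codes are illustrative, not Fence's actual handler.

```python
def upload_response_status(bucket, allowed_buckets, s3_buckets):
    """Map a requested upload bucket to an HTTP status code."""
    if bucket not in allowed_buckets:
        return 403  # intentionally denied by configuration, not a server fault
    if bucket not in s3_buckets:
        return 500  # allow-listed but lacking credentials: real misconfiguration
    return 201  # a signed upload URL can be generated
```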
Fixes and closes #1048

New Features
- Allow specifying the bucket to upload to in the upload endpoints (`/data/upload` and `/data/upload/<GUID>`)

Deployment changes
- New `ALLOWED_DATA_UPLOAD_BUCKETS` setting to allow users to upload to buckets other than `DATA_UPLOAD_BUCKET`

Improvements
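For deployment, the new setting would sit alongside the existing bucket configuration. This is a hypothetical excerpt of a fence config file; the exact schema of the `S3_BUCKETS` entries may differ from the shipped example config.

```yaml
# hypothetical fence-config.yaml excerpt
DATA_UPLOAD_BUCKET: 'bucket1'
# new in this PR: only buckets listed here may be targeted by /data/upload
ALLOWED_DATA_UPLOAD_BUCKETS: ['bucket1', 'bucket2']
S3_BUCKETS:
  bucket1:
    cred: 'CRED1'
  bucket2:
    cred: 'CRED1'
```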