
PXP-10361 Allow specifying the upload bucket #1051

Merged
merged 12 commits on Oct 5, 2022

Conversation

@paulineribeyre (Contributor) commented Oct 3, 2022

Fixes and closes #1048

New Features

  • Allow specifying the bucket to upload to (endpoints /data/upload and /data/upload/<GUID>)

Deployment changes

  • Configure the ALLOWED_DATA_UPLOAD_BUCKETS setting to allow users to upload to buckets other than DATA_UPLOAD_BUCKET
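
A deployment could enable this with something like the following fence-config fragment (a sketch based on this PR's description; only `DATA_UPLOAD_BUCKET` and `ALLOWED_DATA_UPLOAD_BUCKETS` are named in the discussion, and the bucket names are illustrative):

```yaml
# Default bucket used when the request does not specify one
DATA_UPLOAD_BUCKET: 'bucket1'
# Buckets users may explicitly target via the `bucket` parameter;
# each must also be configured under S3_BUCKETS
ALLOWED_DATA_UPLOAD_BUCKETS: ['bucket1', 'bucket2']
```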

Improvements

  • Return a 500 error instead of a 401 error when an S3 bucket is not configured properly

@coveralls commented Oct 3, 2022

Pull Request Test Coverage Report for Build 13011

  • 29 of 32 (90.63%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 73.923%

Changes missing coverage            Covered Lines  Changed/Added Lines  %
fence/__init__.py                   7              8                    87.5%
fence/blueprints/data/indexd.py     7              9                    77.78%

Files with coverage reduction       New Missed Lines  %
fence/blueprints/data/indexd.py     2                 93.36%

Totals:
  Change from base Build 12993: +0.02%
  Covered Lines: 6883
  Relevant Lines: 9311

💛 - Coveralls

@PlanXCyborg added the test-apis-dbgapTest and test-google-googleDataAccessTest labels ("setting label to retry specific feature") on Oct 4, 2022
@paulineribeyre removed the test-google-googleDataAccessTest and test-apis-dbgapTest labels on Oct 4, 2022
```diff
@@ -175,11 +176,15 @@ def upload_data_file():
         )

     protocol = params["protocol"] if "protocol" in params else None
     bucket = params.get("bucket")
```
Contributor:
I don't think file_upload is enough of an authorization check here. We don't want to allow upload to any bucket, especially existing buckets we may not own (but might have more permissive access like write). Just because all devs have file upload doesn't mean that someone should be able to create a signed url to upload to a bucket we don't actually own and are only supposed to be reading data from. Sure, the IAM on the bucket should disallow that, but we can't count on the security outside our control.

I would suggest we consider having a separate configuration for data upload buckets ALLOWED_DATA_UPLOAD_BUCKETS or something, that we check against first.

We may want to also consider only allowing this new feature when providing authz, as that will be required for passports going forward (so we'll want to deprecate acl even more than we are today).

Contributor:

and this should error if the provided bucket isn't in the configured list of allowed buckets. This also protects us from malicious user input without the need for actual sanitization
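
The allow-list check suggested here could look roughly like the following (a sketch only: the function name, the exception classes, and the exact config access are assumptions based on this discussion, not Fence's actual code):

```python
class Forbidden(Exception):
    """Maps to HTTP 403: the request is understood but deliberately denied."""

class ServerError(Exception):
    """Maps to HTTP 500: the server side is misconfigured."""

def validate_upload_bucket(config, bucket):
    """
    Sketch of the allow-list check suggested in this review.
    Rejecting any user-supplied bucket that is not explicitly listed
    also protects against malicious input without extra sanitization.
    """
    allowed = config.get("ALLOWED_DATA_UPLOAD_BUCKETS", [])
    if bucket not in allowed:
        # intentional denial: the bucket is not approved for upload
        raise Forbidden(f"uploading to bucket '{bucket}' is not allowed")
    if bucket not in config.get("S3_BUCKETS", {}):
        # allowed but missing credentials: a server-side misconfiguration
        raise ServerError(f"bucket '{bucket}' is not configured")
```

Checking membership in a fixed allow-list first means no user-controlled string ever reaches the S3 client unless an operator explicitly configured it.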

Contributor Author:

That sounds fair

Contributor Author (@paulineribeyre, Oct 4, 2022):

> We may want to also consider only allowing this new feature when providing authz, as that will be required for passports going forward (so we'll want to deprecate acl even more than we are today).

I'm not sure that's a good idea, because the "data upload" flow does not use authz and is still supported. Plus, some people in the community have written their own code to mimic that flow (for example, when they're not using S3 buckets), and I think we should allow them to use this feature.

wdyt?

Contributor:

That's fair. We'll still have the restriction of requiring authz for passports, but I think that's fine; people will just have to adjust their upload flows if they want to use that.

```python
if bucket:
    s3_buckets = get_value(
        flask.current_app.config,
        "S3_BUCKETS",
```
Contributor:

similar to the above, I think we should have another layer of security by explicitly requiring the identification of buckets that are allowed for upload

@paulineribeyre paulineribeyre changed the title Allow specifying the upload bucket PXP-10361 Allow specifying the upload bucket Oct 4, 2022
@paulineribeyre paulineribeyre force-pushed the copy-feat/alternate-data_upload_bucket branch from 636f570 to cd36995 Compare October 4, 2022 16:10
```diff
@@ -901,7 +912,8 @@ def get_credential_to_access_bucket(

     bucket_cred = s3_buckets.get(bucket_name)
     if bucket_cred is None:
-        raise Unauthorized("permission denied for bucket")
+        logger.debug(f"Bucket '{bucket_name}' not found in S3_BUCKETS config")
```
Contributor:

we'll never get here for data upload buckets, right? Because now we're erroring on startup. I think we should try to keep the old behavior and defer the error. Because a lot of commons don't actually use the data upload feature but might have a bucket configured (and even though Fence defaults it to '' I think cloud automation might automatically add one)

Contributor Author:

I'll change this to a warning log

Contributor Author:

Although I think it might be better to check the config at startup in general and then assume the config is correct, but that's a discussion for later :-)
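
The deferred check being discussed could be sketched as a startup validation that only warns instead of failing, so commons that never use the data-upload feature still boot (function name, logger, and config keys are illustrative, not Fence's actual code):

```python
import logging

logger = logging.getLogger("fence.config_check")

def check_upload_bucket_config(config):
    """
    Sketch of a startup-time check that only warns: deployments with a
    stale or empty DATA_UPLOAD_BUCKET can still start, and the hard error
    is deferred until someone actually requests an upload URL.
    Returns the buckets that are allowed for upload but not configured.
    """
    s3_buckets = config.get("S3_BUCKETS", {})
    allowed = config.get("ALLOWED_DATA_UPLOAD_BUCKETS", [])
    missing = [b for b in allowed if b not in s3_buckets]
    for b in missing:
        # warn rather than raise, preserving the old lazy-failure behavior
        logger.warning("Bucket '%s' not found in S3_BUCKETS config", b)
    return missing
```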

```python
    assert bucket_in_url in response.json["url"]
else:
    # "permission denied for bucket"
    assert response.status_code == 500, response
```
Contributor:

if the bucket is not in the allowed-buckets configuration, I think this should really be a 403. The 500 for the default bucket makes sense (because it should be configured to work), but for buckets that are in S3_BUCKETS but not in the allowed list, that's not a server error; it's an intentional configuration to deny access, i.e. a 403
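
The status-code split the reviewer proposes could be expressed as follows (a sketch; the function name and config keys are assumptions based on this thread):

```python
def upload_error_status(bucket, config):
    """
    Sketch of the reviewer's proposed split:
      - bucket configured but deliberately excluded from the allow-list
        -> 403, an intentional denial of access (a client error)
      - bucket expected to work but missing from S3_BUCKETS
        -> 500, a server-side misconfiguration
    """
    allowed = config.get("ALLOWED_DATA_UPLOAD_BUCKETS", [])
    configured = config.get("S3_BUCKETS", {})
    if bucket in configured and bucket not in allowed:
        return 403  # intentional denial, not a server fault
    return 500      # bucket should have worked but is not configured
```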

5 participants