s3 connector (for data detection & discovery) POC - fides #4930
Conversation
Passing run #8335
Details: Review all test suite changes for PR #4930
Force-pushed from 4c2154f to 4db3d15
… some shared aws utils
some self-review
Changed the name of storage_authenticator.py to make this more generic; also updated some functionality (see below).
src/fides/api/util/aws_util.py (outdated)

    raise ValueError(f"Auth method not supported for S3: {auth_method}")

    if assume_role_arn:
        if sts_client is None:
I wasn't sure if it was safe to call the

    sts_client = session.client("sts")
    sts_client.get_caller_identity()

block for all use cases with the "automatic" auth method, since we hadn't been doing that before. So rather than risk a regression, I'm only creating the STS client when secret keys are provided (as had been done before), or when it's explicitly needed to assume a role. The code is a bit harder to follow and less DRY, but I'd rather be safe for now.
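The branching described here can be sketched as a small decision helper. This is an illustrative sketch only, not the actual fides code; the function name and the auth-method strings are assumptions.

```python
from typing import Optional

def needs_sts_client(auth_method: str, assume_role_arn: Optional[str]) -> bool:
    """Hypothetical sketch of the branching above: create an STS client only
    when secret keys are provided (the pre-existing behavior) or when a role
    explicitly needs to be assumed."""
    if auth_method == "secret_keys":
        # existing behavior: credentials are validated via get_caller_identity()
        return True
    if assume_role_arn:
        # sts.assume_role() is unavoidable here
        return True
    # "automatic" auth with no role to assume: skip STS to avoid a regression
    return False
```

For example, `needs_sts_client("automatic", None)` is `False`, while providing any role ARN forces STS client creation.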
from fides.api.schemas.storage.storage import AWSAuthMethod, StorageSecrets


def get_aws_session(
This is a more appropriately and generically named function now. It can and should be used by some of our other AWS connectors/codepaths, since this seems like a pretty solid authentication paradigm to promote across AWS resource types.
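As a sketch of what such a shared helper might look like (assumptions: the signature, the auth-method strings, and the session name are illustrative, not the real fides API; boto3 is imported lazily inside the function so the validation logic stands on its own):

```python
from typing import Optional

def get_aws_session(
    auth_method: str,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    assume_role_arn: Optional[str] = None,
):
    """Illustrative sketch: build a boto3 Session from either explicit secret
    keys or boto3's automatic credential chain, optionally assuming a role."""
    if auth_method == "secret_keys":
        if not (aws_access_key_id and aws_secret_access_key):
            raise ValueError("Secret-key auth requires both a key ID and a secret key")
    elif auth_method != "automatic":
        raise ValueError(f"Auth method not supported: {auth_method}")

    import boto3  # imported lazily so the validation above needs no AWS deps

    # With "automatic" auth both args are None and boto3 falls back to its
    # standard credential chain (env vars, profiles, instance metadata, ...).
    session = boto3.session.Session(
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    )
    if assume_role_arn:
        # Trade the base credentials for temporary ones scoped to the role.
        creds = session.client("sts").assume_role(
            RoleArn=assume_role_arn, RoleSessionName="fides-session"
        )["Credentials"]
        session = boto3.session.Session(
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )
    return session
```

Note that the assume-role branch runs regardless of which base credentials were used, which is exactly the "automatic creds can also assume a role" question discussed below.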
@@ -25,6 +25,8 @@ class DynamoDBSchema(ConnectionConfigSecretsSchema):
        sensitive=True,
    )

    # TODO: include an aws_assume_role_arn and more closely follow the pattern in `connection_secrets_s3`
I don't want to make our changes too wide-ranging quite yet, so I've just put in a note that we should look to switch DynamoDB over to this authentication paradigm soon. DynamoDB does require a region (I think?), so I'm not sure how that changes things, but I still think that providing an option to assume a role will be the right thing to support here moving forward 👍
Codecov Report
Attention: Patch coverage is

    @@            Coverage Diff             @@
    ##             main    #4930      +/-   ##
    ==========================================
    - Coverage   86.27%   86.19%    -0.08%
    ==========================================
      Files         349      351        +2
      Lines       21628    21693       +65
      Branches     2869     2872        +3
    ==========================================
    + Hits        18659    18699       +40
    - Misses       2470     2495       +25
      Partials      499      499

☔ View full report in Codecov by Sentry.
    aws_access_key_id: Optional[str] = Field(
        title="Access Key ID",
        description="Part of the credentials that provide access to your AWS account. This is required if using secret key authentication.",
    )
    aws_secret_access_key: Optional[str] = Field(
        title="Secret Access Key",
        description="Part of the credentials that provide access to your AWS account. This is required if using secret key authentication.",
        sensitive=True,
    )
    aws_assume_role_arn: Optional[str] = Field(
        title="Assume Role ARN",
        description="If provided, the ARN of the role that should be assumed to connect to s3.",
    )
Schema looks good! From my perspective:
- Access Key ID: not particularly sensitive, a bit like a username
- Secret Access Key: sensitive, treat it like a password
- Assume Role ARN: not particularly sensitive, again a bit like a username
from fides.api.common_exceptions import StorageUploadError
from fides.api.schemas.storage.storage import AWSAuthMethod, StorageSecrets
Just a general comment on the utility here rather than any specific line of code: great idea to make a generic utility that can be reused, since AWS auth won't change.
As I understand it, the utility gives three authn options:
- Use an IAM user's long-term access key to authenticate for the entire session.
- Rely on boto3's automatic method of checking the local environment for different ways AWS credentials can be provided by the env, and authenticate using them for the session. There are a few of these, ranging from env vars to using the EC2 instance metadata service, all documented by AWS here.
- Use an IAM user's long-term access key to get temporary security credentials (a.k.a. assume an IAM role) and use these temporary creds for the session.
That's just about spot on! The only slight difference is that I've also allowed the AWS credentials derived by boto3 (i.e. option #2) to be used in step #3, i.e. to assume an IAM role (if a user has specified the automatic authentication method while also providing an assume role ARN).
Is that even a valid use case? I was a bit unsure whether this is something that can (or should) be supported. If not, I can easily clean up the code and make this a bit more straightforward.
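To restate the combination in question, here is a hypothetical summary helper (purely illustrative, not fides code) describing which credentials seed the session and whether they are then exchanged for temporary assume-role credentials:

```python
from typing import Optional

def credential_flow(auth_method: str, assume_role_arn: Optional[str]) -> str:
    """Hypothetical summary of the discussed credential flows."""
    if auth_method == "secret_keys":
        base = "long-term access key"
    else:  # "automatic"
        base = "boto3 credential chain (env vars, profile, instance metadata, ...)"
    if assume_role_arn:
        # the debated case: even chain-derived creds may call sts:AssumeRole
        return base + " -> sts:AssumeRole -> temporary credentials"
    return base
```

`credential_flow("automatic", some_arn)` describes the debated path: credentials resolved by boto3's chain are themselves used to assume the role.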
@pattisdr I wanted to get this OSS-side PR on your queue for review, no big rush though. I'm still working through the final touches on the plus-side to get it ready for review, but I think this can be reviewed in isolation and I don't want to leave it totally held up by the plus one!
Starting review - apologies for the merge conflicts 😓
src/fides/api/alembic/migrations/versions/cb344673f633_add_s3_connection_type.py (outdated; resolved)
This looks good to me @adamsachs, nice round of changes.
test is failing in …
Passing run #8343
Details: Review all test suite changes for PR #4930
Closes PROD-2151
Needed for https://github.com/ethyca/fidesplus/pull/1456. See the loom there for a demo of the connector functionality that's been added here.
Description Of Changes
OSS-side changes to support S3 as (initially) a d&d data source. Eventually we'll want to support S3 as a DSR data source, but that's out of scope for now.
This PR creates an "S3" connection type that provides a space for AWS authentication credentials to be stored. We implement a new (and improved!) AWS auth paradigm that can leverage an optional "assume role ARN" to assume a specific role for S3 communications. This paradigm is not yet leveraged for our other AWS connections (e.g. DynamoDB), but the underlying utility can easily be leveraged, and we should look to support this paradigm for those connection types soon.
The S3 connection type implements a basic connection test to list buckets (as that's a key operation for our d&d monitor), but no support for the methods that need to be implemented for DSR processing.
Code Changes
- Add S3Connector, a ConnectionType.s3 enum, and an S3Schema for secrets/credentials; add a logo for the FE
- Add aws_util.py, which can be used generically to get an AWS session with provided credentials that optionally assumes a provided role ARN

Steps to Confirm
Pre-Merge Checklist
- CHANGELOG.md
- main
- downgrade() migration is correct and works