Allow AWS session token in profiles.yml #458

Closed
benjamin-awd opened this issue Oct 15, 2023 · 6 comments · Fixed by #459
Labels
enhancement New feature or request

Comments

@benjamin-awd
Contributor

benjamin-awd commented Oct 15, 2023

Currently, dbt-athena allows users to pass in the aws_access_key_id and aws_secret_access_key to boto3. I'd like to extend this, and be able to pass the aws_session_token argument as well.

This will allow dbt-athena queries to be run with a service role and short-lived credentials instead of a service user and permanent credentials, which is far preferable from a security perspective.

PR: #459
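
To make the request concrete, here is a minimal sketch of a dbt-athena target built as a plain Python dict, with the three credential keys filled in from environment variables only when they are present. The helper name `build_athena_target` is hypothetical, the bucket, schema, and work group values are placeholders borrowed from elsewhere in this thread, and the `database` value is an assumption; this is not the adapter's or Cosmos's own code.

```python
import os

# Maps the standard AWS environment variable names to the profile keys
# discussed in this issue (aws_session_token is the key being requested).
ENV_TO_PROFILE_KEY = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SESSION_TOKEN": "aws_session_token",
}


def build_athena_target(environ=os.environ):
    """Return one dbt-athena output target as a dict (hypothetical helper)."""
    target = {
        "type": "athena",
        "region_name": "us-east-1",
        "s3_staging_dir": "s3://foo",
        "s3_data_dir": "s3://bar",
        "database": "awsdatacatalog",  # assumption: the default Athena catalog
        "schema": "sampledb",
        "work_group": "stg",
    }
    # Add a credential key only when its environment variable is set and non-empty.
    for env_name, profile_key in ENV_TO_PROFILE_KEY.items():
        value = environ.get(env_name)
        if value:
            target[profile_key] = value
    return target
```

With all three variables set, the returned dict carries the short-lived credentials; with none set, the credential keys are omitted entirely and credential lookup is left to boto3.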

@nicor88
Contributor

nicor88 commented Oct 15, 2023

It is not necessary to set aws_session_token explicitly. You can simply set all the AWS_ credentials as environment variables and that will work, but in that case you should also omit aws_access_key_id and aws_secret_access_key.

I'm using the current adapter implementation with temporary AWS credentials in multiple scenarios (AWS SSO for local development, a temporary IAM session from my CI), and everything works as expected.

@benjamin-awd
Contributor Author

benjamin-awd commented Oct 15, 2023

Hi @nicor88, my use case is somewhat complex, and I'm not sure it's possible to rely on the AWS_ environment variables for it (at least AFAICT). Specifically, I'm running astronomer-cosmos with dbt-athena on AWS MWAA, which involves dynamically generating temporary profiles.yml files at runtime.

I currently have a secrets manager backend set up with MWAA, from which I'm reading an Airflow AWS connection URI string like so:

aws://<access_key_id>:<aws_secret_access_key>@?region_name=us-east-1&s3_staging_dir=s3://foo&s3_data_dir=s3://bar&schema=sampledb&work_group=stg
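
For illustration, a URI of this shape can be split into its fields with the standard library alone. This is only a sketch: `parse_aws_connection_uri` is a hypothetical helper, not how Airflow or Cosmos actually parse connections, and it assumes the access key and secret are percent-encoded in the URI.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_aws_connection_uri(uri):
    """Split an aws:// connection URI into a flat dict of fields (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "aws":
        raise ValueError(f"expected an aws:// URI, got scheme {parts.scheme!r}")
    # parse_qs returns a list per key; keep the last value for each key.
    extras = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return {
        # The user-info part is not decoded by urlsplit, so unquote it here.
        "aws_access_key_id": unquote(parts.username) if parts.username else None,
        "aws_secret_access_key": unquote(parts.password) if parts.password else None,
        **extras,
    }
```

The query-string keys (region_name, s3_staging_dir, and so on) come through unchanged, so the result can be fed into whatever builds the profile.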

This URI string is then parsed by the Cosmos library to dynamically generate a dbt profiles.yml via this profile mapping. Instead, before running my dbt models, I'm hoping to pass in a role ARN via the connection URI, run a boto3 STS assume-role call, and set the access key, secret access key, and session token as environment variables so that Cosmos can construct a profiles.yml file using short-lived credentials.

(nb: the profile mapping I linked was recently added to Cosmos, and does not support the AWS session token argument, since it doesn't exist upstream here yet)
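
The assume-role step described above could look roughly like the sketch below. The function name `export_assumed_role_credentials` and the default session name are hypothetical; the STS client is passed in so the function itself does not depend on how the client is created.

```python
import os


def export_assumed_role_credentials(sts_client, role_arn,
                                    session_name="dbt-athena", environ=os.environ):
    """Assume a role via STS and export the temporary credentials (hypothetical helper).

    Returns the expiry timestamp reported by STS.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    credentials = response["Credentials"]
    # Standard variable names read by boto3 and the AWS CLI.
    environ["AWS_ACCESS_KEY_ID"] = credentials["AccessKeyId"]
    environ["AWS_SECRET_ACCESS_KEY"] = credentials["SecretAccessKey"]
    environ["AWS_SESSION_TOKEN"] = credentials["SessionToken"]
    return credentials["Expiration"]


# Usage sketch (requires boto3 and permission to assume the role):
#   import boto3
#   export_assumed_role_credentials(boto3.client("sts"), "<role ARN>")
```

The credentials expire, so a long-running process would need to call this again before the returned expiry time.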

@nicor88
Contributor

nicor88 commented Oct 15, 2023

If you generate your profiles.yml at runtime, then this should work, and what you proposed is necessary.

Out of curiosity, did you try to use the ENV variables that contain your AWS session credentials?

Those ENV credentials are fetched automatically by boto3, which means you might not even need to set aws_access_key_id and aws_secret_access_key in the profile. But this also means that you need full control of the environment where dbt runs, and I'm not sure that you have it in your specific scenario.
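
A quick way to check this in a given environment is sketched below. Both helper names are hypothetical. The first only reports which of the standard credential variables are set, without printing their values; the second asks boto3 which provider it actually resolved credentials from, and imports boto3 lazily so the first check works even where boto3 is not installed.

```python
import os

# Standard environment variables that boto3's default credential chain reads.
CREDENTIAL_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def describe_env_credentials(environ=os.environ):
    """Map each credential variable name to True if it is set and non-empty."""
    return {name: bool(environ.get(name)) for name in CREDENTIAL_ENV_VARS}


def resolved_credential_source():
    """Return the name of the provider boto3 used, or None if no credentials were found."""
    import boto3  # lazy import: only needed for this second check

    credentials = boto3.Session().get_credentials()
    return None if credentials is None else credentials.method
```

If the first function reports all three as set and the second returns `"env"`, the explicit keys in the profile are redundant.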

@benjamin-awd
Contributor Author

> Out of curiosity, did you try to use the ENV variables that contain your AWS session credentials?

Yeah, I actually spent quite a while trying to look for an alternative method. We originally were running dbt-athena on ECS, which worked well without an access key, since credentials were automatically generated via the task metadata endpoint, similar to what you mentioned + this section of the boto3 docs.

To use Cosmos, we switched to running dbt tasks via a virtual environment in MWAA, installed via the startup script. Unfortunately, MWAA does not provide access to the metadata endpoint, even though I'm quite sure its tasks run on ECS/EC2 in the background. That's probably because the infrastructure runs in a separate AWS-owned account.

@nicor88
Contributor

nicor88 commented Oct 16, 2023

Thanks for the insights. I'm pretty sure it would be possible to customize the startup script to retrieve the credentials and export them as ENV variables, but that might be cumbersome and maybe not worth it.
I merged your PR. Let me know if everything works as expected, and thanks for opening your first issue and contributing to the adapter.

nicor88 added the enhancement label Oct 16, 2023

@benjamin-awd
Contributor Author

Circling back on this for posterity -- I spent a bit more time trying to generate a profile + credentials via the start-up script, but wasn't successful.

Specifically, I tried to create a custom profile with a role ARN, which the MWAA execution role can assume:

[profile cosmos]
role_arn = arn:aws:iam::xxxx:role/cosmos_test
credential_source = EcsContainer 

It turns out that it is possible to access the ECS/EC2 metadata endpoint, so running something like aws sts get-caller-identity --profile cosmos in a DAG will work, assuming that the ~/.aws/config file defines that profile.

Unfortunately I couldn't get further than that, so I ended up using boto3 in Python to generate short-term credentials and pass them to dbt-athena (astronomer/astronomer-cosmos#609 (comment)).

Thanks @nicor88 for your help.
