Support automatic/short-lived token generation for AWS Athena #691
Comments
🤖 Hello @benjamin-awd, I'm Dosu, a bot here to assist you with your issues and queries related to the Cosmos repository. While we're waiting for a human maintainer, I'm here to help you navigate and understand the project. Thank you for your detailed issue; I'm currently looking into it and will provide a full answer shortly.
🤖 Thank you for your detailed analysis and proposed solutions for supporting automatic/short-lived token generation for AWS Athena in Cosmos. It's great to see your enthusiasm and willingness to contribute!

Based on your options, it seems like Option 2, where Cosmos handles the generation of short-term credentials under the hood, would be a valuable addition. This approach aligns with the existing AWS Athena Operator and AWS Base Hook in Apache Airflow, making it more consistent and intuitive for users.

I encourage you to proceed with your plan to create a pull request (PR) for Option 2. By implementing this feature, you will not only enhance the functionality of Cosmos but also contribute to the broader community of users who rely on AWS Athena with Apache Airflow. Once you have created the PR, the maintainers will review it and provide feedback. Thank you for your contribution and for making Cosmos even better!

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
Hey @benjamin-awd @tatiana, I worked on this yesterday and implemented a solution where we get the connection details from the I have tests written and an example DAG in the I think it would be better if Cosmos was better aligned to Airflow providers that utilise Airflow Connections. I have a branch ready, and would like to open a PR for your review if possible.
Sure, feel free to raise the PR and I'll take a look :)
Looking forward to seeing your PR, @octiva!
Hey @tatiana & @benjamin-awd, I can't seem to push via HTTPS or SSH... Do I need to be given some permissions?
Hey @octiva, you'll likely need to fork the repository into your personal account and push changes there. Then it's super easy to open the PR from your fork back into the main repository here! |
Hey @benjamin-awd, would you mind providing some feedback on PR #758? It would be much appreciated!
## Description

Passes the `conn_id` to the `AwsGenericHook` and uses `get_credentials()`, which handles the creation of a session, credentials, freezing of credentials, and masking. [See the get_credentials() docs here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/base_aws/index.html#airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook.get_credentials)

## Related Issue(s)

Closes: astronomer#691

Co-authored-by: Spencer horton <spencer.horton@workcoverqld.com.au>
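As a rough sketch of what the hook-based approach buys you: `get_credentials()` returns botocore's frozen `ReadOnlyCredentials` namedtuple, which can be mapped straight onto the environment variables dbt-athena reads. The stand-in namedtuple, the `profile_env` helper, and the credential values below are placeholders so the sketch runs without AWS access; in Cosmos the real credentials come from `AwsGenericHook(aws_conn_id=...).get_credentials()`:

```python
from collections import namedtuple

# Stand-in for botocore's frozen ReadOnlyCredentials namedtuple, which is
# what AwsGenericHook.get_credentials() returns.
ReadOnlyCredentials = namedtuple("ReadOnlyCredentials", ["access_key", "secret_key", "token"])


def profile_env(creds):
    """Map frozen credentials onto the env vars dbt-athena reads (hypothetical helper)."""
    env = {
        "AWS_ACCESS_KEY_ID": creds.access_key,
        "AWS_SECRET_ACCESS_KEY": creds.secret_key,
    }
    if creds.token:  # only present for STS / assumed-role credentials
        env["AWS_SESSION_TOKEN"] = creds.token
    return env


# Placeholder values; real ones are short-lived and masked in Airflow logs.
creds = ReadOnlyCredentials("AKIAEXAMPLE", "example-secret", "short-lived-token")
env = profile_env(creds)
```

Because the credentials are frozen at read time, each task run picks up whatever the Airflow connection resolves to at that moment, which is what makes hourly-expiring tokens workable.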
Spent some time thinking about the best way to pass short-lived AWS credentials to Cosmos. Including the `aws_session_token` in the profile will have limited utility for anyone using an Airflow connection, since the token will expire each hour (unless the connection is being dynamically refreshed or something, which sounds pretty unwieldy).

These are the options that I've tried/considered so far:
Option 1: Create a custom profile via the MWAA start-up script, and pass the profile name to Cosmos
e.g.
The execution role is then able to assume the role, where running something like
returns
It's then possible to pass this to Cosmos, which then passes it to dbt-athena via the `aws_profile_name` argument. However, I couldn't get this approach to work, since for some reason boto3 can't extract the credentials from the ECS/EC2 metadata endpoint.

IMO this approach isn't that good, since it involves creating a config file instead of using a role right away.
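For illustration, the start-up-script approach might look roughly like the fragment below. The role ARN and the `EcsContainer` credential source are placeholder assumptions, not values from the thread (the profile name `cosmos` matches the `--profile cosmos` example used later); the verification command is commented out since it requires AWS access:

```shell
# Hypothetical MWAA start-up-script fragment: write a named AWS profile that
# assumes a role via the container credentials.
mkdir -p "$HOME/.aws"
cat >> "$HOME/.aws/config" <<'EOF'
[profile cosmos]
role_arn = arn:aws:iam::123456789012:role/athena-access
credential_source = EcsContainer
EOF

# Sanity check (requires AWS access): prints the assumed-role identity on success.
# aws sts get-caller-identity --profile cosmos
```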
Option 2: Make a boto3 STS assume call and inject the credentials during Cosmos profile generation
This is the (slightly bootleg) approach we ended up taking, since it allows a fresh set of credentials to be generated for each run.
I think as a potential feature, it would be nice if Cosmos could take in an Airflow connection with a `role_arn`, and handle the generation of short-term credentials under the hood (similar to the AWS Athena Operator + AWS Base Hook, where you pass in an Airflow connection with a `role_arn` argument).

(EDIT: currently running into memory issues with this approach, so it might not be feasible. My gut feeling is that the memory issues are caused by the repeated creation of the boto3 client across multiple threads, since it seems that boto3 is not very memory efficient (boto/boto3#1670, boto/boto3#3541). I think using the AwsBaseHook class instead of boto3 might resolve this.)
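A minimal sketch of what Option 2's STS call amounts to, with the client injected so the example runs without AWS access. In practice the client would be `boto3.client("sts")`; the `fetch_short_lived_creds` helper, role ARN, session name, and the `FakeSts` stub are all hypothetical:

```python
def fetch_short_lived_creds(sts_client, role_arn, session_name="cosmos-dbt"):
    """Assume the role and return credentials shaped for a dbt-athena profile."""
    resp = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    c = resp["Credentials"]
    return {
        "aws_access_key_id": c["AccessKeyId"],
        "aws_secret_access_key": c["SecretAccessKey"],
        "aws_session_token": c["SessionToken"],  # expires, by default after 1 hour
    }


class FakeSts:
    """Stand-in for boto3.client("sts") so the sketch is self-contained."""

    def assume_role(self, RoanArn=None, **kwargs):
        raise NotImplementedError

    def assume_role(self, RoleArn, RoleSessionName):
        return {
            "Credentials": {
                "AccessKeyId": "AKIAEXAMPLE",
                "SecretAccessKey": "example-secret",
                "SessionToken": "example-token",
            }
        }


creds = fetch_short_lived_creds(FakeSts(), "arn:aws:iam::123456789012:role/athena-access")
```

Injecting the client (or, per the EDIT above, building it once via `AwsBaseHook` rather than per-thread) would also be the natural seam for addressing the memory concerns.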
Option 3: Mutate DbtLocalOperator class so that AWS credentials are generated before every run
e.g.

```shell
export $(aws configure export-credentials --profile cosmos --format=env) && dbt run ...
```
I think this option is quite complex and doesn't gel that well with the rest of Cosmos, since it creates floating env vars that aren't being properly managed
(IIRC there's an issue/PR related to this that talks about adding custom commands to `dbt run`.)
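For reference, the command-mutation idea in Option 3 amounts to string-wrapping the dbt invocation so fresh credentials are exported immediately before every run. The `wrap_with_credentials` helper is hypothetical, and the profile name `cosmos` is taken from the example above:

```python
def wrap_with_credentials(dbt_cmd, profile="cosmos"):
    """Prefix a dbt command with a fresh-credential export (sketch of Option 3)."""
    # `aws configure export-credentials` resolves the profile (including any
    # role assumption) and emits AWS_* variables in env format.
    export = f"export $(aws configure export-credentials --profile {profile} --format=env)"
    return f"{export} && {dbt_cmd}"


cmd = wrap_with_credentials("dbt run --select my_model")
```

As noted above, the drawback is that the exported `AWS_*` variables float in the subprocess environment rather than being managed by Cosmos or Airflow.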