Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Athena - Get temporary credentials from the conn_id #758

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

octiva
Copy link
Contributor

@octiva octiva commented Dec 11, 2023

Description

Passes the conn_id to the AwsGenericHook and uses get_credentials(), which handles the creation of a session, credentials, freezing of credentials & also masking.

See get_credentials() docs here

Related Issue(s)

#691

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@octiva octiva requested a review from a team as a code owner December 11, 2023 01:54
@octiva octiva requested a review from a team December 11, 2023 01:54
Copy link

netlify bot commented Dec 11, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 6761548
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/657a59029dc4d1000891bfde

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 11, 2023
@dosubot dosubot bot added area:dependencies Related to dependencies, like Python packages, library versions, etc area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc area:testing Related to testing, like unit tests, integration tests, etc dbt:run Primarily related to dbt run command or functionality execution:docker Related to Docker execution environment profile:athena Related to Athena ProfileConfig labels Dec 11, 2023
@pixie79
Copy link
Contributor

pixie79 commented Dec 11, 2023

Do you mean dbt-Athena or dbt-athena-community. Dbt-Athena is the original module but the community version is much improved and optimised. https://github.com/dbt-athena/dbt-athena

@octiva
Copy link
Contributor Author

octiva commented Dec 11, 2023

@pixie79 the dbt-athena references the project optional dependency on line 58

dbt-athena = [
    "dbt-athena-community",
]

Copy link
Collaborator

@jbandoro jbandoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @octiva for the contribution!

I'm not familiar with Athena access keys, so if anyone from #691 wants to chime in here it would be helpful. My main comment below is if we should keep the current profile mapping of the Airflow connection, are there cases where users would still want to use the Airflow connection params, or is it broken and this PR fixes it?

cosmos/profiles/athena/access_key.py Outdated Show resolved Hide resolved
dev/dags/example_athena_profile.py Outdated Show resolved Hide resolved
@octiva
Copy link
Contributor Author

octiva commented Dec 11, 2023

@jbandoro Thanks for the feedback. If the user has an Airflow connection with no extra.role_arn provided, their credentials (secret + key id) used in the DBT profile will be unchanged, except there will now always be a aws_session_token.

@octiva
Copy link
Contributor Author

octiva commented Dec 12, 2023

Tests failing due to #761

@benjamin-awd
Copy link
Contributor

LGTM, I tried this out today and it seems to work for my use-case -- unfortunately I don't have time to look into the memory issue and verify that this works at scale, but that's hopefully going to be resolved with the use of the AWS hook. Ran a couple of models concurrently and didn't run into any issues.

Copy link

codecov bot commented Dec 13, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (9fcee8e) 93.22% compared to head (6761548) 93.22%.

Files Patch % Lines
cosmos/profiles/athena/access_key.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #758   +/-   ##
=======================================
  Coverage   93.22%   93.22%           
=======================================
  Files          55       55           
  Lines        2464     2481   +17     
=======================================
+ Hits         2297     2313   +16     
- Misses        167      168    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pyproject.toml Outdated Show resolved Hide resolved
Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @octiva! It is an excellent improvement to leverage the hook method instead of directly retrieving the values from the environment variable.

Before, we were obfuscating the secrets and setting them as environment variables. I suggest we keep this behaviour, even if the credentials are temporary. The description of your PR states that the Airflow get_credentials method masks those secrets. From what I understood from the docs:

By use this method also secret_key and token will mask in tasks logs.
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/base_aws/index.html#airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook.get_credentials

In other words, it seems they are masked only in the logs - not in the profiles.yml file that Cosmos creates.

Assuming the value we wrote in the profiles.yml file was masked, the dbt commands using them would fail to run - since they would try to use invalid credentials.

tests/profiles/athena/test_athena_access_key.py Outdated Show resolved Hide resolved
tests/profiles/athena/test_athena_access_key.py Outdated Show resolved Hide resolved
@octiva
Copy link
Contributor Author

octiva commented Dec 13, 2023

@tatiana I am unable to write a test for the _get_temporary_credentialsmethod, as I am unable to import airflow.providers.amazon via the test dependencies as we get some dependency conflicts when using Python 3.8/3.9 + Airflow 2.3.

The amazon provider at 8.0.0 is compatible with 2.3 and python down to 3.7.

https://pypi.org/project/apache-airflow-providers-amazon/8.0.0/

I tried many ways to fix this on the weekend, but could not. I ended up resolving this by removing the dependency and mocking the class function.

This has the unfortunately side-affect of reducing our test coverage

@jbandoro
Copy link
Collaborator

Thanks @octiva addressing all the feedback! For your question below on test coverage:

I tried many ways to fix this on the weekend, but could not. I ended up resolving this by removing the dependency and mocking the class function.

This has the unfortunately side-affect of reducing our test coverage

You can get test coverage for the _get_temporary_credential method by instead patching the provider module that can't be imported, like in the examples here and here.

It might be easier to do this patching in a fixture and reuse it in all of tests that require it.

@octiva
Copy link
Contributor Author

octiva commented Dec 14, 2023

reposting from correct account @jbandoro Thanks for the hint. I had attempted this, but was doing it at the wrong level, and now ive got it working quite nicely. Let me know what you think 🚀

Copy link
Collaborator

@jbandoro jbandoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @octiva! Mocking the hook looks much better. Approving with some minor comments below to address before merging.

cosmos/profiles/athena/access_key.py Outdated Show resolved Hide resolved
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 14, 2023
@jbandoro jbandoro added this to the 1.3.0 milestone Dec 14, 2023
@tatiana tatiana merged commit d062543 into astronomer:main Dec 14, 2023
42 checks passed
@octiva octiva deleted the athena-assume-role branch December 15, 2023 01:31
@tatiana tatiana mentioned this pull request Jan 4, 2024
tatiana added a commit that referenced this pull request Jan 4, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in #733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in #728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in #540, #688 and #741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in #608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in #649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in #616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in #648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in #623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in #735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in #768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in #730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in #663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in #758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in #759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in #772

**Others**

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Speed up integration tests by @jbandoro in #732
* Fix README quickstart link in by @RNHTTR in #776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in #766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in #757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
#731 and #779
* pre-commit updates in #775, #770, #762
ykuc pushed a commit to ykuc/astronomer-cosmos that referenced this pull request Jan 11, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
## Description

<!-- Add a brief but complete description of the change. -->

Passes the `conn_id` to the `AwsGenericHook` and uses
`get_credentials()`, which handles the creation of a session,
credentials, freezing of credentials & also masking.

[See get_credentials() docs
here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/base_aws/index.html#airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook.get_credentials)

## Related Issue(s)

Closes: astronomer#691 

Co-authored-by: Spencer horton <spencer.horton@workcoverqld.com.au>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dependencies Related to dependencies, like Python packages, library versions, etc area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc area:testing Related to testing, like unit tests, integration tests, etc dbt:run Primarily related to dbt run command or functionality execution:docker Related to Docker execution environment lgtm This PR has been approved by a maintainer profile:athena Related to Athena ProfileConfig size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants