-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more template fields to DbtBaseOperator
#786
Conversation
✅ Deploy Preview for sunny-pastelito-5ecb04 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #786 +/- ##
==========================================
+ Coverage 95.35% 95.39% +0.03%
==========================================
Files 57 57
Lines 2737 2759 +22
==========================================
+ Hits 2610 2632 +22
Misses 127 127 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @dwreeves, thanks a lot for the contribution!
Great catch on the missing docstring and docs.
The docs are more API-focused than narrative-based or suggestive, and Cosmos's maintainers prefer this documentation style, so it's hard to find a spot for this.
We can and should certainly improve Cosmos docs. I agree they are more API-focused, but it isn't a maintainers' preference. We're open to having other styles of docs, and I'd be super happy if we could follow https://diataxis.fr/. It has not been a priority so far - but we'd welcome help.
Regarding template fields, given the existing docs, I think we could either:
(1) Create a new page inside docs/configuration
with a list of existing template fields
(2) Highlight the Cosmos arguments that are template fields where we expose them. Eg. In the case of select
, exclude
and selection
, this could be in:
https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html
In the case of env and vars, we could probably set it in dbt_vars
and env_vars
in:
https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html
We should probably add full_refresh
together with models
:
https://astronomer.github.io/astronomer-cosmos/configuration/operator-args.html
As of compiled SQL, it seems we have a dedicated section for it:
https://astronomer.github.io/astronomer-cosmos/configuration/compiled-sql.html
Which could probably be merged into operator_args
.
I'm happy for us to merge this PR once we have some docs related to the template fields.
@dwreeves oops I did not document it when adding |
I think the thing that would make the most sense for now is to just add a section under /operator-args.html that is just about the templated variables. With the exception of
So this is a bit weird because templating does not work when rendering a DAG, only when executing the task. I will keep in mind that a note should be made about this. This also does beg a question though about the recent treatment of the This was something I was thinking about irrespective of this particular PR. I think it's possible that it should be possible to set both In general the render and execution configs share a lot of configuration in the case that The reason why this discussion is potentially relevant to this PR is because of my note above about how the only operator args you can seemingly reliably utilize templating for at Cosmos's "highest level" API (i.e. DbtDag + DbtTaskGroup) is I mean, you can access
This all feels both somewhat suboptimal, and also extremely confusing, to users. It's strange that
That said, I am aware these are potentially extreme edge cases-- for example in most cases passing the string literal OK, this is all very confusing and in the weeds so I hope what I am saying makes sense. TLDR is:
|
… into add-more-template-fields
…os into add-more-template-fields
Thanks, @dwreeves , I just managed to get some time to add the missing coverage bit - it will be released as part of the alpha today! |
Yes! Thank you! Can't wait for this 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you very much, @dwreeves both for the contribution and persistence to get this into the finishing line!
Features * Add dbt docs natively in Airflow via plugin by @dwreeves in #737 * Add support for ``InvocationMode.DBT_RUNNER`` for local execution mode by @jbandoro in #850 * Support partial parsing to render DAGs faster when using ``ExecutionMode.LOCAL``, ``ExecutionMode.VIRTUALENV`` and ``LoadMode.DBT_LS`` by @dwreeves in #800 * Add Azure Container Instance as Execution Mode by @danielvdende in #771 * Add dbt build operators by @dylanharper-qz in #795 * Add dbt profile config variables to mapped profile by @ykuc in #794 * Add more template fields to ``DbtBaseOperator`` by @dwreeves in #786 Bug fixes * Make ``PostgresUserPasswordProfileMapping`` schema argument optional by @FouziaTariq in #683 * Fix ``folder_dir`` not showing on logs for ``DbtDocsS3LocalOperator`` by @PrimOox in #856 * Improve ``dbt ls`` parsing resilience to missing tags/config by @tatiana in #859 * Fix ``operator_args`` modified in place in Airflow converter by @jbandoro in #835 * Fix Docker and Kubernetes operators execute method resolution by @jbandoro in #849 Docs * Fix docs homepage link by @jlaneve in #860 * Fix docs ``ExecutionConfig.dbt_project_path`` by @jbandoro in #847 * Fix typo in MWAA getting started guide by @jlaneve in #846 Others * Add performance integration tests by @jlaneve in #827 * Add ``connect_retries`` to databricks profile to fix expensive integration failures by @jbandoro in #826 * Add import sorting (isort) to Cosmos by @jbandoro in #866 * Add Python 3.11 to CI/tests by @tatiana and @jbandoro in #821, #824 and #825 * Fix failing ``test_created_pod`` for ``apache-airflow-providers-cncf-kubernetes`` after v8.0.0 update by @jbandoro in #854 * Extend ``DatabricksTokenProfileMapping`` test to include session properties by @tatiana in #858 * Fix broken integration test uncovered from Pytest 8.0 update by @jbandoro in #845 * Pre-commit hook updates in #834, #843 and #852
Features * Add dbt docs natively in Airflow via plugin by @dwreeves in #737 * Add support for ``InvocationMode.DBT_RUNNER`` for local execution mode by @jbandoro in #850 * Support partial parsing to render DAGs faster when using ``ExecutionMode.LOCAL``, ``ExecutionMode.VIRTUALENV`` and ``LoadMode.DBT_LS`` by @dwreeves in #800 * Improve performance by 22-35% or more by caching partial parse artefact by @tatiana in #904 * Add Azure Container Instance as Execution Mode by @danielvdende in #771 * Add dbt build operators by @dylanharper-qz in #795 * Add dbt profile config variables to mapped profile by @ykuc in #794 * Add more template fields to ``DbtBaseOperator`` by @dwreeves in #786 * Add ``pip_install_options`` argument to operators by @octiva in #808 Bug fixes * Make ``PostgresUserPasswordProfileMapping`` schema argument optional by @FouziaTariq in #683 * Fix ``folder_dir`` not showing on logs for ``DbtDocsS3LocalOperator`` by @PrimOox in #856 * Improve ``dbt ls`` parsing resilience to missing tags/config by @tatiana in #859 * Fix ``operator_args`` modified in place in Airflow converter by @jbandoro in #835 * Fix Docker and Kubernetes operators execute method resolution by @jbandoro in #849 * Fix ``TrinoBaseProfileMapping`` required parameter for non method authentication by @AlexandrKhabarov in #921 * Fix global flags for lists by @ms32035 in #863 * Fix ``GoogleCloudServiceAccountDictProfileMapping`` when getting values from the Airflow connection ``extra__`` keys by @glebkrapivin in #923 * Fix using the dag as a keyword argument as ``specific_args_keys`` in DbtTaskGroup by @tboutaour in #916 * Fix ACI integration (``DbtAzureContainerInstanceBaseOperator``) by @danielvdende in #872 * Fix setting dbt project dir to the tmp dir by @dwreeves in #873 * Fix dbt docs operator to not use ``graph.gpickle`` file when ``--no-write-json`` is passed by @dwreeves in #883 * Make Pydantic a required dependency by @pankajkoti in #939 * Gracefully error if users try to ``emit_datasets`` with ``Airflow 2.9.0`` or ``2.9.1`` by @tatiana in #948 * Fix parsing tests that have no parents in #933 by @jlaneve * Correct ``root_path`` in partial parse cache by @pankajkoti in #950 Docs * Fix docs homepage link by @jlaneve in #860 * Fix docs ``ExecutionConfig.dbt_project_path`` by @jbandoro in #847 * Fix typo in MWAA getting started guide by @jlaneve in #846 * Fix typo related to exporting docs to GCS by @tboutaour in #922 * Improve partial parsing docs by @tatiana in #898 * Improve docs for datasets for airflow >= 2.4 by @SiddiqueAhmad in #879 * Improve test behaviour docs to highlight ``warning`` feature in the ``virtualenv`` mode by @mc51 in #910 * Fix docs typo by @SiddiqueAhmad in #917 * Improve Astro docs by @RNHTTR in #951 Others * Add performance integration tests by @jlaneve in #827 * Enable ``append_env`` in ``operator_args`` by default by @tatiana in #899 * Change default ``append_env`` behaviour depending on Cosmos ``ExecutionMode`` by @pankajkoti and @pankajastro in #954 * Expose the ``dbt`` graph in the ``DbtToAirflowConverter`` class by @tommyjxl in #886 * Improve dbt docs plugin rendering padding by @dwreeves in #876 * Add ``connect_retries`` to databricks profile to fix expensive integration failures by @jbandoro in #826 * Add import sorting (isort) to Cosmos by @jbandoro in #866 * Add Python 3.11 to CI/tests by @tatiana and @jbandoro in #821, #824 and #825 * Fix failing ``test_created_pod`` for ``apache-airflow-providers-cncf-kubernetes`` after v8.0.0 update by @jbandoro in #854 * Extend ``DatabricksTokenProfileMapping`` test to include session properties by @tatiana in #858 * Fix broken integration test uncovered from Pytest 8.0 update by @jbandoro in #845 * Add Apache Airflow 2.9 to the test matrix by @tatiana in #940 * Replace deprecated ``DummyOperator`` by ``EmptyOperator`` if Airflow >=2.4.0 by @tatiana in #900 * Improve logs to troubleshoot issue in 1.4.0a2 with astro-cli by @tatiana in #947 * Fix issue when publishing a new release to PyPI by @tatiana in #946 * Pre-commit hook updates in #820, #834, #843 and #852, #890, #896, #901, #905, #908, #919, #931, #941
[Daniel Reeves](https://www.linkedin.com/in/daniel-reeves-27700545/) (@dwreeves ) is an experienced Open-Source Developer currently working as a Data Architect at Battery Ventures. He has significant experience with Apache Airflow, SQL, and Python and has contributed to many [OSS projects](https://github.com/dwreeve). Not only has he been using Cosmos since its early stages, but since January 2023, he has actively contributed to the project: ![Screenshot 2024-05-14 at 10 47 30](https://github.com/astronomer/astronomer-cosmos/assets/272048/57829cb6-7eee-4b02-998b-46cc7746f15a) He has been a critical driver for the Cosmos 1.4 release, and some of his contributions include new features, bug fixes, and documentation improvements, including: * Creation of an Airflow plugin to render dbt docs: #737 * Support using dbt partial parsing file: #800 * Add more template fields to `DbtBaseOperator`: #786 * Add cancel on kill functionality: #101 * Make region optional in Snowflake profile mapping: #100 * Fix the dbt docs operator to not look for `graph.pickle`: #883 He thinks about the project long-term and proposes thorough solutions to problems faced by the community, as can be seen in Github tickets: * Introducing composability in the middle layer of Cosmos's API: #895 * Establish a general pattern for uploading artifacts to storage: #894 * Support `operator_arguments` injection at a node level: #881 One of Daniel's notable traits is his collaborative and supportive approach. He has actively engaged with users in the #airflow-dbt Slack channel, demonstrating his commitment to fostering a supportive community. We want to promote him as a Cosmos committer and maintainer for all these, recognising his constant efforts and achievements towards our community. Thank you very much, @dwreeves !
This partially addresses astronomer#754 via allowing for built-in templating support for the `DbtBaseOperator`. I also noticed `--full-refresh` was not documented so I added that in. ## Still missing Manual run pattern is not documented; the fact that these fields are templated is not documented. I don't really know where in the docs to put this. The docs are very API-focused more than narrative-based or suggestive, and Cosmos's maintainers prefer this style of documentation so it's hard to find a spot for this. It's possible that that's fine and we just keep this as a feature for more advanced users who dig into source code to discover for themselves? 🤷 Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Features * Add dbt docs natively in Airflow via plugin by @dwreeves in astronomer#737 * Add support for ``InvocationMode.DBT_RUNNER`` for local execution mode by @jbandoro in astronomer#850 * Support partial parsing to render DAGs faster when using ``ExecutionMode.LOCAL``, ``ExecutionMode.VIRTUALENV`` and ``LoadMode.DBT_LS`` by @dwreeves in astronomer#800 * Add Azure Container Instance as Execution Mode by @danielvdende in astronomer#771 * Add dbt build operators by @dylanharper-qz in astronomer#795 * Add dbt profile config variables to mapped profile by @ykuc in astronomer#794 * Add more template fields to ``DbtBaseOperator`` by @dwreeves in astronomer#786 Bug fixes * Make ``PostgresUserPasswordProfileMapping`` schema argument optional by @FouziaTariq in astronomer#683 * Fix ``folder_dir`` not showing on logs for ``DbtDocsS3LocalOperator`` by @PrimOox in astronomer#856 * Improve ``dbt ls`` parsing resilience to missing tags/config by @tatiana in astronomer#859 * Fix ``operator_args`` modified in place in Airflow converter by @jbandoro in astronomer#835 * Fix Docker and Kubernetes operators execute method resolution by @jbandoro in astronomer#849 Docs * Fix docs homepage link by @jlaneve in astronomer#860 * Fix docs ``ExecutionConfig.dbt_project_path`` by @jbandoro in astronomer#847 * Fix typo in MWAA getting started guide by @jlaneve in astronomer#846 Others * Add performance integration tests by @jlaneve in astronomer#827 * Add ``connect_retries`` to databricks profile to fix expensive integration failures by @jbandoro in astronomer#826 * Add import sorting (isort) to Cosmos by @jbandoro in astronomer#866 * Add Python 3.11 to CI/tests by @tatiana and @jbandoro in astronomer#821, astronomer#824 and astronomer#825 * Fix failing ``test_created_pod`` for ``apache-airflow-providers-cncf-kubernetes`` after v8.0.0 update by @jbandoro in astronomer#854 * Extend ``DatabricksTokenProfileMapping`` test to include session properties by @tatiana in astronomer#858 * Fix broken integration test uncovered from Pytest 8.0 update by @jbandoro in astronomer#845 * Pre-commit hook updates in astronomer#834, astronomer#843 and astronomer#852
Features * Add dbt docs natively in Airflow via plugin by @dwreeves in astronomer#737 * Add support for ``InvocationMode.DBT_RUNNER`` for local execution mode by @jbandoro in astronomer#850 * Support partial parsing to render DAGs faster when using ``ExecutionMode.LOCAL``, ``ExecutionMode.VIRTUALENV`` and ``LoadMode.DBT_LS`` by @dwreeves in astronomer#800 * Improve performance by 22-35% or more by caching partial parse artefact by @tatiana in astronomer#904 * Add Azure Container Instance as Execution Mode by @danielvdende in astronomer#771 * Add dbt build operators by @dylanharper-qz in astronomer#795 * Add dbt profile config variables to mapped profile by @ykuc in astronomer#794 * Add more template fields to ``DbtBaseOperator`` by @dwreeves in astronomer#786 * Add ``pip_install_options`` argument to operators by @octiva in astronomer#808 Bug fixes * Make ``PostgresUserPasswordProfileMapping`` schema argument optional by @FouziaTariq in astronomer#683 * Fix ``folder_dir`` not showing on logs for ``DbtDocsS3LocalOperator`` by @PrimOox in astronomer#856 * Improve ``dbt ls`` parsing resilience to missing tags/config by @tatiana in astronomer#859 * Fix ``operator_args`` modified in place in Airflow converter by @jbandoro in astronomer#835 * Fix Docker and Kubernetes operators execute method resolution by @jbandoro in astronomer#849 * Fix ``TrinoBaseProfileMapping`` required parameter for non method authentication by @AlexandrKhabarov in astronomer#921 * Fix global flags for lists by @ms32035 in astronomer#863 * Fix ``GoogleCloudServiceAccountDictProfileMapping`` when getting values from the Airflow connection ``extra__`` keys by @glebkrapivin in astronomer#923 * Fix using the dag as a keyword argument as ``specific_args_keys`` in DbtTaskGroup by @tboutaour in astronomer#916 * Fix ACI integration (``DbtAzureContainerInstanceBaseOperator``) by @danielvdende in astronomer#872 * Fix setting dbt project dir to the tmp dir by @dwreeves in astronomer#873 * Fix dbt docs operator to not use ``graph.gpickle`` file when ``--no-write-json`` is passed by @dwreeves in astronomer#883 * Make Pydantic a required dependency by @pankajkoti in astronomer#939 * Gracefully error if users try to ``emit_datasets`` with ``Airflow 2.9.0`` or ``2.9.1`` by @tatiana in astronomer#948 * Fix parsing tests that have no parents in astronomer#933 by @jlaneve * Correct ``root_path`` in partial parse cache by @pankajkoti in astronomer#950 Docs * Fix docs homepage link by @jlaneve in astronomer#860 * Fix docs ``ExecutionConfig.dbt_project_path`` by @jbandoro in astronomer#847 * Fix typo in MWAA getting started guide by @jlaneve in astronomer#846 * Fix typo related to exporting docs to GCS by @tboutaour in astronomer#922 * Improve partial parsing docs by @tatiana in astronomer#898 * Improve docs for datasets for airflow >= 2.4 by @SiddiqueAhmad in astronomer#879 * Improve test behaviour docs to highlight ``warning`` feature in the ``virtualenv`` mode by @mc51 in astronomer#910 * Fix docs typo by @SiddiqueAhmad in astronomer#917 * Improve Astro docs by @RNHTTR in astronomer#951 Others * Add performance integration tests by @jlaneve in astronomer#827 * Enable ``append_env`` in ``operator_args`` by default by @tatiana in astronomer#899 * Change default ``append_env`` behaviour depending on Cosmos ``ExecutionMode`` by @pankajkoti and @pankajastro in astronomer#954 * Expose the ``dbt`` graph in the ``DbtToAirflowConverter`` class by @tommyjxl in astronomer#886 * Improve dbt docs plugin rendering padding by @dwreeves in astronomer#876 * Add ``connect_retries`` to databricks profile to fix expensive integration failures by @jbandoro in astronomer#826 * Add import sorting (isort) to Cosmos by @jbandoro in astronomer#866 * Add Python 3.11 to CI/tests by @tatiana and @jbandoro in astronomer#821, astronomer#824 and astronomer#825 * Fix failing ``test_created_pod`` for ``apache-airflow-providers-cncf-kubernetes`` after v8.0.0 update by @jbandoro in astronomer#854 * Extend ``DatabricksTokenProfileMapping`` test to include session properties by @tatiana in astronomer#858 * Fix broken integration test uncovered from Pytest 8.0 update by @jbandoro in astronomer#845 * Add Apache Airflow 2.9 to the test matrix by @tatiana in astronomer#940 * Replace deprecated ``DummyOperator`` by ``EmptyOperator`` if Airflow >=2.4.0 by @tatiana in astronomer#900 * Improve logs to troubleshoot issue in 1.4.0a2 with astro-cli by @tatiana in astronomer#947 * Fix issue when publishing a new release to PyPI by @tatiana in astronomer#946 * Pre-commit hook updates in astronomer#820, astronomer#834, astronomer#843 and astronomer#852, astronomer#890, astronomer#896, astronomer#901, astronomer#905, astronomer#908, astronomer#919, astronomer#931, astronomer#941
[Daniel Reeves](https://www.linkedin.com/in/daniel-reeves-27700545/) (@dwreeves ) is an experienced Open-Source Developer currently working as a Data Architect at Battery Ventures. He has significant experience with Apache Airflow, SQL, and Python and has contributed to many [OSS projects](https://github.com/dwreeve). Not only has he been using Cosmos since its early stages, but since January 2023, he has actively contributed to the project: ![Screenshot 2024-05-14 at 10 47 30](https://github.com/astronomer/astronomer-cosmos/assets/272048/57829cb6-7eee-4b02-998b-46cc7746f15a) He has been a critical driver for the Cosmos 1.4 release, and some of his contributions include new features, bug fixes, and documentation improvements, including: * Creation of an Airflow plugin to render dbt docs: astronomer#737 * Support using dbt partial parsing file: astronomer#800 * Add more template fields to `DbtBaseOperator`: astronomer#786 * Add cancel on kill functionality: astronomer#101 * Make region optional in Snowflake profile mapping: astronomer#100 * Fix the dbt docs operator to not look for `graph.pickle`: astronomer#883 He thinks about the project long-term and proposes thorough solutions to problems faced by the community, as can be seen in Github tickets: * Introducing composability in the middle layer of Cosmos's API: astronomer#895 * Establish a general pattern for uploading artifacts to storage: astronomer#894 * Support `operator_arguments` injection at a node level: astronomer#881 One of Daniel's notable traits is his collaborative and supportive approach. He has actively engaged with users in the #airflow-dbt Slack channel, demonstrating his commitment to fostering a supportive community. We want to promote him as a Cosmos committer and maintainer for all these, recognising his constant efforts and achievements towards our community. Thank you very much, @dwreeves !
This partially addresses #754 via allowing for built-in templating support for the
DbtBaseOperator
.I also noticed
--full-refresh
was not documented so I added that in.Still missing
Manual run pattern is not documented; the fact that these fields are templated is not documented. I don't really know where in the docs to put this. The docs are very API-focused more than narrative-based or suggestive, and Cosmos's maintainers prefer this style of documentation so it's hard to find a spot for this. It's possible that that's fine and we just keep this as a feature for more advanced users who dig into source code to discover for themselves? 🤷