-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additions to docs regarding the DBT_MANIFEST
load mode
#757
Additions to docs regarding the DBT_MANIFEST
load mode
#757
Conversation
👷 Deploy Preview for amazing-pothos-a3bca0 processing.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #757 +/- ##
=======================================
Coverage 93.28% 93.28%
=======================================
Files 55 55
Lines 2502 2502
=======================================
Hits 2334 2334
Misses 168 168 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving the docs, @dwreeves - some minor comments inline
@tatiana Some context: I was basing this and all other documentation changes I've proposed on two things: (1) the code that currently exists in (2) The docs already mention my wording change somewhere else: I think I see what your changes are attempting to do. But before I continue with that train of thought, I am wondering: Is it possible that |
To add to that, if that does check out, then I think my confusion would be that you are viewing the AUTOMATIC and DBT_LS as separate modes, whereas I'm treating AUTOMATIC as a way to access DBT_LS. In that case where AUTOMATIC is viewed as its own thing, I still think:
WDYT? |
Hey @dwreeves, first of all, thank you very much for working on this. We've been struggling to keep docs up-to-date, and they are as important as the code - directly affecting the adoption of Cosmos. This particular discussion is also relevant for us to review and potentially improve Cosmos' behaviour.
If the manifest file is not set and the user-defined a profile configuration, yes,
This is incorrect; users can use Could you remove this note as part of this PR? During the last few weeks, we have been detaching the configuration of rendering the DAG from how Cosmos tasks execute dbt, where possible (an example issue: #568). This is particularly relevant when users use
I don't think so.
Additionally, I agree that usually users who opt to run the dbt commands using
I addressed this before; please let me know if not.
Yes,
I've thought about this, and perhaps to simplify; we should change the following: astronomer-cosmos/cosmos/dbt/graph.py Line 173 in 4369987
To be:
This would simplify the docs and the explanation of
I agree with your proposal. For GCC, as of Cosmos 1.2.x, I believe we should encourage users to set the manifest using Yes, I also agree we should highlight to GCC users that if they don't give a manifest,
Yes, I like your suggestion! Better than just removing, as a previously mentioned. |
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
…es:dwreeves/astronomer-cosmos into additional-docs-for-manifest-path
I've made changes to the PR as per the discussion. |
✅ Deploy Preview for sunny-pastelito-5ecb04 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the feedback, @dwreeves , we'll release this as part of 1.3!
**Features** * Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in #733 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)). * Add support to select using (some) graph operators when using ``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in #728 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude)) * Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in #755, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)). * Add ``ProfileMapping`` for Vertica by @perttus in #540, #688 and #741, as ([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)). * Add ``ProfileMapping`` for Snowflake encrypted private key path by @ivanstillfront in #608, as ([documentation]( https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)). * Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in #649 * Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro in #616, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)). * Add cosmos/propagate_logs Airflow config support for disabling log propagation by @agreenburg in #648, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)). * Add operator_args ``full_refresh`` as a templated field by @joppevos in #623 * Expose environment variables and dbt variables in ``ProjectConfig`` by @jbandoro in #735 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)). * Support disabling event tracking when using Cosmos profile mapping by @jbandoro in #768, ([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)). **Enhancements** * Make Pydantic an optional dependency by @pixie79 in #736 * Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in #730 * Add ``aws_session_token`` for Athena mapping by @benjamin-awd in #663 * Retrieve temporary credentials from ``conn_id`` for Athena by @octiva in #758 * Extend ``DbtDocsLocalOperator`` with static flag by @joppevos in #759 **Bug fixes** * Remove Pydantic upper version restriction so Cosmos can be used with Airflow 2.8 by @jlaneve in #772 **Others** * Replace flake8 for Ruff by @joppevos in #743 * Reduce code complexity to 8 by @joppevos in #738 * Speed up integration tests by @jbandoro in #732 * Fix README quickstart link in by @RNHTTR in #776 * Add package location to work with hatchling 1.19.0 by @jbandoro in #761 * Fix type check error in ``DbtKubernetesBaseOperator.build_env_args`` by @jbandoro in #766 * Improve ``DBT_MANIFEST`` documentation by @dwreeves in #757 * Update conflict matrix between Airflow and dbt versions by @tatiana in #731 and #779 * pre-commit updates in #775, #770, #762
) ## Description I thought that there were a few aspects regarding execution modes that could require more clarification in the docs. - The parsing methods docs mentions that only `LOCAL` execution mode is supported for `DBT_LS`, but the reverse was not true (i.e. execution mode docs made no mention of parsing methods), so I added notes about that. - GCC docs suggest using `VIRTUALENV` execution mode, but makes no mention of the fact that the `DBT_LS` parsing method is not supported in this execution mode. Naturally, in this case, users should be utilizing the `DBT_MANIFEST` load mode, but that means that the docs are incomplete since they don't include a `manifest_path=?` in the `ProjectConfig`. - Note that there are also discussions in the Airflow Slack regarding issues users have had parsing the `DbtDag` in GCC that are fixable via using a pre-compiled `manifest,json`, e.g. https://apache-airflow.slack.com/archives/C059CC42E9W/p1696435273519979 - Also see astronomer#520 for more discussion. - Generally speaking when doing the `DBT_MANIFEST` load method, the pattern is that you run `dbt deps && dbt compile` as part of your deployment, and upload your full dbt project including these artifacts. This deployment approach may be obvious to veteran users of Airflow and/or dbt, but it may not be obvious to everyone, so I think adding a couple sentences in `parsing-methods.rst` is beneficial. ## Related Issue(s) Not explicitly related, but astronomer#520 discusses some issues encountered using the default parsing method. (Specifically, running `dbt deps` from a blank slate tends to slow everything down a lot.) Part of my motivation for adding to the docs is to better advertise + better document this alternate method of parsing the dbt DAG. ## Breaking Change? n/a ## Checklist n/a
**Features** * Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)). * Add support to select using (some) graph operators when using ``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude)) * Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in astronomer#755, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)). * Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741, as ([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)). * Add ``ProfileMapping`` for Snowflake encrypted private key path by @ivanstillfront in astronomer#608, as ([documentation]( https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)). * Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in astronomer#649 * Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro in astronomer#616, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)). * Add cosmos/propagate_logs Airflow config support for disabling log propagation by @agreenburg in astronomer#648, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)). * Add operator_args ``full_refresh`` as a templated field by @joppevos in astronomer#623 * Expose environment variables and dbt variables in ``ProjectConfig`` by @jbandoro in astronomer#735 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)). * Support disabling event tracking when using Cosmos profile mapping by @jbandoro in astronomer#768, ([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)). **Enhancements** * Make Pydantic an optional dependency by @pixie79 in astronomer#736 * Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730 * Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663 * Retrieve temporary credentials from ``conn_id`` for Athena by @octiva in astronomer#758 * Extend ``DbtDocsLocalOperator`` with static flag by @joppevos in astronomer#759 **Bug fixes** * Remove Pydantic upper version restriction so Cosmos can be used with Airflow 2.8 by @jlaneve in astronomer#772 **Others** * Replace flake8 for Ruff by @joppevos in astronomer#743 * Reduce code complexity to 8 by @joppevos in astronomer#738 * Speed up integration tests by @jbandoro in astronomer#732 * Fix README quickstart link in by @RNHTTR in astronomer#776 * Add package location to work with hatchling 1.19.0 by @jbandoro in astronomer#761 * Fix type check error in ``DbtKubernetesBaseOperator.build_env_args`` by @jbandoro in astronomer#766 * Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757 * Update conflict matrix between Airflow and dbt versions by @tatiana in astronomer#731 and astronomer#779 * pre-commit updates in astronomer#775, astronomer#770, astronomer#762
) ## Description I thought that there were a few aspects regarding execution modes that could require more clarification in the docs. - The parsing methods docs mentions that only `LOCAL` execution mode is supported for `DBT_LS`, but the reverse was not true (i.e. execution mode docs made no mention of parsing methods), so I added notes about that. - GCC docs suggest using `VIRTUALENV` execution mode, but makes no mention of the fact that the `DBT_LS` parsing method is not supported in this execution mode. Naturally, in this case, users should be utilizing the `DBT_MANIFEST` load mode, but that means that the docs are incomplete since they don't include a `manifest_path=?` in the `ProjectConfig`. - Note that there are also discussions in the Airflow Slack regarding issues users have had parsing the `DbtDag` in GCC that are fixable via using a pre-compiled `manifest,json`, e.g. https://apache-airflow.slack.com/archives/C059CC42E9W/p1696435273519979 - Also see astronomer#520 for more discussion. - Generally speaking when doing the `DBT_MANIFEST` load method, the pattern is that you run `dbt deps && dbt compile` as part of your deployment, and upload your full dbt project including these artifacts. This deployment approach may be obvious to veteran users of Airflow and/or dbt, but it may not be obvious to everyone, so I think adding a couple sentences in `parsing-methods.rst` is beneficial. ## Related Issue(s) Not explicitly related, but astronomer#520 discusses some issues encountered using the default parsing method. (Specifically, running `dbt deps` from a blank slate tends to slow everything down a lot.) Part of my motivation for adding to the docs is to better advertise + better document this alternate method of parsing the dbt DAG. ## Breaking Change? n/a ## Checklist n/a
**Features** * Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)). * Add support to select using (some) graph operators when using ``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude)) * Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in astronomer#755, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)). * Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741, as ([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)). * Add ``ProfileMapping`` for Snowflake encrypted private key path by @ivanstillfront in astronomer#608, as ([documentation]( https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)). * Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in astronomer#649 * Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro in astronomer#616, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)). * Add cosmos/propagate_logs Airflow config support for disabling log propagation by @agreenburg in astronomer#648, ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)). * Add operator_args ``full_refresh`` as a templated field by @joppevos in astronomer#623 * Expose environment variables and dbt variables in ``ProjectConfig`` by @jbandoro in astronomer#735 ([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)). * Support disabling event tracking when using Cosmos profile mapping by @jbandoro in astronomer#768, ([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)). **Enhancements** * Make Pydantic an optional dependency by @pixie79 in astronomer#736 * Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730 * Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663 * Retrieve temporary credentials from ``conn_id`` for Athena by @octiva in astronomer#758 * Extend ``DbtDocsLocalOperator`` with static flag by @joppevos in astronomer#759 **Bug fixes** * Remove Pydantic upper version restriction so Cosmos can be used with Airflow 2.8 by @jlaneve in astronomer#772 **Others** * Replace flake8 for Ruff by @joppevos in astronomer#743 * Reduce code complexity to 8 by @joppevos in astronomer#738 * Speed up integration tests by @jbandoro in astronomer#732 * Fix README quickstart link in by @RNHTTR in astronomer#776 * Add package location to work with hatchling 1.19.0 by @jbandoro in astronomer#761 * Fix type check error in ``DbtKubernetesBaseOperator.build_env_args`` by @jbandoro in astronomer#766 * Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757 * Update conflict matrix between Airflow and dbt versions by @tatiana in astronomer#731 and astronomer#779 * pre-commit updates in astronomer#775, astronomer#770, astronomer#762
Description
I thought that there were a few aspects regarding execution modes that could require more clarification in the docs.
LOCAL
execution mode is supported forDBT_LS
, but the reverse was not true (i.e. execution mode docs made no mention of parsing methods), so I added notes about that.VIRTUALENV
execution mode, but makes no mention of the fact that theDBT_LS
parsing method is not supported in this execution mode. Naturally, in this case, users should be utilizing theDBT_MANIFEST
load mode, but that means that the docs are incomplete since they don't include amanifest_path=?
in theProjectConfig
.DbtDag
in GCC that are fixable via using a pre-compiledmanifest,json
, e.g. https://apache-airflow.slack.com/archives/C059CC42E9W/p1696435273519979DBT_MANIFEST
load method, the pattern is that you rundbt deps && dbt compile
as part of your deployment, and upload your full dbt project including these artifacts. This deployment approach may be obvious to veteran users of Airflow and/or dbt, but it may not be obvious to everyone, so I think adding a couple sentences inparsing-methods.rst
is beneficial.Related Issue(s)
Not explicitly related, but #520 discusses some issues encountered using the default parsing method. (Specifically, running
dbt deps
from a blank slate tends to slow everything down a lot.)Part of my motivation for adding to the docs is to better advertise + better document this alternate method of parsing the dbt DAG.
Breaking Change?
n/a
Checklist
n/a