Add default source nodes rendering #1107
Hey @pankajastro! On the other hand, there's a hashing test error I'm not 100% sure how to deal with 🤔. Since this PR could affect some users with custom rendering, I was thinking we could enable this with a flag? Please let me know your thoughts in this PR.
The test astronomer-cosmos/tests/dbt/test_graph.py Line 1404 in e61f3a3
If we decide to keep the source suffix, then we will have to update the test based on DbtResourceType: astronomer-cosmos/tests/airflow/test_graph.py Line 345 in e61f3a3
@tatiana / @pankajkoti, any suggestions here?
Thanks a lot for creating a new PR on this, @arojasb3. The PR is looking great and I'm super excited we'll be merging this and releasing it in 1.6.
I agree we should have a feature flag, and perhaps we can enable it by default, so that users who want to opt out can do so. What do you think? We have a few other similar feature flags in
So far we were using run only for models, it feels worth respecting the naming we had for other node types - like you implemented:
astronomer-cosmos/cosmos/airflow/graph.py Line 155 in e61f3a3
As mentioned in #1107 (comment), I believe we should be consistent with the rest of Cosmos naming, and have source nodes having task IDs using:
@arojasb3, it looks like we're almost ready to merge. Could you please resolve the conflict and check for failing CI tests? A quick look suggests that we need to adjust a few parameters in the tests.
@arojasb3 @pankajastro You have done outstanding work developing this feature, adjusting to all change requests, adapting the interfaces, making it backwards compatible, making compromises depending on the dbt version, documenting and testing.
This is an amazing piece of work and I can't wait to hear the community feedback once we release it in Cosmos 1.6.
As follow up tickets:
- We'll be improving the test coverage: [Test] Add integration test for source node rendering #1155
- In Cosmos 2.x, we'll change the default `SourceRenderingBehavior` so more people can benefit from this
I left a minor comment that can be addressed as part of the test coverage improvements, which are not a requirement for the release of this feature.
if use_task_group is True:
    task_id = node.resource_type.value
Should we keep only this and remove the lines 182-183, that contain the same?
if use_task_group is True:
    task_id = node.resource_type.value
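For illustration, the naming rule discussed in this thread can be sketched as a standalone function. This is a minimal sketch, not the Cosmos implementation: `source_task_id` is a hypothetical helper, the enum is a one-member stand-in for `DbtResourceType`, and the `<name>_<resource type>` form used outside a task group is an assumption based on the "source suffix" mentioned earlier in the conversation.

```python
from enum import Enum


class DbtResourceType(Enum):
    # Stand-in for the Cosmos enum of the same name; only SOURCE is modeled here.
    SOURCE = "source"


def source_task_id(node_name: str, use_task_group: bool) -> str:
    """Hypothetical helper: choose a task ID for a rendered source node."""
    resource_type = DbtResourceType.SOURCE
    if use_task_group:
        # Inside a task group the group already carries the node name,
        # so the bare resource type is used (as in the snippet under review).
        return resource_type.value
    # Assumed naming outside a task group: node name plus resource-type suffix.
    return f"{node_name}_{resource_type.value}"


inside_group = source_task_id("raw_orders", use_task_group=True)
standalone = source_task_id("raw_orders", use_task_group=False)
```

Keeping the branch in a single place is also what the review comment asks for: one `if use_task_group` check rather than two copies of the same assignment.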
Re-Opening of PR #661

This PR features a new way of rendering source nodes:
- Check freshness for sources with freshness checks
- Source tests
- Empty operators for nodes without tests or freshness.

One of the main limitations I found while using the `custom_callback` functions on source nodes to check freshness is that nodes were being created on 100% of sources but not all of them required freshness checks, this made workers waste compute time.

I'm adding a new variable into the DbtNode class called has_freshness which would be True for sources with freshness checks and False for any other resource type.

If this feature is enabled with the option `ALL`: All sources with the has_freshness == False will be rendered as Empty Operators, to keep the dbt's behavior of showing sources as suggested in issue #630

A new rendered template field is included too: `freshness` which is the sources.json generated by dbt when running `dbt source freshness`

This adds a new node type (source), which changes some tests behavior.

This PR also updates the dev dbt project jaffle_shop to include source nodes when enabled.

![image](https://github.com/user-attachments/assets/e972ac58-8741-4c13-9905-e78775f9cc80)

As seen in the image, source nodes with freshness checks are rendered with a blue color, while the ones rendered as EmptyOperator show a white/light green color

Closes: #630
Closes: #572
Closes: #875

This won't be a breaking change since the default behavior will still be ignoring this new feature. That can be changed with the new RenderConfig variable called `source_rendering_behavior`.

Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Thanks, @arojasb3, for the contribution; we appreciate it! 🚀
New Features

* Add support for loading manifest from cloud stores using Airflow Object Storage by @pankajkoti in #1109
* Cache ``package-lock.yml`` file by @pankajastro in #1086
* Support persisting the ``LoadMode.VIRTUALENV`` directory by @tatiana in #1079
* Add support to store and fetch ``dbt ls`` cache in remote stores by @pankajkoti in #1147
* Add default source nodes rendering by @arojasb3 in #1107
* Add Teradata ``ProfileMapping`` by @sc250072 in #1077

Enhancements

* Add ``DatabricksOauthProfileMapping`` profile by @CorsettiS in #1091
* Use ``dbt ls`` as the default parser when ``profile_config`` is provided by @pankajastro in #1101
* Add task owner to dbt operators by @wornjs in #1082
* Extend Cosmos custom selector to support + when using paths and tags by @mvictoria in #1150
* Simplify logging by @dwreeves in #1108

Bug fixes

* Fix Teradata ``ProfileMapping`` target invalid issue by @sc250072 in #1088
* Fix empty tag in case of custom parser by @pankajastro in #1100
* Fix ``dbt deps`` of ``LoadMode.DBT_LS`` should use ``ProjectConfig.dbt_vars`` by @tatiana in #1114
* Fix import handling by lazy loading hooks introduced in PR #1109 by @dwreeves in #1132
* Fix Airflow 2.10 regression and add Airflow 2.10 in test matrix by @pankajastro in #1162

Docs

* Fix typo in azure-container-instance docs by @pankajastro in #1106
* Use Airflow trademark as it has been registered by @pankajastro in #1105

Others

* Run some example DAGs in Kubernetes execution mode in CI by @pankajastro in #1127
* Install requirements.txt by default during dev env spin up by @CorsettiS in #1099
* Remove ``DbtGraph.current_version`` dead code by @tatiana in #1111
* Disable test for Airflow-2.5 and Python-3.11 combination in CI by @pankajastro in #1124
* Pre-commit hook updates in #1074, #1113, #1125, #1144, #1154, #1167

---------

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Hi! Sorry for commenting here now that it's already merged, but it's a very small detail. When implementing this new feature (thanks a lot for this, @arojasb3!), it felt natural to try importing I think it would be good if P.S. The same might apply to the
Hi @fabiomx, thanks for your feedback! Feel free to submit a PR anytime; I'd be happy to review and merge it.
A very minor change aimed at improving the developer experience. As mentioned [here](#1107 (comment)), certain constants from `cosmos.constants`, such as `ExecutionMode`, `LoadMode`, or `TestBehavior`, are already imported in the `__init__` file to facilitate direct imports. However, other constants are not currently included, which leads to an inconsistent import pattern when setting dbt configurations. This PR adds the remaining enumerations: `InvocationMode`, `TestIndirectSelection`, `SourceRenderingBehavior`, `DbtResourceType`. Closes #1183
Description
Re-Opening of PR #661
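This PR features a new way of rendering source nodes:
- Check freshness for sources with freshness checks
- Source tests
- Empty operators for nodes without tests or freshness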
One of the main limitations I found while using the `custom_callback` functions on source nodes to check freshness is that nodes were being created for 100% of sources, but not all of them required freshness checks, which made workers waste compute time.
I'm adding a new variable to the DbtNode class called has_freshness, which is True for sources with freshness checks and False for any other resource type.
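A rough sketch of how a `has_freshness` flag could be derived from a source's freshness block. This is an illustration, not the code in this PR: `has_effective_freshness` is a hypothetical helper, and the input shape (a mapping with `warn_after` / `error_after` entries, each holding `count` and `period`) is an assumption about how dbt reports source freshness settings.

```python
from typing import Any, Mapping, Optional


def has_effective_freshness(freshness: Optional[Mapping[str, Any]]) -> bool:
    """Hypothetical helper: True when at least one freshness threshold is set."""
    if not freshness:
        # No freshness block at all (e.g. models, seeds, or unconfigured sources).
        return False
    for key in ("warn_after", "error_after"):
        threshold = freshness.get(key) or {}
        # A threshold only counts when both its count and period are filled in.
        if threshold.get("count") is not None and threshold.get("period") is not None:
            return True
    return False


configured = has_effective_freshness({"warn_after": {"count": 12, "period": "hour"}})
unconfigured = has_effective_freshness(
    {"warn_after": {"count": None, "period": None},
     "error_after": {"count": None, "period": None}}
)
```

Checking the thresholds rather than the mere presence of the block matters because a source can carry a freshness entry whose values are all empty, in which case there is nothing to check.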
If this feature is enabled with the option `ALL`, all sources with has_freshness == False will be rendered as Empty Operators, to keep dbt's behavior of showing sources, as suggested in issue #630.
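The rendering rule described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the Cosmos code: the enum is a two-member stand-in (the member name `NONE` for the default "ignore sources" behavior is assumed here), `plan_source_task` is a hypothetical helper, and the returned strings are illustrative labels rather than real operator names. Source tests are left out of the sketch.

```python
from enum import Enum
from typing import Optional


class SourceRenderingBehavior(Enum):
    # Stand-in enum: NONE is an assumed name for the default (sources not rendered).
    NONE = "none"
    ALL = "all"


def plan_source_task(has_freshness: bool, behavior: SourceRenderingBehavior) -> Optional[str]:
    """Hypothetical helper: decide what, if anything, to render for a source node."""
    if behavior is SourceRenderingBehavior.NONE:
        # Default behavior: source nodes are not rendered at all.
        return None
    if has_freshness:
        # Sources with freshness checks get a task that runs the check.
        return "freshness_check"
    # Remaining sources are shown as placeholders that do no work.
    return "empty_operator"


plans = [
    plan_source_task(True, SourceRenderingBehavior.ALL),
    plan_source_task(False, SourceRenderingBehavior.ALL),
    plan_source_task(True, SourceRenderingBehavior.NONE),
]
```

The placeholder branch is what avoids the wasted compute mentioned above: every source still appears in the graph, but only those with freshness checks run anything.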
A new rendered template field is included too: `freshness`, which is the sources.json generated by dbt when running `dbt source freshness`.
This adds a new node type (source), which changes the behavior of some tests.
This PR also updates the dev dbt project jaffle_shop to include source nodes when enabled.
As seen in the image, source nodes with freshness checks are rendered with a blue color, while the ones rendered as EmptyOperator show a white/light green color
Related Issue(s)
Closes: #630
Closes: #572
Closes: #875
Breaking Change?
This won't be a breaking change since the default behavior will still be ignoring this new feature. That can be changed with the new RenderConfig variable called `source_rendering_behavior`.
Checklist