Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default source nodes rendering #1107

Merged
merged 34 commits into from
Aug 14, 2024
Merged

Conversation

arojasb3
Copy link
Contributor

@arojasb3 arojasb3 commented Jul 19, 2024

Description

Re-Opening of PR #661
This PR features a new way of rendering source nodes:

  • Check freshness for sources with freshness checks
  • Source tests
  • Empty operators for nodes without tests or freshness.

One of the main limitations I found while using the custom_callback functions on source nodes to check freshness is that nodes were being created on 100% of sources but not all of them required freshness checks, this made workers waste compute time.

I'm adding a new variable into the DbtNode class called has_freshness which would be True for sources with freshness checks and False for any other resource type.

If this feature is enabled with the option ALL:
All sources with the has_freshness == False will be rendered as Empty Operators, to keep the dbt's behavior of showing sources as suggested in issue #630

A new rendered template field is included too: freshness which is the sources.json generated by dbt when running dbt source freshness

This adds a new node type (source), which changes some tests behavior.
This PR also updates the dev dbt project jaffle_shop to include source nodes when enabled.
image

As seen in the image, source nodes with freshness checks are rendered with a blue color, while the ones rendered as EmptyOperator show a white/light green color

Related Issue(s)

Closes: #630
Closes: #572
Closes: #875

Breaking Change?

This won't be a breaking change since the default behavior will still be ignoring this new feature. That can be changed with the new RenderConfig variable called source_rendering_behavior.

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

Copy link

netlify bot commented Jul 19, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit bacd19d
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66bb771ef7bf7900085ce719

@arojasb3
Copy link
Contributor Author

arojasb3 commented Jul 19, 2024

hey @pankajastro !
The errors from the tests are regarding the source node group naming ("source" instead of "run") which can be debated. Since we are checking freshness I'm not 100% sure we should still name those tasks "run".

On the other hand, there's a hashing test error I'm not 100% sure how to deal with 🤔 .

Since this PR could affect some users with the custom rendering, I was thinking we could enable this with a flag?

Please let me know any thoughts in this PR.

@pankajastro
Copy link
Contributor

On the other hand, there's a hashing test error I'm not 100% sure how to deal with 🤔 .

The test test_save_dbt_ls_cache checks the hash of the project dir since you are adding a new file sources.yml in the project the hash has changed maybe you can update the hash value in test

def test_save_dbt_ls_cache(mock_variable_set, mock_datetime, tmp_dbt_project_dir):

@pankajastro
Copy link
Contributor

The errors from the tests are regarding the source node group naming ("source" instead of "run") which can be debated. Since we are checking freshness I'm not 100% sure we should still name those tasks "run".

If we decide to keep the source suffix then will have to update the test based on DbtResourceType

def test_create_task_metadata_model(

@tatiana /@pankajkoti any suggestions here?

cosmos/dbt/graph.py Outdated Show resolved Hide resolved
cosmos/operators/local.py Outdated Show resolved Hide resolved
@tatiana
Copy link
Collaborator

tatiana commented Jul 19, 2024

Thanks a lot for creating a new PR on this, @arojasb3 , 5he PR is looking great and I'm super excited we'll be merging this and releasing this in 1.6.

Since this PR could affect some users with the custom rendering, I was thinking we could enable this with a flag?

I agree we should have a feature flag, and perhaps we can enable it by default - and users who want to opt-out can do, what do you think? We have a few other similar feature flags in settings.py.

Since we are checking freshness I'm not 100% sure we should still name those tasks "run".

So far we were using run only for models, it feels worth respecting the naming we had for other node types - like you implemented:

            task_id = f"{node.name}_{node.resource_type.value}"

task_id = f"{node.name}_{node.resource_type.value}"

@tatiana
Copy link
Collaborator

tatiana commented Jul 19, 2024

The errors from the tests are regarding the source node group naming ("source" instead of "run") which can be debated. Since we are checking freshness I'm not 100% sure we should still name those tasks "run".

If we decide to keep the source suffix then will have to update the test based on DbtResourceType

def test_create_task_metadata_model(

@tatiana /@pankajkoti any suggestions here?

As mentioned in #1107 (comment), I believe we should be consistent with the rest of Cosmos naming, and have source nodes having task IDs using:

 task_id = f"{node.name}_{node.resource_type.value}"

cosmos/operators/local.py Outdated Show resolved Hide resolved
@pankajastro
Copy link
Contributor

@arojasb3, it looks like we're almost ready to merge. Could you please resolve the conflict and check for failing CI tests? A quick look suggests that we need to adjust a few parameters in the tests.

@pankajastro pankajastro marked this pull request as ready for review July 23, 2024 19:48
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:source Primarily related to dbt source command or functionality execution:local Related to Local execution environment execution:virtualenv Related to Virtualenv execution environment labels Jul 23, 2024
Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arojasb3 @pankajastro You can have done an outstanding work developing this feature, adjusting to all change requests, adapting the interfaces, making it backwards compatible, making compromises depending on the dbt version, documenting and testing.

This is an amazing piece of work and I can't wait to hear the community feedback once we release it in Cosmos 1.6.

As follow up tickets:

I left a minor comment that can be addressed as part of the test coverage improvements, that are not a requirement for the release of this feature.

Comment on lines +175 to +176
if use_task_group is True:
task_id = node.resource_type.value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep only this and remove the lines 182-183, that contain the same?

            if use_task_group is True:
                task_id = node.resource_type.value

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 14, 2024
@tatiana tatiana merged commit a89389d into astronomer:main Aug 14, 2024
60 of 62 checks passed
tatiana pushed a commit that referenced this pull request Aug 14, 2024
Re-Opening of PR #661

This PR features a new way of rendering source nodes:
- Check freshness for sources with freshness checks
- Source tests
- Empty operators for nodes without tests or freshness.

One of the main limitations I found while using the `custom_callback`
functions on source nodes to check freshness is that nodes were being
created on 100% of sources but not all of them required freshness
checks, this made workers waste compute time.

I'm adding a new variable into the DbtNode class called has_freshness
which would be True for sources with freshness checks and False for any
other resource type.

If this feature is enabled with the option `ALL`:
All sources with the has_freshness == False will be rendered as Empty
Operators, to keep the dbt's behavior of showing sources as suggested in
issue #630
<!-- Add a brief but complete description of the change. -->

A new rendered template field is included too: `freshness` which is the
sources.json generated by dbt when running `dbt source freshness`

This adds a new node type (source), which changes some tests behavior.
This PR also updates the dev dbt project jaffle_shop to include source
nodes when enabled.

![image](https://github.com/user-attachments/assets/e972ac58-8741-4c13-9905-e78775f9cc80)

As seen in the image, source nodes with freshness checks are rendered
with a blue color, while the ones rendered as EmptyOperator show a
white/light green color

Closes: #630
Closes: #572
Closes: #875
<!-- If this PR closes an issue, you can use a keyword to auto-close.
-->
<!-- i.e. "closes #0000" -->

This won't be a breaking change since the default behavior will still be
ignoring this new feature. That can be changed with the new RenderConfig
variable called `source_rendering_behavior`.

Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
@pankajastro
Copy link
Contributor

Thanks, @arojasb3, for the contribution; we appreciate it! 🚀

@pankajkoti pankajkoti mentioned this pull request Aug 16, 2024
pankajkoti added a commit that referenced this pull request Aug 20, 2024
New Features

* Add support for loading manifest from cloud stores using Airflow
Object Storage by @pankajkoti in #1109
* Cache ``package-lock.yml`` file by @pankajastro in #1086
* Support persisting the ``LoadMode.VIRTUALENV`` directory by @tatiana
in #1079
* Add support to store and fetch ``dbt ls`` cache in remote stores by
@pankajkoti in #1147
* Add default source nodes rendering by @arojasb3 in #1107
* Add Teradata ``ProfileMapping`` by @sc250072 in #1077

Enhancements

* Add ``DatabricksOauthProfileMapping`` profile by @CorsettiS in #1091
* Use ``dbt ls`` as the default parser when ``profile_config`` is
provided by @pankajastro in #1101
* Add task owner to dbt operators by @wornjs in #1082
* Extend Cosmos custom selector to support + when using paths and tags
by @mvictoria in #1150
* Simplify logging by @dwreeves in #1108

Bug fixes

* Fix Teradata ``ProfileMapping`` target invalid issue by @sc250072 in
#1088
* Fix empty tag in case of custom parser by @pankajastro in #1100
* Fix ``dbt deps`` of ``LoadMode.DBT_LS`` should use
``ProjectConfig.dbt_vars`` by @tatiana in #1114
* Fix import handling by lazy loading hooks introduced in PR #1109 by
@dwreeves in #1132
* Fix Airflow 2.10 regression and add Airflow 2.10 in test matrix by
@pankajastro in #1162

Docs

* Fix typo in azure-container-instance docs by @pankajastro in #1106
* Use Airflow trademark as it has been registered by @pankajastro in
#1105

Others

* Run some example DAGs in Kubernetes execution mode in CI by
@pankajastro in #1127
* Install requirements.txt by default during dev env spin up by
@@CorsettiS in #1099
* Remove ``DbtGraph.current_version`` dead code by @tatiana in #1111
* Disable test for Airflow-2.5 and Python-3.11 combination in CI by
@pankajastro in #1124
* Pre-commit hook updates in #1074, #1113, #1125, #1144, #1154,  #1167

---------

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
@fabiomx
Copy link
Contributor

fabiomx commented Aug 30, 2024

Hi!

Sorry for commenting here and now that it's already merged, but it's very small detail.

When implementing this new feature (thanks a lot for this, @arojasb3!), it felt natural to try importing SourceRenderingBehavior directly from cosmos, just like LoadMode or TestBehavior. However, I noticed that SourceRenderingBehavior isn't included in the __init__ file and has to be imported from cosmos.constants.

I think it would be good if SourceRenderingBehavior could be imported in the same way as the others. What do you think about adding it?

P.S. The same might apply to the InvocationMode constant.

@pankajastro
Copy link
Contributor

Hi @fabiomx, Thanks for your feedback! Feel free to submit a PR anytime—I’d be happy to review and merge it.

pankajastro pushed a commit that referenced this pull request Aug 31, 2024
A very minor change aimed at improving the developer experience. As
mentioned
[here](#1107 (comment)),
certain constants from `cosmos.constants`, such as `ExecutionMode`,
`LoadMode`, or `TestBehavior`, are already imported in the `__init__`
file to facilitate direct imports.

However, other constants are not currently included, which leads to an
inconsistent import pattern when setting dbt configurations.

This PR adds the remaining enumerations: `InvocationMode`,
`TestIndirectSelection`, `SourceRenderingBehaviour`, `DbtResourceType`.

Closes #1183
@tatiana tatiana added this to the Cosmos 1.6.1 milestone Sep 25, 2024
@pankajastro pankajastro mentioned this pull request Nov 15, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:source Primarily related to dbt source command or functionality execution:local Related to Local execution environment execution:virtualenv Related to Virtualenv execution environment lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
6 participants