Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_from_dbt_manifest is de-selecting valid test nodes #719

Open
adammarples opened this issue Nov 28, 2023 · 4 comments
Open

load_from_dbt_manifest is de-selecting valid test nodes #719

adammarples opened this issue Nov 28, 2023 · 4 comments
Labels
area:dependencies Related to dependencies, like Python packages, library versions, etc area:selector Related to selector, like DAG selector, DBT selector, etc dbt:test Primarily related to dbt test command or functionality parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing priority:medium Medium priority issues are important issues that may have a workaround and medium impact

Comments

@adammarples
Copy link
Contributor

adammarples commented Nov 28, 2023

NodeSelector._should_include_node has the following line

if node.resource_type == DbtResourceType.TEST:
    node.tags = getattr(self.nodes.get(node.depends_on[0]), "tags", [])

which gives the test node the tags of its parent model. Obviously this means that we cannot select tests by tag as we can in dbt-core, which would be ideal, but there is another problem.

node.depends_on[0] is supposed to return the id of the parent model of the test, however, it actually just selects the first dependency. We have tests with additional dependencies (sources), ie.

models:
- name: MY_MODEL
  config:
    tags:
    - mytag
  tests:
  - comparison_to_expected_results:
      expected: source('EXPECTED', 'MY_MODEL')

This is valid dbt but now breaks in astronomer-cosmos because the test tags are set to the source tags, not the parent model tags.

Solution:

I have found that node_dict['refs'][0]['name'] is more reliable for fetching the parent model of the test than node_dict.get("depends_on", {}).get("nodes", []) which is used in the DbtNode class, although I appreciate that dbt doesn't seem to offer a reliable way to do this.

Alternatively, we could try and remove non-model ids from the list at least to try and prune it down to being more likely to be the parent model rather than a stray source, and warn on lists greater than length 1.

if node.resource_type == DbtResourceType.TEST:
    dependency_ids = [node_id for node_id in node.depends_on if node_id.startswith('model.')]
    parent_id = dependency_ids[-1]
    if len(dependency_ids) > 1:
        logger.warning(
            f'Test node {node.name} has more than one model dependency {dependency_ids}, selected tags from parent assumed to be {parent_id}.')
    parent_tags = getattr(self.nodes.get(parent_id), "tags", [])
    node.tags = list(set(node.tags + parent_tags))

Here we can additionally use and filter on the actual test tags too, improving parity with dbt-core --select.

Alternatively, I secretly suspect that depends_on['nodes'][-1] will always be the test parent model, instead of depends_on['nodes'][0], although I haven't been through the dbt-core code to prove this.

Copy link

dosubot bot commented Nov 28, 2023

🤖

Thank you for bringing this issue to our attention and proposing a solution! Your analysis of the problem and the suggested code changes seem reasonable.

Encouraging users to contribute to open-source projects like Cosmos is always appreciated. If you are interested in contributing, I would encourage you to create a pull request with your proposed changes. The maintainers will review your changes and determine if they should be merged into the project.

Keep up the great work and thank you for your support of Cosmos!

@dosubot dosubot bot added area:dependencies Related to dependencies, like Python packages, library versions, etc area:selector Related to selector, like DAG selector, DBT selector, etc dbt:test Primarily related to dbt test command or functionality parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing priority:medium Medium priority issues are important issues that may have a workaround and medium impact labels Nov 28, 2023
@adammarples
Copy link
Contributor Author

#751

@tatiana
Copy link
Collaborator

tatiana commented Jan 5, 2024

Update: we've advanced in discussing this ticket.

Cosmos should improve its behaviour to consider multiple parents so the test node selection is compatible with how dbt does this:
#751 (comment)

The workaround is to use ExecutionConfig.test_indirect_selection, as suggested by @jbandoro : #751 (comment)

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Apr 5, 2024
Copy link

dosubot bot commented Apr 5, 2024

Hi, @adammarples,

I'm helping the Cosmos team manage their backlog and am marking this issue as stale. The issue "load_from_dbt_manifest" was causing valid test nodes to be deselected due to a bug in the "NodeSelector._should_include_node" method. The problem arose from test nodes inheriting tags from their parent model, making them unable to be selected by tag. The issue has been resolved with a proposed solution involving using a more reliable method to fetch the parent model of the test and filtering out non-model dependencies. There is an open pull request for proposed changes, and the discussion is ongoing regarding improving Cosmos' behavior to consider multiple parents for test node selection.

Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!

@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 12, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Apr 12, 2024
@tatiana tatiana reopened this Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dependencies Related to dependencies, like Python packages, library versions, etc area:selector Related to selector, like DAG selector, DBT selector, etc dbt:test Primarily related to dbt test command or functionality parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing priority:medium Medium priority issues are important issues that may have a workaround and medium impact
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants