load_from_dbt_manifest is de-selecting valid test nodes #719
Labels
area:dependencies
Related to dependencies, like Python packages, library versions, etc
area:selector
Related to selector, like DAG selector, DBT selector, etc
dbt:test
Primarily related to dbt test command or functionality
parsing:dbt_manifest
Issues, questions, or features related to dbt_manifest parsing
priority:medium
Medium priority issues are important issues that may have a workaround and medium impact
NodeSelector._should_include_node
contains a line that gives the test node the tags of its parent model. This means that we cannot select tests by their own tags as we can in dbt-core, which would be ideal, but there is another problem.
node.depends_on[0]
is supposed to return the id of the parent model of the test; however, it actually just selects the first dependency. We have tests with additional dependencies (sources). This is valid dbt, but it breaks in astronomer-cosmos because the test tags are set to the source tags, not the parent model tags.
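To illustrate the failure mode, here is a minimal sketch using a hand-written dictionary shaped like a test node in manifest.json. The project name, node ids and tags are hypothetical, and the ordering of the dependency list is assumed for the sake of the example rather than taken from a real manifest:

```python
# Hypothetical test node, shaped like an entry under "nodes" in manifest.json.
test_node = {
    "unique_id": "test.my_project.relationships_orders_customer_id.abc123",
    "resource_type": "test",
    "tags": ["nightly"],
    "depends_on": {
        "nodes": [
            "source.my_project.raw.customers",  # extra dependency listed first
            "model.my_project.orders",          # the model the test is attached to
        ]
    },
    "refs": [{"name": "orders"}],
}

# Taking the first dependency picks the source, not the parent model.
first_dependency = test_node["depends_on"]["nodes"][0]
print(first_dependency)
```

With this input, any tags copied from the first dependency would come from the source rather than from the orders model.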
Solution:
I have found that
node_dict['refs'][0]['name']
is more reliable for fetching the parent model of the test than
node_dict.get("depends_on", {}).get("nodes", [])
which is used in the DbtNode class, although I appreciate that dbt doesn't seem to offer a reliable way to do this.
Alternatively, we could remove non-model ids from the list, so that what remains is more likely to be the parent model rather than a stray source, and warn when the list has more than one entry.
With either approach we could also filter on the test's own tags, improving parity with dbt-core --select.
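The two ideas above could be combined roughly as follows. This is only a sketch, not the cosmos implementation: resolve_test_parent is a hypothetical helper, it assumes refs are dictionaries with a "name" key (as in the snippet above), and it assumes unversioned model ids of the form model.<package>.<name>. Tests attached to seeds or snapshots are not handled.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def resolve_test_parent(node_dict: dict[str, Any]) -> Optional[str]:
    """Best-effort guess at the unique_id of the model a test is attached to."""
    depends_on = node_dict.get("depends_on", {}).get("nodes", [])
    # Drop sources and anything else that is not a model.
    model_ids = [uid for uid in depends_on if uid.startswith("model.")]

    # Prefer the model named by the first ref.
    refs = node_dict.get("refs") or []
    if refs:
        ref_name = refs[0]["name"]
        for uid in model_ids:
            # Assumption: unversioned ids, so the last segment is the model name.
            if uid.split(".")[-1] == ref_name:
                return uid

    # Fall back to the pruned list and warn when it is still ambiguous.
    if len(model_ids) > 1:
        logger.warning(
            "Ambiguous parent for test %s: %s", node_dict.get("unique_id"), model_ids
        )
    return model_ids[0] if model_ids else None


# Hypothetical node: a source is listed before the parent model.
example = {
    "unique_id": "test.my_project.not_null_orders_id.abc123",
    "depends_on": {
        "nodes": ["source.my_project.raw.customers", "model.my_project.orders"]
    },
    "refs": [{"name": "orders"}],
}
print(resolve_test_parent(example))  # model.my_project.orders
```

Returning None when no model is found leaves it to the caller to decide whether to skip tag inheritance or fall back to the test's own tags.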
Alternatively, I secretly suspect that
depends_on['nodes'][-1]
will always be the test's parent model, instead of
depends_on['nodes'][0]
, although I haven't been through the dbt-core code to prove this.
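One way to gather evidence for or against this suspicion without reading dbt-core is to scan a real manifest and compare the candidates side by side. The sketch below assumes the manifest has been loaded into a dictionary and that refs are dictionaries with a "name" key; compare_parent_heuristics is a hypothetical helper. It does not decide which candidate is correct, it only lists them for inspection.

```python
from typing import Any


def compare_parent_heuristics(manifest: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """List first/last dependency and ref names for tests with several dependencies."""
    report: dict[str, dict[str, Any]] = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "test":
            continue
        depends_on = node.get("depends_on", {}).get("nodes", [])
        if len(depends_on) < 2:
            continue  # first and last are the same entry, nothing to compare
        refs = node.get("refs") or []
        report[unique_id] = {
            "first": depends_on[0],
            "last": depends_on[-1],
            "ref_names": [ref["name"] for ref in refs],
        }
    return report


# Usage against a compiled project (path is the dbt default target directory):
#   import json
#   with open("target/manifest.json") as f:
#       report = compare_parent_heuristics(json.load(f))
```

If every entry in the report has the expected parent model under "last", that supports the suspicion for that project, though it would not prove it in general.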