[CT-1914] Store links to tests on model node in manifest #6746

jtcohen6 · 2023-01-26T12:04:28Z

Background

Generally, each manifest node records only its parents, in the form of depends_on.nodes. While we do have a method to create a full "parent map" and "child map" in the manifest, which is included in manifest.json and leveraged in dbt-docs, this is still an expensive process.

Problems

For consumers of dbt metadata (whether via Jinja {{ graph }} shenanigans, or parsingmanifest.json after the fact), it feels harder than it should to figure out the tests defined on a model. That's in sharp contrast to the very simple UX of sticking a unique test "right on" a model.

Also: performance. The add_test_edges method is a known performance bottleneck in large projects / DAGs with many tests. Roughly half the time of that method is just getting the unique_id for each test that depends on a given node.

Running dbt build --exclude fqn:* in our sample performance project (with partial parsing enabled), _get_tests_for_node is just under 25% of the overall time:

Proposal

What if we added a new tests property, allowing each model to know about the tests defined on it? For generic tests, specified in the same file as its parent's (yaml) configuration, we should be able to add that link during parsing, without any additional lookups.

This would feel directionally correct for more future thinking around:

tests that could run as "pre-flight" checks, during the model materialization
whether tests should actually be separate DAG nodes after all

Questions:

As always - how would we do this for tests with multiple parents (e.g. relationships)? Could we update the second parent's node (add the link) during ref resolution?
Should this be generic tests only? Would we try to include singular tests as well?

The text was updated successfully, but these errors were encountered:

adammarples · 2023-11-28T13:35:56Z

This would be very useful (aside from the performance improvements) and the inverse would also be useful, tests can only have one parent model/source so a test.parent attribute would be very useful in downstream applications such as astronomer-cosmos etc, currently there is not a clear cut way that I am aware of for reliably finding a model node from a dependent test node in the dbt graph. We are using node.depends_on['nodes'][-1] or node['refs'][0]['name'] but I'm unsure if they are reliable.

jtcohen6 · 2023-11-28T13:44:10Z

tests can only have one parent model/source

Small clarification here - tests can have multiple parents, e.g. the relationships generic test

adammarples · 2023-11-28T14:35:15Z

@jtcohen6 can I clarify what I mean here, a test can only have one real parent, ie. the model which it tests. This lets us ask 'did the model pass its tests', regardless of what other models/sources are involved in that test.

It's a good point but can I check, for the following model example

version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id

If I run

dbt test --select orders

The test will run, but if I run

dbt test --select customers

it won't?

This is what I mean by a parent and it is very useful to know for writing testing logic downstream that handles testfailures or passes.

davesgonechina · 2023-11-28T18:29:51Z

My interest that sparked the original Slack conversation was rooted in working with codegen e.g. if you generate yaml, you can pull existing descriptions from the graph but you cannot pull tests out of the graph the same way. Having a test property would expand the potential for more automation in generating and updating documentation.

@adammarples I agree that the way a relationship test currently works there is a "primary" parent - the secondary parent is in the left join. I think we'd have to invent a bidirectional test first before we'd need to figure out how to represent it in the graph.

@jtcohen6 in this proposal, is test a new property of a node, or a new top-level tests alongside nodes and sources? I think back in January 2023 I was thinking the latter.

jtcohen6 added performance Team:Execution labels Jan 26, 2023

github-actions bot changed the title ~~Store links to tests on model node in manifest~~ [CT-1914] Store links to tests on model node in manifest Jan 26, 2023

jtcohen6 removed Team:Language labels Jul 19, 2023

adammarples mentioned this issue Dec 14, 2023

Select models which have tests with additional dependencies astronomer/astronomer-cosmos#751

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1914] Store links to tests on model node in manifest #6746

[CT-1914] Store links to tests on model node in manifest #6746

jtcohen6 commented Jan 26, 2023

adammarples commented Nov 28, 2023

jtcohen6 commented Nov 28, 2023

adammarples commented Nov 28, 2023

davesgonechina commented Nov 28, 2023

[CT-1914] Store links to tests on model node in manifest #6746

[CT-1914] Store links to tests on model node in manifest #6746

Comments

jtcohen6 commented Jan 26, 2023

Background

Problems

Proposal

adammarples commented Nov 28, 2023

jtcohen6 commented Nov 28, 2023

adammarples commented Nov 28, 2023

davesgonechina commented Nov 28, 2023