Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1914] Store links to tests on model node in manifest #6746

Open
jtcohen6 opened this issue Jan 26, 2023 · 4 comments
Open

[CT-1914] Store links to tests on model node in manifest #6746

jtcohen6 opened this issue Jan 26, 2023 · 4 comments

Comments

@jtcohen6
Copy link
Contributor

Inspired by community Slack thread

Background

Generally, each manifest node records only its parents, in the form of depends_on.nodes. While we do have a method to create a full "parent map" and "child map" in the manifest, which is included in manifest.json and leveraged in dbt-docs, this is still an expensive process.

Problems

For consumers of dbt metadata (whether via Jinja {{ graph }} shenanigans, or parsingmanifest.json after the fact), it feels harder than it should to figure out the tests defined on a model. That's in sharp contrast to the very simple UX of sticking a unique test "right on" a model.

Also: performance. The add_test_edges method is a known performance bottleneck in large projects / DAGs with many tests. Roughly half the time of that method is just getting the unique_id for each test that depends on a given node.

Running dbt build --exclude fqn:* in our sample performance project (with partial parsing enabled), _get_tests_for_node is just under 25% of the overall time:

Screenshot 2023-01-26 at 12 52 58

Proposal

What if we added a new tests property, allowing each model to know about the tests defined on it? For generic tests, specified in the same file as its parent's (yaml) configuration, we should be able to add that link during parsing, without any additional lookups.

This would feel directionally correct for more future thinking around:

  • tests that could run as "pre-flight" checks, during the model materialization
  • whether tests should actually be separate DAG nodes after all

Questions:

  • As always - how would we do this for tests with multiple parents (e.g. relationships)? Could we update the second parent's node (add the link) during ref resolution?
  • Should this be generic tests only? Would we try to include singular tests as well?
@github-actions github-actions bot changed the title Store links to tests on model node in manifest [CT-1914] Store links to tests on model node in manifest Jan 26, 2023
@adammarples
Copy link

This would be very useful (aside from the performance improvements) and the inverse would also be useful, tests can only have one parent model/source so a test.parent attribute would be very useful in downstream applications such as astronomer-cosmos etc, currently there is not a clear cut way that I am aware of for reliably finding a model node from a dependent test node in the dbt graph. We are using node.depends_on['nodes'][-1] or node['refs'][0]['name'] but I'm unsure if they are reliable.

@jtcohen6
Copy link
Contributor Author

tests can only have one parent model/source

Small clarification here - tests can have multiple parents, e.g. the relationships generic test

@adammarples
Copy link

@jtcohen6 can I clarify what I mean here, a test can only have one real parent, ie. the model which it tests. This lets us ask 'did the model pass its tests', regardless of what other models/sources are involved in that test.

It's a good point but can I check, for the following model example

version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id

If I run

dbt test --select orders

The test will run, but if I run

dbt test --select customers

it won't?

This is what I mean by a parent and it is very useful to know for writing testing logic downstream that handles testfailures or passes.

@davesgonechina
Copy link

My interest that sparked the original Slack conversation was rooted in working with codegen e.g. if you generate yaml, you can pull existing descriptions from the graph but you cannot pull tests out of the graph the same way. Having a test property would expand the potential for more automation in generating and updating documentation.

@adammarples I agree that the way a relationship test currently works there is a "primary" parent - the secondary parent is in the left join. I think we'd have to invent a bidirectional test first before we'd need to figure out how to represent it in the graph.

@jtcohen6 in this proposal, is test a new property of a node, or a new top-level tests alongside nodes and sources? I think back in January 2023 I was thinking the latter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants