Support descriptions on tests #3302

kwigley · 2021-04-27T20:45:53Z

resolves #3249

Description

This PR uses the same patching mechanism to apply updates to test nodes after parsing test properties. This currently supports descriptions for generic tests by applying the description to all nodes representing an instance of the test. When rendering the description for the test node, we expose the column name and model the test is running on in the context (column_name and model). This also supports doc blocks for both generic and bespoke test descriptions. yaml files can be defined in the test directory to achieve this.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

…tances of a generic test

kwigley · 2021-05-06T18:39:58Z

@cla-bot check

cla-bot · 2021-05-06T18:40:03Z

The cla-bot has been summoned, and re-checked this pull request!

kwigley · 2021-05-06T18:42:20Z

core/dbt/parser/manifest.py

+ model = []
+
+ if len(node.refs):
+ model = node.refs[0]
+ elif len(node.sources):
+ model = node.sources[0]
+
+ if len(model) == 1:
+ target_model_name = model[0]
+ elif len(model) == 2:
+ _, target_model_name = model
+ else:
+ raise dbt.exceptions.InternalException(
+ f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
+ )
+
+ ctx['column_name'] = node.column_name
+ ctx['model'] = target_model_name


blah hardcoded values set directly on the context, I tried to avoid this without success

This means that model and column_name are the only additions to the test description rendering context, right? I think that's totally ok for the v1 of this! In the future, it would be neat if we could grab other args and include them in the rendering context, but that should 100% not block us from merging this as an amazing first cut.

Ohh I see your comment above. Okay, that makes sense!

kwigley · 2021-05-06T18:44:44Z

core/dbt/contracts/graph/manifest.py

 # between names and node ids?
+ patch_keys = set(self.patches.keys())
+ used_patch_keys = set()
+
 for node in self.nodes.values():
- patch = self.patches.pop(node.name, None)
+ patch_lookup_key = node.patch_lookup_key
+ patch = self.patches.get(patch_lookup_key, None)
 if not patch:
 continue


This is new/diverging behavior! Previously, a patch can only be applied once to a single node. The new behavior allows patches to be applied to multiple nodes to support applying the same patch to multiple instances of the same generic test.

core/dbt/contracts/graph/parsed.py

jtcohen6

This is wonderful!

Some follow-on issues that should come from here:

Per the comment here, we should including the test description (if available) in the info/stdout logging when a test fails.
Let's update the builtin generic tests to have nice descriptions! I'd be happy to take a swing at this.

All in all, the work here LGTM.

jtcohen6 · 2021-05-06T20:45:55Z

core/dbt/parser/manifest.py

+ model = []
+
+ if len(node.refs):
+ model = node.refs[0]
+ elif len(node.sources):
+ model = node.sources[0]
+
+ if len(model) == 1:
+ target_model_name = model[0]
+ elif len(model) == 2:
+ _, target_model_name = model
+ else:
+ raise dbt.exceptions.InternalException(
+ f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
+ )
+
+ ctx['column_name'] = node.column_name
+ ctx['model'] = target_model_name


This means that model and column_name are the only additions to the test description rendering context, right? I think that's totally ok for the v1 of this! In the future, it would be neat if we could grab other args and include them in the rendering context, but that should 100% not block us from merging this as an amazing first cut.

jtcohen6 · 2021-05-06T20:51:09Z

core/dbt/parser/manifest.py

+ model = []
+
+ if len(node.refs):
+ model = node.refs[0]
+ elif len(node.sources):
+ model = node.sources[0]
+
+ if len(model) == 1:
+ target_model_name = model[0]
+ elif len(model) == 2:
+ _, target_model_name = model
+ else:
+ raise dbt.exceptions.InternalException(
+ f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
+ )
+
+ ctx['column_name'] = node.column_name
+ ctx['model'] = target_model_name


Ohh I see your comment above. Okay, that makes sense!

jtcohen6 · 2021-05-06T20:52:17Z

core/dbt/node_types.py

@@ -46,7 +46,8 @@ def documentable(cls) -> List['NodeType']:
 cls.Source,
 cls.Macro,
 cls.Analysis,
- cls.Exposure
+ cls.Exposure,
+ cls.Test


Confirmed that both types of tests (generic + bespoke) can use doc blocks in their descriptions, nice!

gshank · 2021-05-08T16:22:47Z

I think we should rethink this. Applying patches to common tests like 'unique' and 'not_null' would be a big performance hit in large projects. The 'process_docs' step, which is already pretty slow, would bloat quite a bit. In addition it would be very hard to handle with the new partial processing. If somebody changed one of these test patches we would have to delete every test node that depends on the that test, and every node, exposure, and source that that generated those tests, i.e. for common tests like 'unique' and 'not_null' most of the project. Probably that wouldn't happen too often, so maybe we could just refuse to partially parse any changes to test patches, but that's not optimal.

I'm wondering if this is a feature where we should do some of the deferred rendering that we've talked about occasionally. Put the unrendered description, etc, on the macro test nodes, like we do for macro_patches, and then only render those bits when a test is executed. This would remove the large parsing-time rendering overhead and would be easy to handle with partial parsing. But we'd have to write new code to render those parts at run time.

gshank · 2021-05-08T16:28:42Z

Something else to think about, if we only allow 'column_name' and 'model' in those descriptions, is not doing jinja rendering but simple a python string replace, i.e. replace {{ model}} with the model name (and the same for the column_name). It's not consistent with the way we handle other attributes, but it would be a lot faster.

jtcohen6 · 2021-05-10T13:31:47Z

@gshank I really appreciate you raising the performance concern here! It's tremendously justified. Let's figure out a way to achieve the functionality we want without bloating process_docs.

I think we can take similar "shortcuts" for builtin unique and not_null test descriptions, similar to what we did in #3034: The descriptions will be set within dbt, and they are (as you say) basically just python f-string operations. Conceivably, in the same way we "just set" configs for those tests, we could "just set" the descriptions for tests that use macro.dbt.test_unique or macro.dbt.test_not_null.

I don't see it being a common occurrence for users to edit generic test properties in development: most tests come from within dbt or within packages; a user may define a handful of custom generic tests in their own project, but those tests are highly unlikely to have a footprint on the same scale as unique or not_null.

jtcohen6 · 2021-05-10T13:40:04Z

core/dbt/parser/manifest.py

+ # Idealing we should be able to put the test kwargs in the context,
+ # but kwargs have already been processed at this point
+ # ex: 'model' --> "{{ ref('model') }}"


Just to clarify my understanding: The test_metadata that lives on the test node, and contains kwargs as well, is not Jinja-rendered. E.g. from the representation in manifest.json:

"test_metadata": { "name": "accepted_values", "kwargs": { "values": [ "usd" ], "column_name": "currency_code", "model": "{{ ref('stg_ticket_tailor__orders') }}" }, "namespace": null },

In order to include kwargs in the test description, we'd need to upgrade the description renderer to use the full Jinja rendering context (refs, macros, all that). Is that right?

kwigley · 2021-05-13T17:04:47Z

After considering the performance impacts of supporting test descriptions with this approach, and the concerns raised in the comments above, we are tabling this feature until we have a better approach for description rendering and dependency resolution for partial parsing 👍

cla-bot bot added the cla:yes label Apr 27, 2021

kwigley self-assigned this May 4, 2021

kwigley force-pushed the feature/test-yaml-properties branch from bd6993e to caaed84 Compare May 5, 2021 13:10

jtcohen6 mentioned this pull request May 5, 2021

Configurable description in yaml config for generic tests #3249

Closed

kwigley force-pushed the feature/test-yaml-properties branch from caaed84 to cd12919 Compare May 6, 2021 15:43

kwigley changed the title ~~[wip] Support descriptions on tests~~ Support descriptions on tests May 6, 2021

Kyle Wigley added 4 commits May 6, 2021 13:15

add ability to set description on tests

fa7f0ba

patches need to be applied to multiple nodes -> one patch for all ins…

22a0418

…tances of a generic test

clean up unused code

f04443a

update changelog

f227ee9

kwigley force-pushed the feature/test-yaml-properties branch from 198db83 to f227ee9 Compare May 6, 2021 17:35

kwigley commented May 6, 2021

View reviewed changes

core/dbt/contracts/graph/parsed.py Outdated Show resolved Hide resolved

Update core/dbt/contracts/graph/parsed.py

e456742

kwigley requested review from gshank, jtcohen6 and iknox-fa May 6, 2021 18:51

kwigley marked this pull request as ready for review May 6, 2021 18:52

jtcohen6 approved these changes May 6, 2021

View reviewed changes

jtcohen6 reviewed May 10, 2021

View reviewed changes

kwigley removed request for gshank and iknox-fa May 13, 2021 15:22

kwigley closed this May 13, 2021

jtcohen6 mentioned this pull request Aug 26, 2021

dbt 0.21.0-b2: test aliases don't play well with anchored yaml #3815

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support descriptions on tests #3302

Support descriptions on tests #3302

kwigley commented Apr 27, 2021 •

edited

Loading

kwigley commented May 6, 2021

cla-bot bot commented May 6, 2021

kwigley May 6, 2021

jtcohen6 May 6, 2021

jtcohen6 May 6, 2021

kwigley May 6, 2021

jtcohen6 left a comment

jtcohen6 May 6, 2021

jtcohen6 May 6, 2021

jtcohen6 May 6, 2021

gshank commented May 8, 2021

gshank commented May 8, 2021

jtcohen6 commented May 10, 2021 •

edited

Loading

jtcohen6 May 10, 2021 •

edited

Loading

kwigley commented May 13, 2021

Support descriptions on tests #3302

Support descriptions on tests #3302

Conversation

kwigley commented Apr 27, 2021 • edited Loading

Description

Checklist

kwigley commented May 6, 2021

cla-bot bot commented May 6, 2021

kwigley May 6, 2021

Choose a reason for hiding this comment

jtcohen6 May 6, 2021

Choose a reason for hiding this comment

jtcohen6 May 6, 2021

Choose a reason for hiding this comment

kwigley May 6, 2021

Choose a reason for hiding this comment

jtcohen6 left a comment

Choose a reason for hiding this comment

jtcohen6 May 6, 2021

Choose a reason for hiding this comment

jtcohen6 May 6, 2021

Choose a reason for hiding this comment

jtcohen6 May 6, 2021

Choose a reason for hiding this comment

gshank commented May 8, 2021

gshank commented May 8, 2021

jtcohen6 commented May 10, 2021 • edited Loading

jtcohen6 May 10, 2021 • edited Loading

Choose a reason for hiding this comment

kwigley commented May 13, 2021

kwigley commented Apr 27, 2021 •

edited

Loading

jtcohen6 commented May 10, 2021 •

edited

Loading

jtcohen6 May 10, 2021 •

edited

Loading