Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support descriptions on tests #3302

Closed
wants to merge 5 commits into from
Closed

Conversation

kwigley
Copy link
Contributor

@kwigley kwigley commented Apr 27, 2021

resolves #3249

Description

This PR uses the same patching mechanism to apply updates to test nodes after parsing test properties. This currently supports descriptions for generic tests by applying the description to all nodes representing an instance of the test. When rendering the description for the test node, we expose the column name and model the test is running on in the context (column_name and model). This also supports doc blocks for both generic and bespoke test descriptions. yaml files can be defined in the test directory to achieve this.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Apr 27, 2021
@kwigley kwigley self-assigned this May 4, 2021
@kwigley kwigley force-pushed the feature/test-yaml-properties branch from bd6993e to caaed84 Compare May 5, 2021 13:10
@kwigley kwigley force-pushed the feature/test-yaml-properties branch from caaed84 to cd12919 Compare May 6, 2021 15:43
@kwigley kwigley changed the title [wip] Support descriptions on tests Support descriptions on tests May 6, 2021
@kwigley kwigley force-pushed the feature/test-yaml-properties branch from 198db83 to f227ee9 Compare May 6, 2021 17:35
@kwigley
Copy link
Contributor Author

kwigley commented May 6, 2021

@cla-bot check

@cla-bot
Copy link

cla-bot bot commented May 6, 2021

The cla-bot has been summoned, and re-checked this pull request!

Comment on lines +757 to +774
model = []

if len(node.refs):
model = node.refs[0]
elif len(node.sources):
model = node.sources[0]

if len(model) == 1:
target_model_name = model[0]
elif len(model) == 2:
_, target_model_name = model
else:
raise dbt.exceptions.InternalException(
f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
)

ctx['column_name'] = node.column_name
ctx['model'] = target_model_name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blah hardcoded values set directly on the context, I tried to avoid this without success

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that model and column_name are the only additions to the test description rendering context, right? I think that's totally ok for the v1 of this! In the future, it would be neat if we could grab other args and include them in the rendering context, but that should 100% not block us from merging this as an amazing first cut.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh I see your comment above. Okay, that makes sense!

Comment on lines 731 to 739
# between names and node ids?
patch_keys = set(self.patches.keys())
used_patch_keys = set()

for node in self.nodes.values():
patch = self.patches.pop(node.name, None)
patch_lookup_key = node.patch_lookup_key
patch = self.patches.get(patch_lookup_key, None)
if not patch:
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new/diverging behavior! Previously, a patch can only be applied once to a single node. The new behavior allows patches to be applied to multiple nodes to support applying the same patch to multiple instances of the same generic test.

@kwigley kwigley marked this pull request as ready for review May 6, 2021 18:52
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful!

Some follow-on issues that should come from here:

  • Per the comment here, we should including the test description (if available) in the info/stdout logging when a test fails.
  • Let's update the builtin generic tests to have nice descriptions! I'd be happy to take a swing at this.

All in all, the work here LGTM.

Comment on lines +757 to +774
model = []

if len(node.refs):
model = node.refs[0]
elif len(node.sources):
model = node.sources[0]

if len(model) == 1:
target_model_name = model[0]
elif len(model) == 2:
_, target_model_name = model
else:
raise dbt.exceptions.InternalException(
f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
)

ctx['column_name'] = node.column_name
ctx['model'] = target_model_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that model and column_name are the only additions to the test description rendering context, right? I think that's totally ok for the v1 of this! In the future, it would be neat if we could grab other args and include them in the rendering context, but that should 100% not block us from merging this as an amazing first cut.

Comment on lines +757 to +774
model = []

if len(node.refs):
model = node.refs[0]
elif len(node.sources):
model = node.sources[0]

if len(model) == 1:
target_model_name = model[0]
elif len(model) == 2:
_, target_model_name = model
else:
raise dbt.exceptions.InternalException(
f'Refs and sources should always be 1 or 2 arguments - got {len(model)}'
)

ctx['column_name'] = node.column_name
ctx['model'] = target_model_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh I see your comment above. Okay, that makes sense!

@@ -46,7 +46,8 @@ def documentable(cls) -> List['NodeType']:
cls.Source,
cls.Macro,
cls.Analysis,
cls.Exposure
cls.Exposure,
cls.Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that both types of tests (generic + bespoke) can use doc blocks in their descriptions, nice!

@gshank
Copy link
Contributor

gshank commented May 8, 2021

I think we should rethink this. Applying patches to common tests like 'unique' and 'not_null' would be a big performance hit in large projects. The 'process_docs' step, which is already pretty slow, would bloat quite a bit. In addition it would be very hard to handle with the new partial processing. If somebody changed one of these test patches we would have to delete every test node that depends on the that test, and every node, exposure, and source that that generated those tests, i.e. for common tests like 'unique' and 'not_null' most of the project. Probably that wouldn't happen too often, so maybe we could just refuse to partially parse any changes to test patches, but that's not optimal.

I'm wondering if this is a feature where we should do some of the deferred rendering that we've talked about occasionally. Put the unrendered description, etc, on the macro test nodes, like we do for macro_patches, and then only render those bits when a test is executed. This would remove the large parsing-time rendering overhead and would be easy to handle with partial parsing. But we'd have to write new code to render those parts at run time.

@gshank
Copy link
Contributor

gshank commented May 8, 2021

Something else to think about, if we only allow 'column_name' and 'model' in those descriptions, is not doing jinja rendering but simple a python string replace, i.e. replace {{ model}} with the model name (and the same for the column_name). It's not consistent with the way we handle other attributes, but it would be a lot faster.

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 10, 2021

@gshank I really appreciate you raising the performance concern here! It's tremendously justified. Let's figure out a way to achieve the functionality we want without bloating process_docs.

I think we can take similar "shortcuts" for builtin unique and not_null test descriptions, similar to what we did in #3034: The descriptions will be set within dbt, and they are (as you say) basically just python f-string operations. Conceivably, in the same way we "just set" configs for those tests, we could "just set" the descriptions for tests that use macro.dbt.test_unique or macro.dbt.test_not_null.

I don't see it being a common occurrence for users to edit generic test properties in development: most tests come from within dbt or within packages; a user may define a handful of custom generic tests in their own project, but those tests are highly unlikely to have a footprint on the same scale as unique or not_null.

Comment on lines +747 to +749
# Idealing we should be able to put the test kwargs in the context,
# but kwargs have already been processed at this point
# ex: 'model' --> "{{ ref('model') }}"
Copy link
Contributor

@jtcohen6 jtcohen6 May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify my understanding: The test_metadata that lives on the test node, and contains kwargs as well, is not Jinja-rendered. E.g. from the representation in manifest.json:

"test_metadata": {
    "name": "accepted_values",
    "kwargs": {
        "values": [
            "usd"
        ],
        "column_name": "currency_code",
        "model": "{{ ref('stg_ticket_tailor__orders') }}"
    },
    "namespace": null
},

In order to include kwargs in the test description, we'd need to upgrade the description renderer to use the full Jinja rendering context (refs, macros, all that). Is that right?

@kwigley
Copy link
Contributor Author

kwigley commented May 13, 2021

After considering the performance impacts of supporting test descriptions with this approach, and the concerns raised in the comments above, we are tabling this feature until we have a better approach for description rendering and dependency resolution for partial parsing 👍

@kwigley kwigley closed this May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configurable description in yaml config for generic tests
3 participants