Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/defer tests #2706

Closed
wants to merge 1 commit into from
Closed

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Aug 14, 2020

resolves #2701

Description

Support deferred tests.

I also prevented --defer from doing anything on snapshots/seeds more explicitly. I think the CLI argument will default to not being set regardless of the environment variable, but it's even easier to be sure if we just disable that code path entirely for snapshots and seeds.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Aug 14, 2020
@beckjake beckjake changed the base branch from dev/marian-anderson to feature/state-modified-selector August 14, 2020 19:15
Base automatically changed from feature/state-modified-selector to dev/marian-anderson August 14, 2020 19:46
@jtcohen6
Copy link
Contributor

This works! However, I'm noticing that the model references being tested are always deferred.

We need the ability to defer only some of the model references. For instance:

dbt test -m unique_fct_sprints_name --defer --state .state

I would expect this to execute my test against the production (deferred) version of fct_sprints. Whereas

dbt test -m fct_sprints --defer --state .state

Since fct_sprints is included in my selection criteria, I would expect the fct_sprints reference to not be deferred, and the tests to execute against my development version of fct_sprints.

I realize #2701 wasn't clear about this and I updated/clarified there. I also get that dbt test --models selection is weird; are the model nodes themselves actually included in the final set of selected nodes?

In any case, we just need some way for the following:

dbt test -m state:modified+ --defer --state path/to/artifacts
  1. For any models that are modified, run all tests (whether new/modified or not) against those models' current (un-deferred) namespace
  2. For all tests that are new/modified, if they select from a non-modified model, defer the reference to that model

Does that make sense? Is it possible?

@beckjake
Copy link
Contributor Author

beckjake commented Aug 17, 2020

I also get that dbt test --models selection is weird; are the model nodes themselves actually included in the final set of selected nodes?

Yeah. This is just how test selection works: Your selector indicates the nodes whose tests should be run, not the nodes that should be actually selected.

Previously I suggested making dbt test -m foo do nothing, and requiring users to run dbt test -m foo+ to get their tests of the foo model. That was shot down because some folks felt that requiring the + to get your tests running would be too confusing.

The actual process for the test selector goes like:

  • for each node selector, dbt evaluates that selector and finds all the selected nodes
    • dbt then adds all dependent test nodes, to each selector
    • appropriate set operations are applied to these expanded sets of nodes
  • nodes are then filtered based on their resource types (so non-tests are removed).

So I guess this is more of a "sort of" than a "yeah".

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 17, 2020

Ok! It sounds like this is going to be trickier than I initially appreciated.

Let's say I have a project with model_a that already exists in a production schema. If I create:

  • a modified model, model_b, that selects from model_a
  • a new test that selects from model_a (but model_a is not itself new or modified)

Then dbt ls -s state:modified returns me:

random.model_b # modified model
random.schema_test.unique_model_a_column_a # unmodified model, new test

When I:

dbt run -m state:modified+ --defer --state path/to/prod/artifacts

I will create model_b (but not model_a) in_ my development schema. My version of model_b deferred its reference of model_a to use the production namespace.

When it comes time to test, I can either defer:

dbt test -m state:modified+ --defer --state path/to/previous/run

This will defer all model references to the production run manifest. The test on model_b will reference the old version of model_b in the production namespace, so it actually will not test my modifications.

Or not defer:

dbt test -m state:modified+ path/to/previous/run

This will fail because it includes the new test model_a, but model_a does not exist in my production namespace.

N.B. Deferral works seamlessly if model_b is new, because then it doesn't exist in the deferred manifest and dbt falls back to the current namespace.

I think there are two potential approaches here:

  1. Much trickier deferral logic, based on whether the model itself (or just a test node that depends on the model) is included in the dbt test selection criteria.
  2. Selector-based, where there's some syntax that enables bifurcation:
dbt test -m [modified_models] --state path/to/previous/run
dbt test -m [modified_tests_on_unmodified_models] --defer --state path/to/previous/run

Is such a selection syntax possible today? A good answer here would get us what we need, without needing to make test deferral much more complex. (It still wouldn't solve for relationships or data tests that depend on multiple models, though, where there's a possibility of one model being modified and another being unmodified.)

Otherwise, I think we should hold off on adding support for dbt test --defer. The only CI-related scenario where it's needed is when a user adds a test, or changes the severity of an existing test, without making any changes to the model it depends on. That's more common than an edge case, but it's not so common as to be a prerequisite for the beta version of slim CI. I'd rather document it as unsupported, rather than implement it now and significantly alter the implementation later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support deferred tests
4 participants