
Node selection for @ on dbt test is incomplete #1827

Closed
tayloramurphy opened this issue Oct 11, 2019 · 7 comments
Labels
bug Something isn't working

Comments

@tayloramurphy

Describe the bug

Our CI runs dbt run followed by dbt test in a completely clean Snowflake warehouse. When using the @ syntax for both, dbt run passes fine, but dbt test fails because there are relationship tests on models that weren't built.

Steps To Reproduce


If you view the graph at https://gitlab-data.gitlab.io/analytics/dbt/snowflake/#!/model/model.gitlab_dw?g_v=1&g_i=@zuora, you'll see that the sfdc_executive_business_review model is not built, but dbt test still runs this relationship test: https://gitlab.com/gitlab-data/analytics/blob/master/transform%2Fsnowflake-dbt%2Fmodels%2Fsfdc%2Fbase%2Fschema.yml#L66

Expected behavior

I would expect either the test to be skipped or the relevant model to be built.

Screenshots and log output

These are the logs for an example failed job with @zuora

System information

Which database are you using dbt with?

  • snowflake

The output of dbt --version:

Running with dbt=0.14.2

The operating system you're using:
Mac OS X

The output of python --version:
3.7.4


tayloramurphy added the bug and triage labels on Oct 11, 2019
@beckjake
Contributor

The repro for this case will be very complicated, but I think the issue is when tests get added in to the final graph.

  • We first collect all the matching models in the graph (and properly perform the children's ancestors calculation there).
  • Then we go find all the tests that descend from those collected values.
  • But we never go back up the graph to find the ancestors of the tests we just collected; we assume the original collection pass got all of them.

I'm not sure if we should be excluding the tests whose ancestors aren't in the graph, or doing another pass and adding all the test ancestors in, but one of those choices is the appropriate solution here.
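
For illustration, here's a rough sketch of that selection flow, assuming a networkx-style DAG. The function, node attributes, and the final commented-out pass are illustrative only, not dbt's actual internals:

import networkx as nx

def select_at(graph: nx.DiGraph, model: str) -> set:
    # Step 1: "@model" selects the model, its descendants, and the
    # ancestors of those descendants.
    descendants = nx.descendants(graph, model) | {model}
    selected = set(descendants)
    for node in descendants:
        selected |= nx.ancestors(graph, node)

    # Step 2: add tests that depend on any selected node.
    tests = {
        node for node in graph
        if graph.nodes[node].get("resource_type") == "test"
        and any(parent in selected for parent in graph.predecessors(node))
    }
    selected |= tests

    # Missing step (the gap described above): the ancestors of those tests
    # are never added, so a relationships test can be selected while one of
    # its parents is not.
    # selected |= {p for t in tests for p in nx.ancestors(graph, t)}
    return selected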

drewbanin removed the triage label on Oct 11, 2019
@drewbanin
Contributor

I think a simple reproduction case should be:

-- models/model1.sql
select 1 as id

-- models/model2.sql
select 1 as fk

# models/schema.yml
version: 2

models:
  - name: model1
    columns:
      - name: id
        tests:
          - relationships:
              to: ref('model2')
              field: fk

Running:

# Runs only model1
dbt run --models @model1

# Runs no tests
dbt test --models @model1

# Runs the test
dbt test --models model1 model2
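
For context on why the test errors when model2 is never built: a relationships schema test compiles to SQL that reads from both relations. The helper below is only an illustrative sketch of that shape, not dbt's actual macro output:

def relationships_test_sql(child: str, column: str, parent: str, field: str) -> str:
    # Roughly the shape of a compiled relationships test: count child rows
    # whose foreign key has no matching row in the parent relation.
    return f"""
    select count(*)
    from {child} as child
    left join {parent} as parent
      on child.{column} = parent.{field}
    where child.{column} is not null
      and parent.{field} is null
    """

# relationships_test_sql("model1", "id", "model2", "fk") references both
# model1 and model2, so it errors if model2 does not exist in the warehouse.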

I'm not sure if we should be excluding the tests whose ancestors aren't in the graph, or doing another pass and adding all the test ancestors in, but one of those choices is the appropriate solution here.

It's compelling to think that tests should only be run if all of their ancestors are selected in the run, but that will make custom data tests pretty wonky! This is probably the right answer for @tayloramurphy's use case, but probably the wrong answer in the more general case.

I do not think that, in this case, the nodes selected by dbt run --models @zuora should be made aware of the tests that exist on those models -- that would definitely be unexpected behavior! I do want to note that it would be the natural behavior of my suggestion in #2203 :)

So, I think the only other path forward here is to make relationship tests smarter. It would be pretty reasonable to special-case schema tests that select from more than one model: we could add support for tests returning a warning (escalatable to an error with --warn-error), and we could leverage the relation cache to determine whether the to relation exists in the database.
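
A hypothetical sketch of that idea; the names and interfaces below are made up for illustration and are not dbt's real relation cache or test APIs:

def run_relationship_test(test, relation_cache, warn_error=False):
    # If the "to" relation was never built (not present in the cache),
    # downgrade the failure to a warning instead of a hard error.
    to_relation = test.to_relation  # hypothetical attribute: the ref() target
    if to_relation not in relation_cache:
        message = f"{to_relation} does not exist; skipping {test.name}"
        if warn_error:  # --warn-error escalates warnings to errors
            raise RuntimeError(message)
        print(f"WARNING: {message}")
        return "warn"
    return test.execute()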

Curious what y'all think?

@beckjake
Contributor

beckjake commented Apr 10, 2020

It's compelling to think that tests should only be run if all of their ancestors are selected in the run, but that will make custom data tests pretty wonky!

What will be wonky about it? Custom data tests should use ref and source just like everything else, right?

@drewbanin
Contributor

True, but something about this behavior for custom data tests feels wrong to me. Maybe custom data tests are less of an issue because you can conceivably select them by name/path? Let me think on that some more... it's not quite as jarring as I initially felt it was, but I still don't feel great about it.

@jtcohen6
Contributor

I'm here from the future to say that the @ operator seems to work for tests now, insofar as the reproduction case that @drewbanin included above does the following in dbt v0.18.1:

# Runs both model1 and model2
dbt run --models @model1

# Runs the test
dbt test --models @model1

I came here because I've been thinking a lot about:

if we should be excluding the tests whose ancestors aren't in the graph

Particularly with reference to state:modified + testing caveats, where one "leg" of a test may be modified while the other is not, so only one of them ends up in the selection criteria for both run + test. "All parents must be present" is one of a few potential approaches I'm mulling over.

In any case, I'm going to close this issue :)

@ags2121

ags2121 commented Nov 16, 2020


@jtcohen6 do you know if this behavior is fixed in 0.18.1 for the + selector as well?

@jtcohen6
Contributor

@ags2121 That behavior is not fixed. I saw your comment over on #2132, I agree that we should open a new issue to discuss.
