
Node selection for @ on dbt test is incomplete #1827

Closed
tayloramurphy opened this issue Oct 11, 2019 · 7 comments
Labels
bug Something isn't working

Comments

@tayloramurphy

Describe the bug

Our CI runs dbt run followed by dbt test in a completely clean Snowflake warehouse. When using the @ syntax for both, dbt run passes fine, but dbt test fails because there are relationship tests on models that weren't built.

Steps To Reproduce


If you view the graph at https://gitlab-data.gitlab.io/analytics/dbt/snowflake/#!/model/model.gitlab_dw?g_v=1&g_i=@zuora, you'll see that the sfdc_executive_business_review model is not built, but dbt test still runs this relationship test: https://gitlab.com/gitlab-data/analytics/blob/master/transform%2Fsnowflake-dbt%2Fmodels%2Fsfdc%2Fbase%2Fschema.yml#L66

Expected behavior

I would expect either the test to be skipped or the relevant model to be built.

Screenshots and log output

These are the logs for an example failed job with @zuora

System information

Which database are you using dbt with?

  • snowflake

The output of dbt --version:

Running with dbt=0.14.2

The operating system you're using:
Mac OS X

The output of python --version:
3.7.4


tayloramurphy added the bug and triage labels on Oct 11, 2019
@beckjake
Contributor

The repro for this case will be very complicated, but I think the issue is when tests get added in to the final graph.

  • We first collect all the matching models in the graph (and properly perform the children's ancestors calculation there).
  • Then we go find all the tests that descend from those collected values.
  • But we never go back up the graph to find the ancestors of the tests we just collected; we assume the original collection pass got all of them.

I'm not sure if we should be excluding the tests whose ancestors aren't in the graph, or doing another pass and adding all the test ancestors in, but one of those choices is the appropriate solution here.
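
For illustration, here's a rough sketch of that selection flow, assuming a networkx-style DAG. The function, node attributes, and the final commented-out pass are illustrative only, not dbt's actual internals:

import networkx as nx

def select_at(graph: nx.DiGraph, model: str) -> set:
    # Step 1: "@model" selects the model, its descendants, and the
    # ancestors of those descendants.
    descendants = nx.descendants(graph, model) | {model}
    selected = set(descendants)
    for node in descendants:
        selected |= nx.ancestors(graph, node)

    # Step 2: add tests that depend on any selected node.
    tests = {
        node for node in graph
        if graph.nodes[node].get("resource_type") == "test"
        and any(parent in selected for parent in graph.predecessors(node))
    }
    selected |= tests

    # Missing step (the gap described above): the ancestors of those tests
    # are never added, so a relationships test can be selected while one of
    # its parents is not.
    # selected |= {p for t in tests for p in nx.ancestors(graph, t)}
    return selected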

drewbanin removed the triage label on Oct 11, 2019
@drewbanin
Contributor

I think a simple reproduction case should be:

-- models/model1.sql
select 1 as id

-- models/model2.sql
select 1 as fk

# models/schema.yml
version: 2

models:
  - name: model1
    columns:
      - name: id
        tests:
          - relationships:
              to: ref('model2')
              field: fk

Running:

# Runs only model1
dbt run --models @model1

# Runs no tests
dbt test --models @model1

# Runs the test
dbt test --models model1 model2
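
For context on why the test errors when model2 is never built: a relationships schema test compiles to SQL that reads from both relations. The helper below is only an illustrative sketch of that shape, not dbt's actual macro output:

def relationships_test_sql(child: str, column: str, parent: str, field: str) -> str:
    # Roughly the shape of a compiled relationships test: count child rows
    # whose foreign key has no matching row in the parent relation.
    return f"""
    select count(*)
    from {child} as child
    left join {parent} as parent
      on child.{column} = parent.{field}
    where child.{column} is not null
      and parent.{field} is null
    """

# relationships_test_sql("model1", "id", "model2", "fk") references both
# model1 and model2, so it errors if model2 does not exist in the warehouse.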

I'm not sure if we should be excluding the tests whose ancestors aren't in the graph, or doing another pass and adding all the test ancestors in, but one of those choices is the appropriate solution here.

It's compelling to think that tests should only be run if all of their ancestors are selected in the run, but that will make custom data tests pretty wonky! This is probably the right answer for @tayloramurphy's use case, but probably the wrong answer in the more general case.

I do not think that, in this case, the nodes selected by dbt run --models @zuora should be made aware of the tests that exist on those models -- that would definitely be unexpected behavior! I do want to note that it would be the natural behavior of my suggestion in #2203 :)

So, I think the only other path forward here is to make relationship tests smarter. It would be pretty reasonable to special-case schema tests that select from more than one model: we could add support for tests returning a warning (escalatable to an error with --warn-error), and we could leverage the relation cache to determine whether the to relation exists in the database.
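
A hypothetical sketch of that idea; the names and interfaces below are made up for illustration and are not dbt's real relation cache or test APIs:

def run_relationship_test(test, relation_cache, warn_error=False):
    # If the "to" relation was never built (not present in the cache),
    # downgrade the failure to a warning instead of a hard error.
    to_relation = test.to_relation  # hypothetical attribute: the ref() target
    if to_relation not in relation_cache:
        message = f"{to_relation} does not exist; skipping {test.name}"
        if warn_error:  # --warn-error escalates warnings to errors
            raise RuntimeError(message)
        print(f"WARNING: {message}")
        return "warn"
    return test.execute()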

Curious what y'all think?

@beckjake
Contributor

beckjake commented Apr 10, 2020

It's compelling to think that tests should only be run if all of their ancestors are selected in the run, but that will make custom data tests pretty wonky!

What will be wonky about it? Custom data tests should use ref and source just like everything else, right?

@drewbanin
Contributor

True, but something about this behavior for custom data tests feels wrong to me. Maybe custom data tests are less of an issue because you can conceivably select them by name/path? Let me think on that some more... it's not quite as jarring as I initially felt it was, but I still don't feel great about it.

@jtcohen6
Contributor

I'm here from the future to say that the @ operator seems to work for tests now, insofar as the reproduction case that @drewbanin included above does the following in dbt v0.18.1:

# Runs both model1 and model2
dbt run --models @model1

# Runs the test
dbt test --models @model1

I came here because I've been thinking a lot about:

if we should be excluding the tests whose ancestors aren't in the graph

Particularly with reference to state:modified + testing caveats, where one "leg" of a test may be modified while the other is not, so only one of them ends up in the selection criteria for both run + test. "All parents must be present" is one of a few potential approaches I'm mulling over.

In any case, I'm going to close this issue :)

@ags2121

ags2121 commented Nov 16, 2020


@jtcohen6 do you know if this behavior is fixed in 0.18.1 for the + selector as well?

@jtcohen6
Contributor

@ags2121 That behavior is not fixed. I saw your comment over on #2132, I agree that we should open a new issue to discuss.
