Snowflake: Show terse object running for all models when only calling single model #2673

brittianwarner · 2020-07-30T22:15:27Z

Describe the bug

When I trigger only one model via --model <my_model_name>, dbt appears to run "show terse objects" for all models, whereas I would expect it to run only for the specified model and its dependencies.

Steps To Reproduce

Trigger DBT Run for one model in a project with multiple models/packages

Expected behavior

I would expect show terse object to only occur for the selected model and all dependencies

Screenshots and log output

dbt run --profiles-dir . --vars '{"src_schema": "64", "target_schema": "64"}' --model linkedin

System information

Which database are you using dbt with?

[ ] postgres
[ ] redshift
[ ] bigquery
[x] snowflake
[ ] other (specify: ____________)

The output of dbt --version:

0.17.0

The operating system you're using:
linux

The output of python --version:

3.8

Additional context

All of my code is still working but it seems like a waste to run this query for all models when only a single model is specified

jtcohen6 · 2020-07-31T14:31:30Z

Hey @brittianwarner, this is very much our intended behavior. I'll do my best to explain below. That said, I'm happy to keep the conversation going if you have other ideas.

dbt: At the beginning of each run, dbt caches information from the database to know about all the objects that exist in any of the databases + schemas where dbt plans to create objects. This is the most efficient and straightforward way to grab the information once, at the start of the run, rather than (as very old versions of dbt used to) querying metadata tables before every single model run. There isn't really a difference when you're only running one model, but when you're running many, it's huge.

Snowflake: Earlier this year, we changed from querying the information_schema to using show terse objects because it performs significantly better (#2174). Running show queries does not require a warehouse, so it does not queue up with other queries. If we were to try to filter via show objects like, using a case-insensitive pattern match, our understanding (from talking to Snowflake) is that this is no more performant than simply showing all objects; it limits the output, but Snowflake still has to scan all metadata records while pattern-matching. We would gain nothing performance-wise at the potential cost of excluding relevant metadata.
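
To make the caching step concrete, here is a minimal sketch in Python. It is not dbt's actual implementation: `Node`, `schemas_to_cache`, and `cache_queries` are hypothetical names, and the model names other than linkedin are invented for illustration. It only shows the idea described above: collect every distinct database + schema the project knows about, then issue one metadata query per schema at the start of the run, regardless of which model was selected.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Node(NamedTuple):
    """Hypothetical stand-in for a model or source defined in a project."""
    name: str
    database: str
    schema: str


def schemas_to_cache(nodes: Iterable[Node]) -> List[Tuple[str, str]]:
    """Return the distinct (database, schema) pairs across all nodes, sorted."""
    return sorted({(n.database.lower(), n.schema.lower()) for n in nodes})


def cache_queries(nodes: Iterable[Node]) -> List[str]:
    """Build one metadata query per distinct schema, not one per model."""
    return [
        f"show terse objects in schema {database}.{schema}"
        for database, schema in schemas_to_cache(nodes)
    ]


# Assumed project contents: four nodes spread over three schemas.
project = [
    Node("adwords_report", "edw_eng", "adwords_64"),
    Node("hubspot_report", "edw_eng", "hubspot_64"),
    Node("linkedin", "edw_eng", "linkedin_64"),
    Node("linkedin_summary", "edw_eng", "linkedin_64"),
]

for query in cache_queries(project):
    print(query)
```

Note that the list of queries depends only on the schemas defined in the project, not on the model passed to --model, which is why a schema that doesn't exist still gets queried.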

brittianwarner · 2020-07-31T17:01:14Z

@jtcohen6, thanks for the awesome explanation. Makes sense. The biggest reason I brought this up was because we are passing variables to tell the model which schema to look at. So for the example screenshot above you can see that there are two queries that failed because those schemas don't exist. Although this doesn't really break anything, we are technically running queries that fail each time we execute dbt. Given this context, I will leave it up to your team to decide whether this is a big deal or not.

jtcohen6 · 2020-07-31T17:15:37Z

Got it, and appreciate the context. If you're materializing models in custom schemas, dbt should be trying to create those custom schemas if they don't already exist—is that different from what's happening on your end?

brittianwarner · 2020-07-31T19:11:35Z

The variables for the schemas are passed so that the package/model knows which source and target schema to use for a specific query. In our case (using the screenshot above), we pass a number for these variables (ex: --vars '{"src_schema": "64", "target_schema": "64"}'), so when we kick off a specific model where we know a schema exists (ex: linkedin), dbt runs 'show terse objects' for all models, even though the only model where 64 exists is linkedin.

brittianwarner · 2020-08-04T03:39:42Z

@jtcohen6 Hope you had a good weekend. Just following up on this and making sure my last message made sense? Let me know if you need more info on my end.

jtcohen6 · 2020-08-04T22:23:31Z

Hey @brittianwarner, I think that makes sense. If I follow correctly, you have a single var called src_schema that is used to define several different sources; those sources may or may not exist for a given value of src_schema.

In another recent issue, I wrote a little about the opinionated principles and expectations that underlie dbt's relationship with the database. When dbt compiles a project, it expects all the resources defined in that project to have a sensible working relationship with the state of the database. It expects source schemas to already exist, and to have permissions to grab metadata about them, in the same way that it expects to have permissions to create models it knows about (and the schemas for those models). These abstractions, and their baked-in assumptions, get us quite far 98% of the time.

dbt does not try to cleverly account for source schemas that may or may not exist. Last Friday, Claire published a discourse post that addresses this problem, with the intended audience of package creators, who may wish to write code in expectation of these edge cases.

All of that said, if I were you, I'd think about:

Creating placeholder schemas in your database so that, for any given value (xx) of src_schema, there is always edw_eng.adwords_xx, edw_eng.hubspot_xx, edw_eng.linkedin_xx. This is the simplest, by far, but it may be controversial.
Disabling certain source schemas when you know they don't exist, based on the value of src_schema, via a dynamic Jinja expression. This makes sense if it's much more common for a schema to be present than missing, and if you know exactly when it's the latter.

version: 2

sources:
    - name: adwords_
      schema: "{{ 'adwords_' ~ var('src_schema') }}"
      enabled: "{{ var('src_schema') not in ('12', '25', ...) }}"   # known subset missing adwords data
      database: edw_eng
      tables:
          - name: table1
            description: "abc123"
          - name: table2
            description: "def456"

Using YML anchors (discourse) to define all possible source schemas at once, with as little duplication of code as possible. This makes sense if you don't have a ton of potential values of the src_schema var.

version: 2

sources:
    - name: adwords_01
      schema: adwords_01
      database: edw_eng
      tables: &adwordstables
          - name: table1
            description: "abc123"
          - name: table2
            description: "def456"

    - name: adwords_02
      schema: adwords_02
      tables: *adwordstables

      # skip adwords_03 because it doesn't exist!

    - name: adwords_04
      schema: adwords_04
      tables: *adwordstables
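
For the placeholder-schema option, a small script could generate the DDL for a given value of src_schema. This is only a sketch under assumptions: `SOURCE_PREFIXES` and `placeholder_schema_ddl` are hypothetical names, the prefixes and the edw_eng database are taken from the examples in this thread, and you would still need to run the statements with a role that is allowed to create schemas.

```python
from typing import List

# Assumed source prefixes, taken from the examples in this thread.
SOURCE_PREFIXES = ["adwords", "hubspot", "linkedin"]


def placeholder_schema_ddl(src_schema: str, database: str = "edw_eng") -> List[str]:
    """Build one idempotent CREATE SCHEMA statement per source prefix."""
    return [
        f"create schema if not exists {database}.{prefix}_{src_schema};"
        for prefix in SOURCE_PREFIXES
    ]


# Print the statements for src_schema = "64" so they can be reviewed before running.
for statement in placeholder_schema_ddl("64"):
    print(statement)
```

Because each statement uses `if not exists`, re-running it for a value whose schemas are already present leaves them untouched.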

brittianwarner added bug Something isn't working triage labels Jul 30, 2020

brittianwarner changed the title ~~Snowflake: Show terse schema running for all models when only calling one model~~ Snowflake: Show terse schema running for all models when only calling single model Jul 30, 2020

jtcohen6 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Jul 31, 2020

jtcohen6 closed this as completed Jul 31, 2020

brittianwarner changed the title ~~Snowflake: Show terse schema running for all models when only calling single model~~ Snowflake: Show terse object running for all models when only calling single model Jul 31, 2020

jtcohen6 mentioned this issue Aug 19, 2020

Dynamic Model Reference and depends_on hint #2716

Closed

ajbosco mentioned this issue Jan 21, 2022

[CT-203] A single show terse call for each database dbt-labs/dbt-snowflake#83

Closed

jtcohen6 mentioned this issue Feb 5, 2022

[CT-168] Cache objects for selected resources only? #4688

Closed
