Snowflake: Show terse object running for all models when only calling single model #2673
Comments
Hey @brittianwarner, this is very much our intended behavior. I'll do my best to explain below. That said, I'm happy to keep the conversation going if you have other ideas.

dbt: At the beginning of each run, dbt caches information from the database to know about all the objects that exist in any of the databases + schemas where dbt plans to create objects. This is the most efficient and straightforward way to grab the information once, at the start of the run, rather than (as very old versions of dbt used to) querying metadata tables before every single model run. There isn't really a difference when you're only running one model, but when you're running many, it's huge.

Snowflake: Earlier this year, we changed from querying the |
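To make the caching trade-off described above concrete, here is a minimal Python sketch. It is not dbt's implementation: `DATABASE`, `QueryCounter`, `list_objects`, `check_per_model`, and `check_with_cache` are hypothetical names invented for illustration. It only compares how many metadata queries each strategy issues.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the warehouse: schema name -> objects it contains.
DATABASE: Dict[str, Set[str]] = {
    "analytics": {"orders", "customers"},
    "staging": {"stg_orders"},
}


class QueryCounter:
    """Counts how many metadata queries were issued."""

    def __init__(self) -> None:
        self.queries = 0


def list_objects(schema: str, counter: QueryCounter) -> Set[str]:
    """Simulate one metadata query that lists the objects in a schema."""
    counter.queries += 1
    return set(DATABASE.get(schema, set()))


def check_per_model(models: List[Tuple[str, str]], counter: QueryCounter) -> Dict[str, bool]:
    """Old-style approach: query metadata before every single model."""
    return {name: name in list_objects(schema, counter) for schema, name in models}


def check_with_cache(models: List[Tuple[str, str]], counter: QueryCounter) -> Dict[str, bool]:
    """Cache approach: query each schema once up front, then look up locally."""
    schemas = sorted({schema for schema, _ in models})
    cache = {schema: list_objects(schema, counter) for schema in schemas}
    return {name: name in cache[schema] for schema, name in models}


# (schema, model name) pairs; "new_model" does not exist in the database yet.
MODELS = [
    ("analytics", "orders"),
    ("analytics", "customers"),
    ("analytics", "new_model"),
    ("staging", "stg_orders"),
]

per_model_counter = QueryCounter()
cached_counter = QueryCounter()
per_model_result = check_per_model(MODELS, per_model_counter)
cached_result = check_with_cache(MODELS, cached_counter)
print(per_model_counter.queries, cached_counter.queries)
```

Both strategies return the same answers; the cached version issues one query per schema instead of one per model, which is why the difference only matters once many models are involved.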
@jtcohen6, thanks for the awesome explanation. Makes sense. The biggest reason I brought this up is that we pass variables to tell the model which schema to look at. In the example screenshot above, you can see two queries that failed because those schemas don't exist. This doesn't really break anything, but we are technically running queries that fail each time we execute dbt. Given this context, I will leave it up to your team to decide whether this is a big deal or not. |
Got it, and appreciate the context. If you're materializing models in custom schemas, dbt should be trying to create those custom schemas if they don't already exist—is that different from what's happening on your end? |
The variables for the schemas are passed so that the package/model knows which source and target schema to use for a specific query. In our case (using the screenshot above), we pass a number for these variables (ex: --vars '{"src_schema": "64", "target_schema": "64"}'). When we kick off a specific model where we know a schema exists (ex: linkedin), dbt runs 'show terse objects' for all models, even though the only model where 64 exists is linkedin. |
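The situation described above can be sketched in a few lines of Python. This is an illustration under assumptions, not dbt code: the prefix-plus-number schema naming, the `facebook_` prefix, and the helper names `resolve_schemas` and `populate_cache` are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: source schemas are named "<prefix><src_schema>".
SOURCE_PREFIXES: List[str] = ["adwords_", "facebook_", "linkedin_"]

# Assumption: for src_schema "64", only the linkedin schema exists.
EXISTING_SCHEMAS: Set[str] = {"linkedin_64"}


def resolve_schemas(prefixes: Iterable[str], src_schema: str) -> List[str]:
    """Build the schema name every source in the project would resolve to."""
    return [f"{prefix}{src_schema}" for prefix in prefixes]


def populate_cache(schemas: Iterable[str], existing: Set[str]) -> Tuple[List[str], List[str]]:
    """Simulate one metadata query per schema, splitting successes from failures."""
    succeeded: List[str] = []
    failed: List[str] = []
    for schema in schemas:
        if schema in existing:
            succeeded.append(schema)
        else:
            failed.append(schema)
    return succeeded, failed


succeeded, failed = populate_cache(resolve_schemas(SOURCE_PREFIXES, "64"), EXISTING_SCHEMAS)
print("ok:", succeeded)
print("failed:", failed)
```

Because every source in the project is resolved regardless of which model is selected, the lookups for schemas that don't exist fail even though the selected model never touches them.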
@jtcohen6 Hope you had a good weekend. Just following up on this and making sure my last message made sense? Let me know if you need more info on my end. |
Hey @brittianwarner, I think that makes sense. If I follow correctly, you have a single

In another recent issue, I wrote a little about the opinionated principles and expectations that underlie dbt's relationship with the database. When dbt compiles a project, it expects all the resources defined in that project to have a sensible working relationship with the state of the database. It expects source schemas to already exist, and to have permissions to grab metadata about them, in the same way that it expects to have permissions to create models it knows about (and the schemas for those models). These abstractions, and their baked-in assumptions, get us quite far 98% of the time. dbt does not try to cleverly account for source schemas that may or may not exist.

Last Friday, Claire published a Discourse post that addresses this problem, with the intended audience of package creators, who may wish to write code in expectation of these edge cases.

All of that said, if I were you, I'd think about:
version: 2
sources:
  - name: adwords_
    schema: "{{ 'adwords_' ~ var('src_schema') }}"
    enabled: "{{ var('src_schema') not in ('12', '25', ...) }}"  # known subset missing adwords data
    database: edw_eng
    tables:
      - name: table1
        description: "abc123"
      - name: table2
        description: "def456"
version: 2
sources:
  - name: adwords_01
    schema: adwords_01
    database: edw_eng
    tables: &adwordstables
      - name: table1
        description: "abc123"
      - name: table2
        description: "def456"
  - name: adwords_02
    schema: adwords_02
    tables: *adwordstables
  # skip adwords_03 because it doesn't exist!
  - name: adwords_04
    schema: adwords_04
    tables: *adwordstables
Describe the bug
When I trigger only one model via --model <my_model_name>, dbt appears to run "show terse objects" for all models, whereas I would expect it to run only for the specified model and its dependencies.
Steps To Reproduce
Trigger DBT Run for one model in a project with multiple models/packages
Expected behavior
I would expect show terse object to only occur for the selected model and all dependencies
Screenshots and log output
dbt run --profiles-dir . --vars '{"src_schema": "64", "target_schema": "64"}' --model linkedin
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using: linux
The output of python --version: 3.8
Additional context
All of my code is still working but it seems like a waste to run this query for all models when only a single model is specified