Dynamic Model Reference and depends_on hint #2716

brittianwarner · 2020-08-19T00:21:43Z

Describe the feature

I want to be able to dynamically union models depending on runtime variables. For example, we have Customers A, B, C. Customer A has models 1,2,3. Customer B has models 1,2. Customer C has models 2,3. I need to be able to dynamically union these models together, so when I call dbt run for Customer A I want to union models 1,2,3; Customer B union models 1,2; and Customer C union models 2,3. I am currently running into an issue where DBT is asking for a depends_on hint. It is important that I can dynamically change the dependencies on a model given variables. Open to any suggestions/alternatives!

dbt was unable to infer all dependencies for the model "combined_performance".
This typically happens when ref() is placed within a conditional block.

To fix this, add the following hint to the top of the model "combined_performance":

-- depends_on: {{ ref('stg_google_performance') }}

> in macro get_ad_relation (macros/get_customer_attributes.sql)
> called by model combined_performance (models/ads/combined_performance.sql)
> called by model combined_performance (models/ads/combined_performance.sql)

Example of dbt run:
dbt run --vars '{"Customer":"A"}'
dbt run --vars '{"Customer":"B"}'
dbt run --vars '{"Customer":"C"}'

Describe alternatives you've considered

None

Additional context

Using Snowflake.

Who will this benefit?

Anyone who is building a dbt model for many customers/variations.

Are you interested in contributing this feature?

Yes

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2020-08-19T15:02:04Z

Thanks for the walkthrough @brittianwarner. I don't think this is going to be possible today. dbt needs to be able to construct the dependency graph without stepping through Jinja conditional statements, hence the -- depends_on hint. While we may someday do the work to support single-pass Jinja rendering, that's a massive undertaking and we'll track progress there separately.

In the meantime, here are the options I see:

Changing your data model: In general, I recommend creating unioned versions of models 1, 2, and 3 that include data for all customers. This saves a lot of duplicated transformation logic, and the complexity of separate var-driven invocations. Then, you filter at the very last step by the customer that you need in the moment. By that point, you likely don't need a dbt var; it would be much more powerful as a BI parameter, or logic encoded into a data sharing layer.
Placeholder tables: Create empty source tables + models for customers who do not have a given dataset. This could be as simple as

create customer_b_source.google_performance as (select * from customer_a_source.google_performance limit 0);

I know I recommended something similar in #2673.

Does either of those make sense in your case?

brittianwarner · 2020-08-19T18:29:41Z

Thanks, @jtcohen6! Unfortunately, it would be a huge headache to build different versions of the model for each Customer as we could in theory have 20 models and we would have to support any combination of those depending on the customer. Let me think a bit more about your second recommendation and I will get back to you.

In general, I feel it would be extremely powerful to be able to set dependencies based on a set of variables at runtime. Our use case involves having a dbt job for 100+ customers that all have their own "gotchas" so having this flexibility would be awesome.

Another idea... Is there a way to manually add a dependency via a macro or something?

jtcohen6 · 2020-08-19T18:57:48Z

Is there a way to manually add a dependency via a macro or something?

What have you got in mind?

brittianwarner · 2020-08-19T19:17:21Z

I haven't fully thought through it, but it would be nice to insert/update/delete dependencies before any models execute. So I am picturing some sort of pre-run operation where I can change a dependency based on a variable or lookup table. So for my example, I would pass a customer_id to 'dbt run' and then I would have a macro that would lookup configuration data and set dependencies on the fly. Thoughts?

brittianwarner · 2020-08-19T21:20:41Z

I think I may have figured it out... I went ahead and added all of the depends_on hints at the top of the model and built a macro to build the sql needed... Here is an example:

{% macro get_relation()  %}
{% set sql %}
select distinct 
	concat('stg_',replace(model,'-','_'),'_performance')   as model_type
from model_lookup a 
where a.model ilike any ('model1','model2','model3','model4','model5')
and customer_id = {{ var('customer_id') }}
{% endset %}

{% set relations = run_query(sql) %}

{% set relations_list = [] %}
{% if execute %}
{% for relation in relations.columns[0].values() %}
  select 
      *
  from {{ ref(relation) }}
{% if not loop.last -%} union all {%- endif %}
{% endfor %}
{% endif %}

{% endmacro %}

{{ 
	config(materialized='table') 
}}
-- depends_on: {{ ref('stg_model1_performance') }}
-- depends_on: {{ ref('stg_model2_performance') }}
-- depends_on: {{ ref('stg_model3_performance') }}
-- depends_on: {{ ref('stg_model4_performance') }}
-- depends_on: {{ ref('stg_model5_performance') }}

{{ get_relation()}}

brittianwarner · 2020-08-20T14:18:39Z

@jtcohen6 Making sure you saw this.

jtcohen6 · 2020-08-20T18:20:49Z

That's a neat approach! I think the critical piece is that you're including the following, outside of any conditional logic:

-- depends_on: {{ ref('stg_model1_performance') }}
-- depends_on: {{ ref('stg_model2_performance') }}
-- depends_on: {{ ref('stg_model3_performance') }}
-- depends_on: {{ ref('stg_model4_performance') }}
-- depends_on: {{ ref('stg_model5_performance') }}

Even if you don't end up selecting from all these models each time, dbt is able to build the dependency graph and establish them as parents (well, potential parents). Then, your macro can tell you which ones you actually need for a given customer.

The database lookup feels like an unnecessary complexity, though that depends on how many potential customer-models variations you have and how often those mappings change. If the number of mappings is relatively stable and limited in number, I'd think about an approach like:

{% macro customer_model_mapping()  %}

  {% set mappings = [
    {"customer": "A", "model_types": ('model1', 'model2', 'model3')},
    {"customer": "B", "model_types": ('model1', 'model2')},
    {"customer": "C", "model_types": ('model2', 'model3')} 
  ] %}

  {% do return(mappings) %}

{% endmacro %}

{% macro get_models_for_customer(customer_id) %}

  {% set customer_dict = customer_model_mapping()|selectattr("customer", customer_id) %}
  {% set models_for_customer = customer_dict['model_types'] %}

  {% do return(models_for_customer) %}

{% endmacro %}

Whether you encode the variable into your macros, or pass it as an argument from your model, is totally up to you:

{{ 
	config(materialized='table') 
}}
-- depends_on: {{ ref('stg_model1_performance') }}
-- depends_on: {{ ref('stg_model2_performance') }}
-- depends_on: {{ ref('stg_model3_performance') }}
-- depends_on: {{ ref('stg_model4_performance') }}
-- depends_on: {{ ref('stg_model5_performance') }}

{% set model_list = get_models_for_customer(var('customer_id')) %}

{% for model_name in model_list %}
{% set relation = `stg_` ~ model_name ~ 'performance' %}

  select 
      *
  from {{ ref(relation) }}
{% if not loop.last -%} union all {%- endif %}
{% endfor %}
{% endif %}

I'm going to close this issue. It's a complex problem, and there are (a number of) complex approaches already possible today :)

john-joseph-noonan · 2022-12-26T09:05:03Z

Hi folk. I'm still facing this challenge.

I need to be able to dynamically generate a list of depends_on models as my use-case scales rather quickly.

Does anyone have any updates on better ways to do this?

ivoalbrecht · 2023-01-19T13:08:20Z

@john-joseph-noonan

Did you find a solution?

I have a model where I union all models that start with some_prefix dynamically:

WITH unioned_messages AS (

    {% for model in graph.nodes.values() if model.name.startswith('some_prefix_') %}
        SELECT * FROM {{ref(model.name)}}
        {% if not loop.last %} 
        UNION ALL
        {% endif %}
    {% endfor %}

)

You can force dependencies, but I haven't found a way to do that dynamically as well

@jtcohen6 Perhaps you know how to approach that?

john-joseph-noonan · 2023-01-20T20:58:38Z

@john-joseph-noonan

Did you find a solution?

I have a model where I union all models that start with some_prefix dynamically:
WITH unioned_messages AS (

    {% for model in graph.nodes.values() if model.name.startswith('some_prefix_') %}
        SELECT * FROM {{ref(model.name)}}
        {% if not loop.last %} 
        UNION ALL
        {% endif %}
    {% endfor %}

)
You can force dependencies, but I haven't found a way to do that dynamically as well

@jtcohen6 Perhaps you know how to approach that?

Unfortunately, it seems like it's not possible. I searched through the dbt Slack channels and every arrives on the same conclusion, that it's not yet possible.

ivoalbrecht · 2023-01-23T09:47:12Z

@john-joseph-noonan
Did you find a solution?
I have a model where I union all models that start with some_prefix dynamically:
WITH unioned_messages AS (

    {% for model in graph.nodes.values() if model.name.startswith('some_prefix_') %}
        SELECT * FROM {{ref(model.name)}}
        {% if not loop.last %} 
        UNION ALL
        {% endif %}
    {% endfor %}

)
You can force dependencies, but I haven't found a way to do that dynamically as well
@jtcohen6 Perhaps you know how to approach that?
Unfortunately, it seems like it's not possible. I searched through the dbt Slack channels and every arrives on the same conclusion, that it's not yet possible.

Thanks

I did not solve it either, I used a workaround where I do not use the ref function and reference the models that finish the latest in the run via -- depends_on:.

This makes sure that the model runs after the unioned models (at least for now and likely as well if new models are added)

stevenconnorg · 2023-02-03T18:15:42Z

I found a way around this by adding this to the top of my file to dynamically generate the depends_on comment:


{% for model in[<list_of_model_names] %}
{% set depends_on = "--depends_on: {{  ref( '" ~  model ~ "' )  }}" %}
{{  depends_on  }}
{% endfor %}

compiled as:


--depends_on: {{  ref( '<model name 1>' )  }}


--depends_on: {{  ref( '<model name 2>' )  }}


--depends_on: {{  ref( '<model name 3>' )  }}

It's not the prettiest but it works

NicolasPA · 2023-03-14T16:38:10Z

I have a similar issue with a dynamic dependency, but in a generic test that takes as a parameter a macro referring to other tables.
The error is triggered by the dependency on those tables.

This adds the complexity that graph.nodes doesn't exist at compilation but does at execution, so I have to add a condition for that to not fail at test compilation.

{% test is_count_inferior_to_all_N_M_versions(model, all_versions_macro_name) %}

{% if execute %}
    {% for model in graph.nodes.values() if '_all_versions_v' in model.name %}
        {{ log("-- depends_on: {{ ref('" + model.name + "') }}") }}
        {% set depends_on = "-- depends_on: {{ ref( '" + model.name + "' ) }}" %}
        {{ depends_on }}
    {% endfor %}
{% endif %}

with all_versions as (

    select
        1 as pk,
        count(*) as all_versions_count
    from (
        -- We can't pass a macro directly in the YML so we get it from the context, which only exists during execution
        {% if execute %}
            {% set all_versions_macro = context.get(all_versions_macro_name) %}
            {{ all_versions_macro() }}
        {% else %}
            select 1  -- default value for compile to succeed outside of execution
        {% endif %}
    ) as a
    where a.action_type in ('N', 'M')

),

last_version as (

    select
        1 as pk,
        count(*) as last_version_count
    from {{ model }}

)

select
    all_versions_count,
    last_version_count,
    all_versions_count - last_version_count as diff
from last_version
inner join all_versions
    on last_version.pk = all_versions.pk
where last_version_count > all_versions_count

{% endtest %}

Sadly, it doesn't work. I still get the compilation error.

I know the string generation is correct because if I simply copy-past what the lines printed by the log function and add them to my test definition, it works. But I don't want to do that because the concerned models will evolve in the future, and I don't want to have to change those comments manually every time (and probably forget about it at some point).

The macro call also uses a context trick, but it works fine.

jelstongreen · 2024-07-23T08:05:52Z

We've run into this issue again and again over the last few years - dynamic dependency seems like such a useful feature, can we reopen this please?

brittianwarner added enhancement New feature or request triage labels Aug 19, 2020

jtcohen6 removed the triage label Aug 19, 2020

jtcohen6 closed this as completed Aug 20, 2020

morsapaes mentioned this issue Jul 16, 2022

dbt-materialize: add macro for CREATE VIEWS step in the Postgres source MaterializeInc/materialize#13661

Closed

lewisarmistead mentioned this issue Mar 13, 2023

Feature/fact table model unions entralliance/dbt-entr#5

Merged

jbandoro mentioned this issue Nov 30, 2023

Expose environment variables and dbt vars configuration via ProjectConfig astronomer/astronomer-cosmos#712

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamic Model Reference and depends_on hint #2716

Dynamic Model Reference and depends_on hint #2716

brittianwarner commented Aug 19, 2020 •

edited

Loading

jtcohen6 commented Aug 19, 2020 •

edited

Loading

brittianwarner commented Aug 19, 2020

jtcohen6 commented Aug 19, 2020

brittianwarner commented Aug 19, 2020

brittianwarner commented Aug 19, 2020 •

edited

Loading

brittianwarner commented Aug 20, 2020

jtcohen6 commented Aug 20, 2020 •

edited

Loading

john-joseph-noonan commented Dec 26, 2022

ivoalbrecht commented Jan 19, 2023

john-joseph-noonan commented Jan 20, 2023

ivoalbrecht commented Jan 23, 2023

stevenconnorg commented Feb 3, 2023 •

edited

Loading

NicolasPA commented Mar 14, 2023 •

edited

Loading

jelstongreen commented Jul 23, 2024

Dynamic Model Reference and depends_on hint #2716

Dynamic Model Reference and depends_on hint #2716

Comments

brittianwarner commented Aug 19, 2020 • edited Loading

Describe the feature

Describe alternatives you've considered

Additional context

Who will this benefit?

Are you interested in contributing this feature?

jtcohen6 commented Aug 19, 2020 • edited Loading

brittianwarner commented Aug 19, 2020

jtcohen6 commented Aug 19, 2020

brittianwarner commented Aug 19, 2020

brittianwarner commented Aug 19, 2020 • edited Loading

brittianwarner commented Aug 20, 2020

jtcohen6 commented Aug 20, 2020 • edited Loading

john-joseph-noonan commented Dec 26, 2022

ivoalbrecht commented Jan 19, 2023

john-joseph-noonan commented Jan 20, 2023

ivoalbrecht commented Jan 23, 2023

stevenconnorg commented Feb 3, 2023 • edited Loading

NicolasPA commented Mar 14, 2023 • edited Loading

jelstongreen commented Jul 23, 2024

brittianwarner commented Aug 19, 2020 •

edited

Loading

jtcohen6 commented Aug 19, 2020 •

edited

Loading

brittianwarner commented Aug 19, 2020 •

edited

Loading

jtcohen6 commented Aug 20, 2020 •

edited

Loading

stevenconnorg commented Feb 3, 2023 •

edited

Loading

NicolasPA commented Mar 14, 2023 •

edited

Loading