Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jinja statements from vars in schema.yml #3450

Closed
jpmmcneill opened this issue Jun 9, 2021 · 9 comments
Closed

Jinja statements from vars in schema.yml #3450

jpmmcneill opened this issue Jun 9, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@jpmmcneill
Copy link
Contributor

Describe the feature

Ability to use jinja loops inside schema yaml would really improve how generic things like sources would be. For example:
Consider a situation where a package has an input of either a ref or a source, defined based on a var.

It would be nice to be able to accept a list of sources or refs that would that would be unioned together dynamically based on the entries in the var. For refs, it's completely possible at the moment but for sources, it's not possible to write jinja statements (such as if / for loops) in a schema.yml file.

ie:

sources:
  - name: my_source
    tables:
      {% for table_name in var('my_list_of_tables') %}
       - name: "{{table_name}}"
      {% endfor %}

Describe alternatives you've considered

Obviously a clear alternate is "dont allow this", which definitely works. Unfortunately, for a source to be referenced, it needs to be defined in a schema yaml so I haven't thought of any possible alternatives as of right now.

Who will this benefit?

Package developers

Are you interested in contributing this feature?

Happy to help out if I'm able to! I'm already on the hook for another issue that I've been dragging my feet on 😅

@jpmmcneill jpmmcneill added enhancement New feature or request triage labels Jun 9, 2021
@jtcohen6 jtcohen6 removed the triage label Jun 15, 2021
@jtcohen6
Copy link
Contributor

@jpmmcneill Thanks for opening this issue! It's definitely something I've seen other community members talk about before. I think there's a real use case / frustration you're trying to solve for, though I don't think Jinja-generated YAML code is going to be the way to solve it.

In fact, the thing you're after is possible today, with just a few tweaks:

# dbt_project.yml

vars:
  my_list_of_tables:
    - name: tbl_red
    - name: tbl_blue
    - name: tbl_green
# models/sources.yml
version: 2

sources:
  - name: my_source
    tables: "{{ var('my_list_of_tables') }}"
-- models/my_model.sql

{% for src in var('my_list_of_tables') %}
    select * from {{ source('my_source', src.name) }}
    {{ 'union all' if not loop.last }}
{% endfor %}

The variable my_list_of_tables is set to an array of dictionaries. That's the data structure expected by the yaml key tables, and it's also something we can loop over in the templating for our model SQL. We're not generating yaml code with Jinja, but we are able to pass the same pythonic data structure into both places.

Alternatives

There's another way to add extensibility to yaml properties, namely via anchors, if what you're after is DRYer source definitions:

sources:
  - &source-of-many-colors
    name: my_first_source
    tables:
      - name: tbl_red
      - name: tbl_blue
      - name: tbl_green
  - <<: *source-of-many-colors
    name: my_second_source

This kind of extension/inheritance is especially interesting to think about in terms of columns that are repeated across models in the DAG. There's much more discussion about this over in #2995.

In any case, I'm going to close this issue in favor of:

  • the proposed approach: reusable data structures > Jinja-generated YAML
  • the broader conversation around DRYing up yaml properties

@jpmmcneill
Copy link
Contributor Author

Holy moly, thats awesome. Thank you @jtcohen6!!!

For my own future reference, is there a good place to go in the docs to find these kind of intricacies? I'm still a relatively noob-y dbt user 😅

@ryantimjohn
Copy link

ryantimjohn commented Nov 1, 2021

@jtcohen6 I tried this but doesn't seem to be working.

I defined my variables like so:

vars:
    frakture_facebook : 
      - name: frakture_facebook
        database: bsd-vera
        schema: src_frakture
        tables:
            - name: facebook_bizman_hdx_ad_summary_by_date_pivot
            - name: facebook_bizman_hdx_message

And then tried to reference them like this:

version: 2
sources: "{{ var('frakture_facebook') }}"

But got this error:

Server error: RPC server failed to compile project, call the "status" method for compile status: Compilation Error
The schema file at models/sources/frakture_facebook.yml is invalid because the value of 'sources' is not a list

@ryantimjohn
Copy link

I know the variable is correctly formatted because when I reference it in a .sql statement like so:
{{ var('frakture_facebook')}}

I get:
[{'name': 'frakture_facebook', 'database': 'bsd-vera', 'schema': 'src_frakture', 'tables': [{'name': 'facebook_bizman_hdx_ad_summary_by_date_pivot'}, {'name': 'facebook_bizman_hdx_message'}]}]

@msenyonyi
Copy link

In fact, the thing you're after is possible today, with just a few tweaks:

# dbt_project.yml

vars:
  my_list_of_tables:
    - name: tbl_red
    - name: tbl_blue
    - name: tbl_green

@jtcohen6
Is there a way to do this dynamically? If I have a slowly-changing group of tables in a schema? For example, one day I have name: tbl_red, and the next day this table is gone? I would like to cycle through the tables in the schema and add these to the var :)

@dannybluffmsm
Copy link

Has jinja implementation been considered anymore in yml files?

I would like to make sure everything is documented correctly. If I have created a loop within my SQL code, then it would be very beneficial to be able to loop within the yml file too.

For example, it would be good to do something like the below:

# dbt_project.yml

var:
    TIME_PERIODS:
    - 7
    - 30
    - 365

# models/my_model.sql

select current_date() as current_date,
      {% for time in var('TIME_PERIODS') %}
          date_sub(current_date(), interval {{time}} day) as date_{{time}}_days_ago,
      {% endfor %}

# models/my_model.yml

version: 2
models:
  - name: my_model
    columns: 
      - name: current_date
        columns: Current date
        {% for time in var('TIME_PERIODS') %} 
      - name: date_{{time}}_days_ago
        columns: Date {{time}} days ago
        {% endfor %}
 

instead of having to create the yml file manually like so:

# models/my_model.yml

version: 2
models:
  - name: my_model
    columns: 
      - name: current_date
        columns: Current date
      - name: date_7_days_ago
        columns: Date 7 days ago
      - name: date_30_days_ago
        columns: Date 30 days ago
      - name: date_365_days_ago
        columns: Date 365 days ago 

Being able to create a loop helps massively if more variables are added (for example TIME_PERIODS in this case) and removes any possibility of undocumented variables.

@abi-mutinex
Copy link

Does above work? @dannybluffmsm

@dannybluffmsm
Copy link

@abi-mutinex I don't believe so.

@jtcohen6 are you able to assist please? Would it be best to open a new issue?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 5, 2023

I don't anticipate us adding support for Jinja-templated yaml configuration anytime soon.

There are a few mechanisms, such as yaml anchors, that can provide better ergonomics in specific cases. And there have been related discussions over the past few years imagining better ways to reuse / inherit configuration across models. (I just converted this one earlier, so it comes readily to mind: #6527)

In terms of programmatically defining model configuration, via something like a loop, though — I just don't think Jinja-on-yaml is the right mechanism for this. It's an ergonomic improvement that, longer-term, we might want to enable by means of a real Python API for defining dbt project resources. We aren't making immediate progress in that direction, exactly, but we are working on other internal APIs within dbt-core over the next few months, with the intent to (finally) publish real, public, documented APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants