Template models for dbt #2551

Closed
rameesraja opened this issue Jun 16, 2020 · 8 comments
Labels
enhancement New feature or request stale Issues that have gone stale

Comments

@rameesraja

Describe the feature

Template models that let us pass parameters, which are substituted at deploy time. We usually build aggregated models at several frequencies ('week', 'month', 'quarter', 'year') for our fact tables. To do this today, we have to build a separate model for each frequency. Template models that accept the frequency as a parameter would be helpful.

Example:

{frequency}ly_sales.sql

{{
     config(frequencies = ['week','month'])
}}

select 
       date_trunc({frequency}, current_timestamp()) as sales_date,
       sum(sales) as sales
from sales

This code should deploy two models in the DW, weekly and monthly respectively.
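
Until something like this exists in dbt, the same expansion can be approximated outside dbt with a small generation step. The following is only a sketch under stated assumptions: the output directory, the `template.sql.tmpl` file name, and the use of `sed` are all hypothetical, and the placeholder is quoted here so that the substituted value becomes a SQL string literal.

```shell
#!/bin/sh
# Hypothetical sketch: expand one template into one model file per frequency.
set -eu

# Hypothetical output directory; in a real project this would be a models/ path.
out_dir="${OUT_DIR:-$(mktemp -d)}"

# The template mirrors the example above; {frequency} is a plain-text placeholder,
# not dbt syntax.
cat > "$out_dir/template.sql.tmpl" <<'EOF'
select
       date_trunc('{frequency}', current_timestamp()) as sales_date,
       sum(sales) as sales
from sales
EOF

for frequency in week month; do
    # Substitute the placeholder and write weekly_sales.sql / monthly_sales.sql.
    sed "s/{frequency}/$frequency/g" "$out_dir/template.sql.tmpl" \
        > "$out_dir/${frequency}ly_sales.sql"
done

ls "$out_dir"
```

Each generated file is an ordinary model, so a change to the template only needs the generation step to be re-run.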

Describe alternatives you've considered

We could achieve this using macros, but I believe these won't be included in the DAG.

Who will this benefit?

This falls under our strategy of DRY. If there is a change, we will just have to update one model.

@rameesraja rameesraja added enhancement New feature or request triage labels Jun 16, 2020
@drewbanin drewbanin removed the triage label Jun 17, 2020
@drewbanin
Contributor

Thanks for opening this issue up @rameesraja. See also #1637 -- I think there's a lot of overlap between these two proposals!

@aodhan-domhnaill

@drewbanin Can you suggest an architecture for this? I can contribute this feature.

My suggestion would be to add a new field into the dbt_project.yml that lets you specify the path of the model. Then you can reuse existing models. After that, simply add support for more params in the models.

@z3z1ma
Contributor

z3z1ma commented Oct 6, 2021 via email

@aodhan-domhnaill

@z3z1ma For the purposes of this, yes, but in #3469 and some other issues there are requests for table sharding. I'm also looking to do sharding, and this seemed like a good step in that direction.

For example, with these model templates, you could already generate multiple models from a single model spec. The next step would be to allow dbt to fetch all the distinct values in a column and generate a model for each value.

In my mind, I saw this as a step towards sharding.

@z3z1ma
Contributor

z3z1ma commented Oct 7, 2021

@aidan-plenert-macdonald

What if we used macros to generate the files based on a template, and ran them to generate the shards as model files? I feel the hurdle would be significantly smaller until a more embedded solution like model blocks is revisited.
I imagine it could be done, and work really well, with a fraction of the initial architectural and code refactoring investment.

@z3z1ma
Contributor

z3z1ma commented Oct 7, 2021

In Airflow, a step that builds 3 shards from a single DRY template which lives in dbt as a macro:

dbt run-operation shard_1 --args '{"shard": "jan"}' >> jan_data.sql
dbt run-operation shard_1 --args '{"shard": "feb"}' >> feb_data.sql
dbt run-operation shard_1 --args '{"shard": "mar"}' >> mar_data.sql

Here you edit a single file to update all of these, yet in the current design they still sit as independent model files in the manifest.

Alternatively, this might work:

for i in $(dbt run-operation get_shards); do
    # Double quotes so $i expands; braces so the variable is i, not i_data
    dbt run-operation shard_1 --args "{\"shard\": \"$i\"}" >> "${i}_data.sql"
done

where get_shards.sql is a macro that logs unique values from a table to be used in generating shard models

Not optimal, sure. Cool though.
You would also have to use {% raw %}{{ ref('data_table_to_shard') }}{% endraw %} in the macro so that the piped output retains the ref call.
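
To illustrate why the literal ref matters, here is a rough sketch of what the generated files would need to contain. It does not call dbt at all: a small shell function stands in for the shard_1 macro, and `models_dir`, `write_shard_model`, and the `shard_key` column are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: write one shard model per value, keeping ref() literal.
set -eu

# Hypothetical output directory; in a real project this would be a models/ path.
models_dir="${MODELS_DIR:-$(mktemp -d)}"

write_shard_model() {
    shard="$1"
    # printf emits the Jinja call verbatim, so it is still an unrendered ref()
    # when dbt later parses the generated file.
    # shard_key is a hypothetical column name.
    printf "select *\nfrom {{ ref('data_table_to_shard') }}\nwhere shard_key = '%s'\n" \
        "$shard" > "$models_dir/${shard}_data.sql"
}

for shard in jan feb mar; do
    write_shard_model "$shard"
done

cat "$models_dir/jan_data.sql"
```

Because each file still contains the unrendered ref, dbt can resolve the dependency when it parses the generated models.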

@aodhan-domhnaill

@z3z1ma Part of the problem is that we don't know what the shards are beforehand. I was hoping to use get_column_values to compute the different shards.

@github-actions
Contributor

github-actions bot commented Apr 7, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Apr 7, 2022