Seed blocks #2276

drewbanin · 2020-03-31T17:01:42Z

Describe the feature

Let's wrap seeds (optionally) inside of blocks. Example implementation:

{% seed my_seed_name %}

{{ config(...) }}

col1,col2,col3
abc,def,ghi

{% endseed %}

Some rough implementation thoughts:

This tag should definitely be optional. Wrapping a csv file in a jinja tag is pretty weird to do!
Supporting seed configs inline would be really nice!
I could imagine supporting non-csv seeds in the future, eg:

{% seed my_json_seed type=json %}

{"col1": "value", "col2": "other value"},
{"col1": "value2", "col2": "other value2"},

{% endseed %}

That is certainly out of scope for this change, but a block here gives us a nucleation point to build new and compelling things on top of in the future.

sumanau7 · 2020-04-07T12:59:46Z

@drewbanin can you share the purpose of doing this?

{% seed my_seed_name %}

col1,col2,col3
abc,def,ghi

{% endseed %}

Is the above required so that more than one seed can be added in one file, e.g.:

{% seed my_seed_name %}

col1,col2,col3
abc,def,ghi

{% endseed %}

{% seed my_seed_name_1 %}

col4,col5,col6
abc,def,ghi

{% endseed %}

And does the {{ config(...) }} denote the same configuration as explained in this doc: https://docs.getdbt.com/reference/seed-configs/ ?

drewbanin · 2020-04-08T18:46:48Z

hey @sumanau7 - yep, that's exactly right! These seed tags would be optional, but if they're provided, they would support:

multiple seed blocks in a single file
configuration of seeds inline (as opposed to in the dbt_project.yml file)
seed blocks defined outside of .csv files (e.g. next to a model defined with a block; see model blocks #184)

The configs we'd support are indeed the ones you've linked to!
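
To make the first two points concrete, here is a minimal sketch of how a file holding several seed blocks could be split into named seeds with an optional inline config. It is not dbt's parser: the regular expressions, the parse_seed_blocks function, and the returned dictionary shape are all assumptions made for illustration, and the config arguments are kept as an unevaluated string.

```python
import csv
import io
import re

# Hypothetical patterns for the proposed tags; dbt has no such parser today.
SEED_BLOCK = re.compile(
    r"\{%\s*seed\s+(?P<name>\w+)\s*%\}(?P<body>.*?)\{%\s*endseed\s*%\}",
    re.DOTALL,
)
CONFIG_CALL = re.compile(r"\{\{\s*config\((?P<args>.*?)\)\s*\}\}", re.DOTALL)


def parse_seed_blocks(text):
    """Return {seed_name: {"config": str or None, "rows": list of dicts}}."""
    seeds = {}
    for match in SEED_BLOCK.finditer(text):
        body = match.group("body")
        config = CONFIG_CALL.search(body)
        # Drop the config call so that only the CSV payload remains.
        payload = CONFIG_CALL.sub("", body).strip()
        rows = list(csv.DictReader(io.StringIO(payload)))
        seeds[match.group("name")] = {
            "config": config.group("args").strip() if config else None,
            "rows": rows,
        }
    return seeds


example = """
{% seed my_seed_name %}
{{ config(quote_columns=false) }}
col1,col2,col3
abc,def,ghi
{% endseed %}

{% seed my_seed_name_1 %}
col4,col5,col6
abc,def,ghi
{% endseed %}
"""

seeds = parse_seed_blocks(example)
for name, seed in seeds.items():
    print(name, seed["config"], seed["rows"])
```

A real implementation would presumably go through dbt's existing Jinja block handling rather than regular expressions; the sketch only shows that one file can yield several independently configured seeds.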

sumanau7 · 2020-04-09T13:41:30Z

@drewbanin Adding jinja tags inside csv files might get weird. A few issues that I can see:

Users can dump data to csv with their tools, but those files then have to be updated manually to add the jinja tags.
If a csv file has a formatting issue, the jinja tags inside it mean users won't be able to open it in their tools (e.g. Google Sheets, Numbers, Excel) to figure out the problem.
Even GitHub won't be able to render these csv files properly because of the unknown tags.

This approach may well have other benefits that you would know better than I do.

As an alternative, may I suggest the following:

Let users define seed data in multiple csv files.

data/file_1.csv
data/file_2.csv
data/file_3.json

In their data folder they can now define a jinja template, seed_template.jinja2, with content like:

{% seed my_seed_name file=file_1.csv %}
{{ config(...) }}
{% endseed %}

{% seed my_seed_name_1 file=file_2.csv %}
{{ config(...) }}
{% endseed %}

{% seed my_seed_name_2 file=file_3.json type=json %}
{{ config(...) }}
{% endseed %}
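
As a rough sketch of how such a template could be read, the snippet below pulls the seed name, file, and type out of each opening tag. The tag grammar, the parse_seed_manifest function, and the default of csv when no type is given are assumptions for illustration only; nothing like this exists in dbt.

```python
import re

# Hypothetical grammar: {% seed <name> key=value ... %}
SEED_TAG = re.compile(r"\{%\s*seed\s+(?P<name>\w+)(?P<attrs>[^%]*)%\}")
ATTR = re.compile(r"(\w+)=(\S+)")


def parse_seed_manifest(text):
    """Return one entry per seed tag, pointing at its data file."""
    entries = []
    for match in SEED_TAG.finditer(text):
        attrs = dict(ATTR.findall(match.group("attrs")))
        entries.append(
            {
                "name": match.group("name"),
                "file": attrs.get("file"),
                # Assumption: seeds are csv unless a type is given.
                "type": attrs.get("type", "csv"),
            }
        )
    return entries


template = """
{% seed my_seed_name file=file_1.csv %}
{% endseed %}

{% seed my_seed_name_2 file=file_3.json type=json %}
{% endseed %}
"""

entries = parse_seed_manifest(template)
for entry in entries:
    print(entry)
```

Keeping the data files free of tags means they still open cleanly in spreadsheet tools, which is the point of the alternative.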

Of course you would be the best person to take this call.

drewbanin · 2020-04-09T19:29:12Z

You make some really good points here @sumanau7! I buy it! Let's still move forwards with a seed block, but let's consider separating the CSV file contents from the block itself :)

sumanau7 · 2020-04-10T15:06:56Z

@drewbanin I'm interested in picking this up.
I spent some time understanding how seeds are processed. If we want to do what we discussed above, does the approach below sound good?

Define a dataclass to process seed config, just like the Project class that exists in core/dbt/config.project.
Make sure RuntimeConfig gets this class.
While processing seed data, check whether a SeedConfig is specified; if so, use it, otherwise fall back to the seed config defined in Project.
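
The steps above could be sketched roughly as follows. SeedConfig and resolve_seed_config are hypothetical names, not existing dbt classes or functions, and the per-key fallback (inline values win, project values fill the gaps) is one possible reading of the third step rather than a settled design.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SeedConfig:
    """Hypothetical per-seed config, as it might be parsed from a seed block."""

    name: str
    settings: Dict[str, Any] = field(default_factory=dict)


def resolve_seed_config(
    project_settings: Dict[str, Any],
    seed_config: Optional[SeedConfig],
) -> Dict[str, Any]:
    """Use inline seed settings where given, else the project-level ones."""
    merged = dict(project_settings)  # copy so the project config is untouched
    if seed_config is not None:
        merged.update(seed_config.settings)
    return merged


# Stand-in for the seed config a project would declare in dbt_project.yml.
project_settings = {"quote_columns": False, "enabled": True}
inline = SeedConfig(name="my_seed_name", settings={"quote_columns": True})

with_inline = resolve_seed_config(project_settings, inline)
without_inline = resolve_seed_config(project_settings, None)
print(with_inline)
print(without_inline)
```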

Your thoughts on implementation would help here.

jml · 2020-04-29T14:17:34Z

Came here from #2365. Seems like seed blocks are orthogonal to supporting other seed types? For example, you could configure the seed source format to be either JSON or CSV in the dbt_project.yml without having to implement blocks as described here.
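
A minimal sketch of that idea: a loader that picks its parser from a configured format, with no block syntax involved. The load_seed_rows function and its fmt parameter are hypothetical, and the JSON branch assumes newline-delimited JSON, as in the title of #2365.

```python
import csv
import io
import json


def load_seed_rows(text, fmt="csv"):
    """Parse seed text into a list of row dicts according to fmt."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        # Assumption: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported seed format: {fmt}")


csv_rows = load_seed_rows("col1,col2\nvalue,other value\n", fmt="csv")
json_rows = load_seed_rows('{"col1": "value", "col2": "other value"}\n', fmt="json")
print(csv_rows == json_rows)
```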

jtcohen6 · 2020-09-10T18:56:35Z

@drewbanin I think we should close this, or at least kick it out of v1.0.

Seed blocks get us:

The ability to define multiple seeds in the same file
The ability to config() seeds inline

The latter is the more compelling of the two by far, and indeed requiring that all seed configs be specified in dbt_project.yml is not sustainable for especially large projects. I think we can better resolve this by enabling the specification of configs in seeds/whatever.yml (#2401).

If we do take this off our roadmap, I think we should reopen #2365, which is compelling in its own right.

drewbanin · 2020-09-10T19:00:24Z

@jtcohen6 I'm with you! Let's close this issue, prioritize #2274 and re-open #2365 - sounds like that will get us to a good place :D

drewbanin added the enhancement and 1.0.0 labels Mar 31, 2020

drewbanin mentioned this issue Mar 31, 2020

Milestone: 1.0 #2277

Closed

11 tasks

drewbanin mentioned this issue Apr 29, 2020

Support newline-delimited JSON for seeds #2365

Open

jtcohen6 removed the 1.0.0 label Sep 10, 2020

jtcohen6 closed this as completed Sep 10, 2020
