DAGnabit #9

drewbanin · 2016-03-21T21:36:39Z

Walk the DAG inferred form SQL select statements. Run CREATE statements in the appropriate order.

I tested this with the models repo, but unit tests are definitely a good idea

Additionally, models can be individually configured. The defaults are

materalized: false,
enabled: true

These configs can be overridden in the dbt_project.yml file.

models:
  base:
    materialized: false,
    enabled: true
  pardot:
    enabled: false
    pardot_visitoractivity:
      enabled: true
      materialized: true

This will result in all models outside of Pardot being created as views. Only the pardot_visitoractivity model will be created (as a table) within the Pardot directory.

drewbanin · 2016-03-29T23:09:48Z

cc @jthandy

jthandy · 2016-03-30T00:29:40Z

I'm a big fan of this approach. Agree that it could use refinement but this will significantly help my workflow; I had anticipated config for view/table to come later so that's great.

@cmerrick we're hoping to have the launch completed by the time you head to SF next week. do you anticipate any challenges working through code review within the next couple of days? if so, no prob but might try to push it to someone else.

cmerrick · 2016-03-30T17:18:26Z

dbt/task/compiler/linker.py

@@ -0,0 +1,93 @@
+import sqlparse


the choice of namespace here is confusing - it's under the compiler namespace, but it's used by the run task? I recommend just moving this into run.py for now

@cmerrick I imagine we're going to want to do much more in the compilation step than we're currently doing. For instance, sort/dist keys, column encodings, etc can all be interpolated at compile time. I think it makes sense to leave the actual sql parsing code in a separate dir for now. I did change the name from compile to build

That sounds like speculative design. We can always break it off into its own file later, if we decide we need to.

change print statements, function calls, imports

drewbanin · 2016-04-01T16:44:30Z

@cmerrick is consolidating the linker.py code into the run.py file the only thing standing in the way of getting this merged?

…nabit

drewbanin · 2016-04-01T22:16:19Z

Cool, will revise asap

Sent from my iPhone

On Apr 1, 2016, at 6:14 PM, Christopher Merrick notifications@github.com wrote:

I agree it's not pretty, but I think this approach has structural issues

On Fri, Apr 1, 2016 at 6:09 PM, Drew Banin notifications@github.com wrote:

The alternative is putting a big config blob in the dbt_project.yml file.
Do you think that's preferable?

Sent from my iPhone

On Apr 1, 2016, at 5:57 PM, Christopher Merrick <
notifications@github.com> wrote:

I don't think we should mix configuration like the enabled and
materialized flags into the models directories. The models are source code,
those parameters are essentially compilation flags.

In a world where models can be packaged up and utilized by another
party, we don't want that user to have to modify anything in the source
directory. It would be like having to configure ruby on rails by modifying
its source code directly.

—
You are receiving this because you authored the thread.

Reply to this email directly or view it on GitHub

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#9 (comment)

—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub

cmerrick · 2016-04-01T22:29:20Z

dbt/task/run.py

+    def link(self, sql):
+        sql = sql.strip()
+        for statement in sqlparse.parse(sql):
+            if statement.get_type().startswith('CREATE'):


so this is expecting the model to be specified as create view as... or create table as... and then re-writes that create statement if necessary? I see that's a convenient way to allow the user to specify the name of the thing, but I think it will either break or produce surprising results if anything other than the most generic forms for create table/view as... are used. For example, what's going to happen if someone writes create temporary table as ...?

This code runs after the compilation step. Input to the system should be a SELECT statement. This is compiled to a CREATE statement, which is then run. I suppose this code is redundant, but it was originally intended to avoid running destructive SQL unintentionally

There is a single select statement per file, and the table/view name is inferred from the file name containing the SELECT statement

Sent from my iPhone

On Apr 1, 2016, at 6:29 PM, Christopher Merrick notifications@github.com wrote:

In dbt/task/run.py:

order = nx.topological_sort(self.graph, reverse=True)

for node in order:

if node in self.node_sql_map: # TODO :

yield (node, self.node_sql_map[node])

else:

pass

def register(self, node, sql):

if node in self.node_sql_map:

raise RuntimeError("multiple declarations of node: {}".format(node))

self.node_sql_map[node] = sql

def link(self, sql):

sql = sql.strip()

for statement in sqlparse.parse(sql):

if statement.get_type().startswith('CREATE'):
so this is expecting the model to be specified as create view as... or create table as... and then re-writes that create statement if necessary? I see that's a convenient way to allow the user to specify the name of the thing, but I think it will either break or produce surprising results if anything other than the most generic forms for create table/view as... are used. For example, what's going to happen if someone writes create temporary table as ...?

—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub

ah ok, the simultaneous introduction of dependency-graphing and create-wrapping got me mixed up. That sounds right. In the future it would be good to provide that blurb when you create the PR, it's useful for understanding the approach.

cmerrick · 2016-04-01T22:46:46Z

back to you

drewbanin · 2016-04-03T18:54:25Z

@cmerrick model configs are now be specified like this:

models:
  base:
    enabled: true
    materialized: false

  pardot:
    materialized: true
    pardot_visitoractivity:
      materialized: false

The precedence goes model, model group, base

By default, all models are enabled and are not materialized

cmerrick · 2016-04-03T20:00:13Z

dbt/task/compile.py


-        return model_config
+        config = model_configs['base'].copy()


I don't like that "base" is a magic word here, it means a model/model group can't be called that. Some alternative proposals:

Don't allow this to be set hierarchically. Either only allow a global setting, or only allow per-model settings, but not both.

Use regexps to specify the models to apply the settings to. In that case, "base" would become "*".

Use two different keys in the parent map - e.g. "default_model_settings" that is equivalent to "base", and "model_settings" that is an array of per-model settings

cmerrick · 2016-04-03T20:03:13Z

The precedence goes model, model group, base

What's a model group? Could there be a conflict between a model group name and the model name?

drewbanin · 2016-04-03T20:07:16Z

@cmerrick we don't have a good name for this concept -- A model group is something like "pardot". The actual model is "pardot_visitoractivity", for instance. I think we threw around the name of "module" or "service", but I don't think either of those is descriptive.

Name conflicts should be avoided for clarity (and it might break something else...?) but I think this config isn't affected.

For example

models:
  pardot:
    enabled: false
    pardot:
      enabled: true

Should work fine. models/pardot/*.sql will be disabled, but models/pardot/pardot.sql will be enabled (because it overwrites it's parent).

I think it's a moot point, because I can't foresee a situation in which the directory (model group) name is the the same as the model name. We should clarify by convention (or in code) that you shouldn't do this

cmerrick · 2016-04-03T20:09:59Z

So a model group is essentially a namespace? Could you have more deeply nested namespaces e.g. pardot.email.opens or something like that?

drewbanin · 2016-04-03T20:32:46Z

@cmerrick yes indeed, a model group is a namespace. Right now, the namespace can't contain any subdirectories. I can foresee that being useful, but it's not implemented in a way that supports that currently

cmerrick · 2016-04-03T21:18:48Z

approved

add star macro

first cut

cccbba8

drewbanin changed the title ~~first cut~~ DAGnabit Mar 23, 2016

drewbanin mentioned this pull request Mar 23, 2016

Need dependency resolution #3

Closed

drewbanin modified the milestone: v0.1.0 release Mar 23, 2016

Drew Banin added 9 commits March 25, 2016 16:36

wrap select stmts in create or replace (table|view)

04b0d5e

Merge branch 'turntable' into dagnabit

16001d6

better parsing and drop schema before create

c0d5295

handle CTEs and window functions

7649f6e

remove debug code

ce530a7

fix duplicate 'view' keyword in compile stmt

fa9e4c6

code cleanup

d8d9e25

default to models

64bb3db

each model gets its own config

cf588ed

drewbanin assigned cmerrick Mar 29, 2016

Drew Banin added 2 commits March 29, 2016 19:05

cleanup

b3c19da

Merge branch 'per-model-configs' into dagnabit

1e1180a

jthandy mentioned this pull request Mar 30, 2016

dbt run is missing output #2

Closed

cmerrick reviewed Mar 30, 2016
View reviewed changes

Drew Banin and others added 6 commits March 31, 2016 14:28

rename compiler --> linker

a90d3a1

drop existing tables/views (instead of whole schema)

79f6fb5

remove argparse dep

c041317

change print statements, function calls, imports

be1231a

always cascade table drop

7faac11

Merge pull request #14 from analyst-collective/python3-support

e762fb1

change print statements, function calls, imports

Drew Banin added 2 commits April 1, 2016 13:37

move linker code into run task

216b2f8

Merge branch 'dagnabit' of github.com:analyst-collective/dbt into dag…

fa46b8e

…nabit

cmerrick reviewed Apr 1, 2016
View reviewed changes

cmerrick assigned drewbanin and unassigned cmerrick Apr 1, 2016

model configs come from dbt_project.yml file

42d500a

Drew Banin added 2 commits April 3, 2016 14:55

skip disabled models in compile step

f3a7b70

add model config docs to sample.dbt_project.yml

0a66357

drewbanin assigned cmerrick and unassigned drewbanin Apr 3, 2016

cmerrick reviewed Apr 3, 2016
View reviewed changes

cmerrick assigned drewbanin and unassigned cmerrick Apr 3, 2016

don't use 'base' as a magic keyword

4bb43a4

drewbanin assigned cmerrick and unassigned drewbanin Apr 3, 2016

tweak example config file

d7a2ba4

cmerrick assigned drewbanin and unassigned cmerrick Apr 3, 2016

drewbanin merged commit 58a06c7 into master Apr 3, 2016

jthandy deleted the dagnabit branch August 7, 2016 20:57

jthandy restored the dagnabit branch August 7, 2016 20:57

jthandy deleted the dagnabit branch August 7, 2016 20:57

yu-iskw pushed a commit to yu-iskw/dbt that referenced this pull request Aug 17, 2021

Merge pull request dbt-labs#9 from fishtown-analytics/star

1da4081

add star macro

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DAGnabit #9

DAGnabit #9

drewbanin commented Mar 21, 2016

drewbanin commented Mar 29, 2016

jthandy commented Mar 30, 2016

cmerrick Mar 30, 2016

drewbanin Mar 31, 2016

cmerrick Mar 31, 2016

drewbanin commented Apr 1, 2016

drewbanin commented Apr 1, 2016

cmerrick Apr 1, 2016

drewbanin Apr 1, 2016

cmerrick Apr 1, 2016

cmerrick commented Apr 1, 2016

drewbanin commented Apr 3, 2016

cmerrick Apr 3, 2016

cmerrick commented Apr 3, 2016

drewbanin commented Apr 3, 2016

cmerrick commented Apr 3, 2016

drewbanin commented Apr 3, 2016

cmerrick commented Apr 3, 2016

DAGnabit #9

DAGnabit #9

Conversation

drewbanin commented Mar 21, 2016

drewbanin commented Mar 29, 2016

jthandy commented Mar 30, 2016

cmerrick Mar 30, 2016

Choose a reason for hiding this comment

drewbanin Mar 31, 2016

Choose a reason for hiding this comment

cmerrick Mar 31, 2016

Choose a reason for hiding this comment

drewbanin commented Apr 1, 2016

drewbanin commented Apr 1, 2016

cmerrick Apr 1, 2016

Choose a reason for hiding this comment

drewbanin Apr 1, 2016

Choose a reason for hiding this comment

cmerrick Apr 1, 2016

Choose a reason for hiding this comment

cmerrick commented Apr 1, 2016

drewbanin commented Apr 3, 2016

cmerrick Apr 3, 2016

Choose a reason for hiding this comment

cmerrick commented Apr 3, 2016

drewbanin commented Apr 3, 2016

cmerrick commented Apr 3, 2016

drewbanin commented Apr 3, 2016

cmerrick commented Apr 3, 2016