-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAGnabit #9
Conversation
cc @jthandy |
I'm a big fan of this approach. Agree that it could use refinement but this will significantly help my workflow; I had anticipated config for view/table to come later so that's great. @cmerrick we're hoping to have the launch completed by the time you head to SF next week. do you anticipate any challenges working through code review within the next couple of days? if so, no prob but might try to push it to someone else. |
@@ -0,0 +1,93 @@ | |||
import sqlparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the choice of namespace here is confusing - it's under the compiler
namespace, but it's used by the run
task? I recommend just moving this into run.py
for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmerrick I imagine we're going to want to do much more in the compilation step than we're currently doing. For instance, sort/dist keys, column encodings, etc can all be interpolated at compile time. I think it makes sense to leave the actual sql parsing code in a separate dir for now. I did change the name from compile
to build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like speculative design. We can always break it off into its own file later, if we decide we need to.
change print statements, function calls, imports
@cmerrick is consolidating the |
Cool, will revise asap Sent from my iPhone
|
def link(self, sql): | ||
sql = sql.strip() | ||
for statement in sqlparse.parse(sql): | ||
if statement.get_type().startswith('CREATE'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is expecting the model to be specified as create view as...
or create table as...
and then re-writes that create statement if necessary? I see that's a convenient way to allow the user to specify the name of the thing, but I think it will either break or produce surprising results if anything other than the most generic forms for create table/view as...
are used. For example, what's going to happen if someone writes create temporary table as ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code runs after the compilation step. Input to the system should be a SELECT statement. This is compiled to a CREATE statement, which is then run. I suppose this code is redundant, but it was originally intended to avoid running destructive SQL unintentionally
There is a single select statement per file, and the table/view name is inferred from the file name containing the SELECT statement
Sent from my iPhone
On Apr 1, 2016, at 6:29 PM, Christopher Merrick notifications@github.com wrote:
In dbt/task/run.py:
order = nx.topological_sort(self.graph, reverse=True)
for node in order:
if node in self.node_sql_map: # TODO :
yield (node, self.node_sql_map[node])
else:
pass
- def register(self, node, sql):
if node in self.node_sql_map:
raise RuntimeError("multiple declarations of node: {}".format(node))
self.node_sql_map[node] = sql
- def link(self, sql):
sql = sql.strip()
for statement in sqlparse.parse(sql):
so this is expecting the model to be specified as create view as... or create table as... and then re-writes that create statement if necessary? I see that's a convenient way to allow the user to specify the name of the thing, but I think it will either break or produce surprising results if anything other than the most generic forms for create table/view as... are used. For example, what's going to happen if someone writes create temporary table as ...?if statement.get_type().startswith('CREATE'):
—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, the simultaneous introduction of dependency-graphing and create-wrapping got me mixed up. That sounds right. In the future it would be good to provide that blurb when you create the PR, it's useful for understanding the approach.
back to you |
@cmerrick model configs are now be specified like this: models:
base:
enabled: true
materialized: false
pardot:
materialized: true
pardot_visitoractivity:
materialized: false The precedence goes By default, all models are enabled and are not materialized |
|
||
return model_config | ||
config = model_configs['base'].copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that "base" is a magic word here, it means a model/model group can't be called that. Some alternative proposals:
- Don't allow this to be set hierarchically. Either only allow a global setting, or only allow per-model settings, but not both.
- Use regexps to specify the models to apply the settings to. In that case,
"base"
would become"*"
. - Use two different keys in the parent map - e.g.
"default_model_settings"
that is equivalent to"base"
, and"model_settings"
that is an array of per-model settings
What's a model group? Could there be a conflict between a model group name and the model name? |
@cmerrick we don't have a good name for this concept -- A model group is something like "pardot". The actual model is "pardot_visitoractivity", for instance. I think we threw around the name of "module" or "service", but I don't think either of those is descriptive. Name conflicts should be avoided for clarity (and it might break something else...?) but I think this config isn't affected. For example models:
pardot:
enabled: false
pardot:
enabled: true Should work fine. I think it's a moot point, because I can't foresee a situation in which the directory (model group) name is the the same as the model name. We should clarify by convention (or in code) that you shouldn't do this |
So a model group is essentially a namespace? Could you have more deeply nested namespaces e.g. |
@cmerrick yes indeed, a model group is a namespace. Right now, the namespace can't contain any subdirectories. I can foresee that being useful, but it's not implemented in a way that supports that currently |
approved |
add star macro
Walk the DAG inferred form SQL select statements. Run CREATE statements in the appropriate order.
I tested this with the
models
repo, but unit tests are definitely a good ideaAdditionally, models can be individually configured. The defaults are
These configs can be overridden in the
dbt_project.yml
file.This will result in all models outside of Pardot being created as views. Only the
pardot_visitoractivity
model will be created (as a table) within the Pardot directory.