Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse the dbt ast for config calls and evaluate them outside of the parsing context #2714

Closed
beckjake opened this issue Aug 18, 2020 · 3 comments
Labels
enhancement New feature or request performance state Stateful selection (state:modified, defer)

Comments

@beckjake
Copy link
Contributor

Describe the feature

dbt should hook into jinja's code generator and use it to figure out config() values without needing to evaluate jinja.

Currently we evaluate jinja models in parse mode to collect ref, source, and config calls. Those have the benefit of letting users do powerful things like control flow, etc, but come with the downside of requiring us to evaluate the jinja twice: once in "parse" mode (execute=False) and once in "compile" mode (execute=True).

As part of jinja's compilation, we can get access to the full jinja AST. One thing we could do is extract config() calls from the AST itself instead of having to go through the trouble of evaluating it with the context. The benefit here is you don't need to know the context to build the AST (you only need it to evaluate it). So that means we can build the AST, stop, and then investigate the parts of the AST we care about with the information that would be available in the parse context, and not need to actually evaluate jinja.

This is generally extensible to:

  • populating depends_on.macros in a way that doesn't care about control flow, so we can catch every macro dependency
  • populating depends_on.macros for macros themselves
  • var and env_var calls.
  • some level of static analysis

We'll probably have to come up with some restrictions. We can do this pretty simply as a first pass, and fall back to the existing behavior (figuring out config by evaluating the jinja) when we reach the boundaries of our ability to figure it out. As a first pass, I think we can pretty safely say that any/all of these are ok to punt on for this pass (optimization fences, sort of):

  • var and env_var calls: {{ config(materialized=var('materialization_type')) }}. We actually know these before we even get to parsing (right?).
  • calling macros in config: {{ config(materialized=my_macro()) }}
  • macros that call config: {% macro my_macro() %} {{ config(materialized='view') {% endmacro %}
  • using variables in config: {% set materialized_value = 'view' %} {{ config(materialized=materialized_value) }}
  • using ** in config arguments {{ config(**kwargs) }}
  • combinations/indirect calls: {% set materialization_macro = my_macro %}{{ config(materialized=my_macro()) }}

I haven't spent enough time poking around the AST to tell you for sure what's easy/what's hard, but I have ordered them in my rough guess at easiest to most difficult. It should be possible to figure out all of these, though some may be more complex than others. I think getting var/env_var support into this PR should be pretty easy (and seems the most obviously desirable).

I think in the longer term, as this behavior gets more useful, we'll end up wanting to restrict what can call ref, source, and config to just disallow anything we can't get through. That's totally out of scope for this issue, but it's worth calling out now!

It's important that we feel comfortable scrapping this if we develop the feature (or part of it) and discover we end up hitting the optimization fence everywhere and don't have any hope of figuring it out. I do feel vaguely like there's got to be some huge crippling complication I'm missing, but I don't really see anything obvious - we do have all the information we need at parse-time!

Describe alternatives you've considered

Just don't do this

Who will this benefit?

This is more of a longer-term benefit:

  • paves the way for statically knowing what changes should mark a node as "modified"
  • makes it possible to someday do single-pass jinja rendering
@beckjake beckjake added enhancement New feature or request triage labels Aug 18, 2020
@jtcohen6 jtcohen6 removed the triage label Aug 18, 2020
@jtcohen6 jtcohen6 added this to the 0.19.0 milestone Aug 18, 2020
@jtcohen6 jtcohen6 added the state Stateful selection (state:modified, defer) label Sep 9, 2020
@jtcohen6 jtcohen6 removed this from the Kiyoshi Kuromiya milestone Sep 21, 2020
@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 2, 2020

One thing we could do is extract config() calls from the AST itself instead of having to go through the trouble of evaluating it with the context. The benefit here is you don't need to know the context to build the AST (you only need it to evaluate it). So that means we can build the AST, stop, and then investigate the parts of the AST we care about with the information that would be available in the parse context, and not need to actually evaluate jinja.

@drewbanin @gshank There's a chance an approach like this could be a massive improvement over the way we parse and construct node.configs for schema tests today. Let's dedicate some time over the next two weeks to investigate.

@gshank
Copy link
Contributor

gshank commented Dec 3, 2020

This would definitely be a big step in the direction we need to go. In a simple project I use for testing, I see 'refs', 'depends_on', 'unrendered_config', and of course the corresponding entries in the child_map and parent_map as the pieces that are updated in the parse-time jinja rendering.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 3, 2021

This fateful issue set us down some important paths. I'll always remember it fondly.

(closing in favor of #3680)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance state Stateful selection (state:modified, defer)
Projects
None yet
Development

No branches or pull requests

3 participants