Parse the dbt ast for config calls and evaluate them outside of the parsing context #2714

beckjake · 2020-08-18T20:36:39Z

Describe the feature

dbt should hook into jinja's code generator and use it to figure out config() values without needing to evaluate jinja.

Currently we evaluate jinja models in parse mode to collect ref, source, and config calls. Those have the benefit of letting users do powerful things like control flow, etc, but come with the downside of requiring us to evaluate the jinja twice: once in "parse" mode (execute=False) and once in "compile" mode (execute=True).

As part of jinja's compilation, we can get access to the full jinja AST. One thing we could do is extract config() calls from the AST itself instead of having to go through the trouble of evaluating it with the context. The benefit here is you don't need to know the context to build the AST (you only need it to evaluate it). So that means we can build the AST, stop, and then investigate the parts of the AST we care about with the information that would be available in the parse context, and not need to actually evaluate jinja.
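To make the idea concrete, here is a minimal sketch using plain `jinja2` rather than dbt's own environment. It parses a model into an AST and pulls out `config()` calls without rendering anything. The helper names (`find_config_calls`, `literal_kwargs`) and the sample model are hypothetical, and dbt's real environment registers extra extensions that this sketch ignores.

```python
from jinja2 import Environment, nodes


def find_config_calls(source):
    """Return the AST Call nodes for every `config(...)` call in `source`."""
    env = Environment()
    ast = env.parse(source)  # builds the AST only; no context, no rendering
    return [
        call
        for call in ast.find_all(nodes.Call)
        if isinstance(call.node, nodes.Name) and call.node.name == "config"
    ]


def literal_kwargs(call):
    """Collect keyword arguments whose values are constants in the AST."""
    found = {}
    for kw in call.kwargs:
        try:
            found[kw.key] = kw.value.as_const()
        except nodes.Impossible:
            # Not a literal (a variable, a macro call, ...): skip it here.
            pass
    return found


# Hypothetical model body; ref() is never evaluated because nothing is rendered.
model_sql = (
    "{{ config(materialized='table', tags=['nightly']) }} "
    "select * from {{ ref('orders') }}"
)
calls = find_config_calls(model_sql)
print(literal_kwargs(calls[0]))
# {'materialized': 'table', 'tags': ['nightly']}
```

Note that `ref` does not need to exist anywhere for this to work, which is the point: the context is only needed to evaluate the template, not to build the AST.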

This is generally extensible to:

- populating depends_on.macros in a way that doesn't care about control flow, so we can catch every macro dependency
- populating depends_on.macros for macros themselves
- var and env_var calls
- some level of static analysis
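As a rough illustration of the first and third items, the sketch below walks the AST and records every called name, including calls on branches that would never run for a given target. It again uses plain `jinja2`; `called_names`, `literal_first_args`, and the macro names in the sample are hypothetical. Package-qualified calls such as `some_package.some_macro()` show up as attribute lookups rather than bare names, so they are not covered here.

```python
from jinja2 import Environment, nodes


def called_names(source):
    """Names of everything called in `source`, on every control-flow branch."""
    ast = Environment().parse(source)
    return {
        call.node.name
        for call in ast.find_all(nodes.Call)
        if isinstance(call.node, nodes.Name)
    }


def literal_first_args(source, func_name):
    """Literal first arguments of calls to `func_name`, e.g. var('x') -> 'x'."""
    ast = Environment().parse(source)
    found = []
    for call in ast.find_all(nodes.Call):
        if (
            isinstance(call.node, nodes.Name)
            and call.node.name == func_name
            and call.args
            and isinstance(call.args[0], nodes.Const)
        ):
            found.append(call.args[0].value)
    return found


source = """
{% if target.name == 'prod' %}
  {{ prod_only_macro() }}
{% else %}
  {{ dev_only_macro(var('sample_rate')) }}
{% endif %}
"""
print(sorted(called_names(source)))
# ['dev_only_macro', 'prod_only_macro', 'var']
print(literal_first_args(source, "var"))
# ['sample_rate']
```

Rendering this template with `execute=False` would only ever visit one branch of the `if`; the AST walk sees both.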

We'll probably have to come up with some restrictions. We can do this pretty simply as a first pass, and fall back to the existing behavior (figuring out config by evaluating the jinja) when we reach the boundaries of our ability to figure it out. As a first pass, I think we can pretty safely say that any/all of these are ok to punt on for this pass (optimization fences, sort of):

- var and env_var calls: {{ config(materialized=var('materialization_type')) }}. We actually know these before we even get to parsing (right?).
- calling macros in config: {{ config(materialized=my_macro()) }}
- macros that call config: {% macro my_macro() %} {{ config(materialized='view') }} {% endmacro %}
- using variables in config: {% set materialized_value = 'view' %} {{ config(materialized=materialized_value) }}
- using ** in config arguments: {{ config(**kwargs) }}
- combinations/indirect calls: {% set materialization_macro = my_macro %}{{ config(materialized=materialization_macro()) }}
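A first pass with a fallback could look something like the sketch below: return the config when every piece is a literal, and return `None` (meaning "render the jinja the old way") as soon as any of the fences above is hit. This is plain `jinja2`, not dbt code. The `static_config` name, the `SAFE_CALLS` allow-list, and the decision to bail on any `if`/`for`/`macro` block are assumptions made for the sketch; they are deliberately conservative.

```python
from jinja2 import Environment, nodes

# Assumption: calls known not to invoke config() behind our back.
SAFE_CALLS = {"config", "ref", "source"}


def static_config(source):
    """Return config kwargs if statically known, else None (caller falls back)."""
    ast = Environment().parse(source)

    # Extra assumption: any control flow or macro definition is a fence,
    # since a config() call inside it may or may not run.
    if next(ast.find_all((nodes.If, nodes.For, nodes.Macro)), None) is not None:
        return None

    merged = {}
    for call in ast.find_all(nodes.Call):
        target = call.node
        if not isinstance(target, nodes.Name) or target.name not in SAFE_CALLS:
            return None  # some other macro or indirect call: it might call config()
        if target.name != "config":
            continue
        if call.args or call.dyn_args is not None or call.dyn_kwargs is not None:
            return None  # positional args, *args or **kwargs
        for kw in call.kwargs:
            try:
                merged[kw.key] = kw.value.as_const()
            except nodes.Impossible:
                return None  # var(), macro call, or {% set %} variable
    return merged


print(static_config("{{ config(materialized='view') }} select 1"))
# {'materialized': 'view'}
print(static_config("{{ config(materialized=var('materialization_type')) }}"))
# None
```

Teaching this about var and env_var would mean resolving those two calls from values known before parsing instead of returning `None`.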

I haven't spent enough time poking around the AST to tell you for sure what's easy/what's hard, but I have ordered them in my rough guess at easiest to most difficult. It should be possible to figure out all of these, though some may be more complex than others. I think getting var/env_var support into this PR should be pretty easy (and seems the most obviously desirable).

I think in the longer term, as this behavior gets more useful, we'll end up wanting to restrict what can call ref, source, and config to just disallow anything we can't get through. That's totally out of scope for this issue, but it's worth calling out now!

It's important that we feel comfortable scrapping this if we develop the feature (or part of it) and discover we end up hitting the optimization fence everywhere and don't have any hope of figuring it out. I do feel vaguely like there's got to be some huge crippling complication I'm missing, but I don't really see anything obvious - we do have all the information we need at parse-time!

Describe alternatives you've considered

Just don't do this

Who will this benefit?

This is more of a longer-term benefit:

- paves the way for statically knowing what changes should mark a node as "modified"
- makes it possible to someday do single-pass jinja rendering


jtcohen6 · 2020-12-02T19:50:34Z

> One thing we could do is extract config() calls from the AST itself instead of having to go through the trouble of evaluating it with the context. The benefit here is you don't need to know the context to build the AST (you only need it to evaluate it). So that means we can build the AST, stop, and then investigate the parts of the AST we care about with the information that would be available in the parse context, and not need to actually evaluate jinja.

@drewbanin @gshank There's a chance an approach like this could be a massive improvement over the way we parse and construct node.configs for schema tests today. Let's dedicate some time over the next two weeks to investigate.

gshank · 2020-12-03T16:58:02Z

This would definitely be a big step in the direction we need to go. In a simple project I use for testing, I see 'refs', 'depends_on', 'unrendered_config', and of course the corresponding entries in the child_map and parent_map as the pieces that are updated in the parse-time jinja rendering.

jtcohen6 · 2021-08-03T02:31:50Z

This fateful issue set us down some important paths. I'll always remember it fondly.

(closing in favor of #3680)

beckjake added the enhancement (New feature or request) and triage labels Aug 18, 2020

jtcohen6 removed the triage label Aug 18, 2020

jtcohen6 added this to the 0.19.0 milestone Aug 18, 2020

beckjake mentioned this issue Sep 8, 2020: Feature: include unrendered configs #2735 (Merged, 4 tasks)

jtcohen6 added the state (Stateful selection: state:modified, defer) label Sep 9, 2020

jtcohen6 removed this from the Kiyoshi Kuromiya milestone Sep 21, 2020

jtcohen6 added the performance label Dec 2, 2020

jtcohen6 mentioned this issue Dec 15, 2020: degraded performance in docker container due to slow node parsing #2948 (Closed, 5 tasks)

jtcohen6 mentioned this issue Feb 2, 2021: Using non-deterministic code such as "invocation_id" in post_hooks causes "state:modified" to see changes #3047 (Closed)

jtcohen6 mentioned this issue Mar 31, 2021: Investigation: static analyzer for models #3215 (Closed)

jtcohen6 mentioned this issue Apr 20, 2021: Use depends_on.macros as input to state:modified #3278 (Closed)

jtcohen6 mentioned this issue Aug 3, 2021: Use static analyzer to extract unrendered_config from config() #3680 (Closed)

jtcohen6 closed this as completed Aug 3, 2021

jtcohen6 mentioned this issue Feb 13, 2024: [macro] [false positives] environment-aware logic in config should not cause resources to Always selected by state:modified #9564 (Closed, 3 tasks)
