[vars] [false negatives] Changes in used variables do not cause models to be selected by `state:modified` #4304

CyberShadow · 2021-11-18T11:59:03Z

Hi there,

In https://docs.getdbt.com/reference/node-selection/state-comparison-caveats, it is stated:

Vars

If a model uses a var or env_var in its definition, dbt is unable today to identify that lineage in such a way that it can include the model in state:modified because the var or env_var value has changed.

I was wondering if there is an issue tracking this limitation; I searched for one and couldn't find one, so I hope it's OK that I opened this issue so that our team can follow its status. (If there was one and I missed it, apologies!)

Also, I apologize for the naive question, but would it not be easier, at least for the case where the desired effect is to re-run all changed models and their children, to check if the source code after preprocessing (Jinja macro expansion) has changed, instead of before? I would expect that this would automatically cause changes in variables to propagate to changes to the compiled code of the models, which would then indicate a need to re-run them.

Thanks!

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2021-11-18T13:27:24Z

@CyberShadow Thanks for opening the issue! I think the previous issue we had to track this (#2704) was closed, on account of being mostly (though not entirely) resolved.

We just recently did a bunch of work to capture env_var calls for partial parsing. That allowed us to track (a) when those env vars change, and (b) which files need to be re-parsed accordingly. We can reasonably extend that to support --vars as well, which currently trigger a full re-parse.

There's a lot of overlap between that work, and what we'd need to support this in state:modified. There's still a gap between them, however. Bridging that gap would require us to do one of two things:

Add just enough of this information to manifest.json, to allow it to make the same comparisons—namely, the dictionary of env var values, and a mapping from each env var to every node that depends on it in some way (including macros)
Switch dbt state selection to use the internal manifest (partial_parse.msgpack) rather than the documented, externally contracted manifest (manifest.json)

Also, I apologize for the naive question, but would it not be easier, at least for the case where the desired effect is to re-run all changed models and their children, to check if the source code after preprocessing (Jinja macro expansion) has changed, instead of before?

This is a totally fair question! We want state:modified to detect, as much as possible, changes that are accountable to you, the dbt developer. Those changes could occur because someone has edited a model, because someone has edited a macro that a model depends on, or because someone has changed the configuration for dozens of models via dbt_project.yml. As much as possible, we want to limit "false positives" — differences that occur not due to editing dbt project source code, but because a project is being run in a different environment, or because something in the database has changed, impacting dynamically templated SQL.

To that end, I'm starting to think that changes in --vars + env vars look less like the kind of development-time changes we want to be capturing in state:modified, and more like the kind of environment-based changes we want to ignore for state-based selection (but observe well enough to re-parse the project correctly, of course). That is, I'm starting to wonder if state:modified.vars and state:modified.env_vars are actually desirable!

So, to turn the question around: Could you say more about your use case? What sorts of changes are you hoping to test in CI, or capture in production, that depend on vars or env vars?

CyberShadow · 2021-11-18T13:53:43Z

Hey, thank you for the detailed response!

Good to know about state:modified.vars and that it is on the roadmap :)

Thank you for the insight about state:modified. The explanation makes sense, though I would mention that occasional false positives (e.g. on a dbt upgrade) would be OK for us as it simply means that some models will be wastefully re-run. On the other hand, false negatives are potentially dangerous if some models were not rebuilt when they really should have, as it would cause the final results to be wrong.

I'm happy to talk about our use case :) One part of the topic I'm working on right now is reproducibility of our data transformation pipelines, i.e., the ability to re-run previous pipelines and obtain the exact same result. This will make it easier to reproduce past problems in order to understand and debug them. As part of this, the approach I was considering was that each time we receive updated data, it would be assigned a version identifier, with that version of the data being immutable henceforth. This data version identifier to use would then be passed to dbt as a var, which would then be expanded to the fully-qualified name of some table or snapshot in the SQL template.

You can probably see where this is going: if the specified data version to use is changed, then all models that use that data source should also be re-run. So, from this point of view, it would make sense if a change in a variable's value causes models which use it to be considered modified; but, I also understand your perspective where vars indicate environmental configuration, changes in which should not signal modifications.

Hope this helps!

LHarris55 · 2022-03-16T10:33:58Z

Hi there,

We are experiencing the exact same issue where once a variable is changed in the dbt_project.yml file the downstream models which reference the variable are not ran when calling state:modifed. Hence, the variables not updated in these models.

It sounds like state:modified.vars would be a great solution to this.

Hope this can be implemented in the future!

axelborja · 2022-09-29T09:21:15Z

We are also experiencing the same limitations, so, unfortunately, because we are using some vars in our models, we cannot use the state:modified feature at all. (or we can but it's a non-sense)

@jtcohen6, any update about this feature?

Pgillets · 2022-11-25T20:50:54Z

Im my case, we use the DBT grant feature to manage access in BigQuery roll level security, and every time we include a new Service Account or user to the list all models are set to modified.
So, a PR that only includes an e-mail to the dbt_project.yml make all the tables in the project to run on the Slim CI.

axelborja · 2022-12-29T10:39:17Z

We are also experiencing the same limitations, so, unfortunately, because we are using some vars in our models, we cannot use the state:modified feature at all. (or we can but it's a non-sense)

@jtcohen6, any update about this feature?

@jtcohen6 any update about this one? :)

jtcohen6 · 2023-03-07T15:19:44Z

I've come around from some of the hesitations I expressed above (back in Nov 2021). I do think this would be desirable, and we can make it configurable for end users based on a subselector.

The prerequisite for this work is capturing var calls during parse time, and storing them in something like depends_on. I think it's the same prerequisite work for:

[CT-1556] Smarter handling of --vars in partial parsing #6323

I'm going to mark this issue a paper_cut. Please keep upvoting / commenting to help us inform priority!

github-actions · 2024-08-13T01:54:02Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2024-08-21T01:52:45Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

ChenyuLInx · 2024-10-03T23:11:41Z

This change is reverted in #10813 due to it would cause new parsing error in project that would work before. So reopening the issue.
We also noticed some memory issue goes away after this PR is reverted, not sure it is related but worth checking memory consumption of the fix before shipping again. (Internal Thread)

jtcohen6 added triage enhancement New feature or request state Stateful selection (state:modified, defer) Team: Language labels Nov 18, 2021

jtcohen6 added discussion Team: Execution and removed triage Team: Language enhancement New feature or request labels Nov 18, 2021

jtcohen6 removed the discussion label Apr 19, 2022

jtcohen6 added paper_cut A small change that impacts lots of users in their day-to-day Team:Language and removed Team:Execution labels Mar 7, 2023

jtcohen6 removed the Team:Language label Jul 19, 2023

graciegoheen added the state: modified label Feb 7, 2024

This was referenced Feb 13, 2024

[Epic] state:modified should Actually (only) select the modified resources #9562

Closed

[env_vars] [false positives] environment-aware logic for sources should not cause sources to always be marked as modified #9573

Closed

graciegoheen changed the title ~~Changes in used variables do not cause models to be selected by state:modified~~ [false negatives] Changes in used variables do not cause models to be selected by state:modified Feb 14, 2024

graciegoheen changed the title ~~[false negatives] Changes in used variables do not cause models to be selected by state:modified~~ [vars] [false negatives] Changes in used variables do not cause models to be selected by state:modified Feb 14, 2024

github-actions bot added the stale Issues that have gone stale label Aug 13, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 21, 2024

MichelleArk reopened this Sep 25, 2024

github-actions bot removed the stale Issues that have gone stale label Sep 26, 2024

MichelleArk mentioned this issue Sep 30, 2024

state:modified vars, behind "state_modified_compare_vars" behaviour flag #10793

Merged

5 tasks

MichelleArk closed this as completed in #10793 Sep 30, 2024

dbeatty10 mentioned this issue Oct 1, 2024

[Core] state_modified_compare_vars behavior flag dbt-labs/docs.getdbt.com#6202

Open

ChenyuLInx reopened this Oct 3, 2024

graciegoheen added this to the v1.10 milestone Nov 18, 2024

dbeatty10 mentioned this issue Nov 18, 2024

New state:modified.vars selection method dbt-labs/docs.getdbt.com#6222

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[vars] [false negatives] Changes in used variables do not cause models to be selected by `state:modified` #4304

[vars] [false negatives] Changes in used variables do not cause models to be selected by `state:modified` #4304

CyberShadow commented Nov 18, 2021

Vars

jtcohen6 commented Nov 18, 2021

CyberShadow commented Nov 18, 2021

LHarris55 commented Mar 16, 2022

axelborja commented Sep 29, 2022 •

edited

Loading

Pgillets commented Nov 25, 2022

axelborja commented Dec 29, 2022

jtcohen6 commented Mar 7, 2023

github-actions bot commented Aug 13, 2024

github-actions bot commented Aug 21, 2024

ChenyuLInx commented Oct 3, 2024

[vars] [false negatives] Changes in used variables do not cause models to be selected by state:modified #4304

[vars] [false negatives] Changes in used variables do not cause models to be selected by state:modified #4304

Comments

CyberShadow commented Nov 18, 2021

Vars

jtcohen6 commented Nov 18, 2021

CyberShadow commented Nov 18, 2021

LHarris55 commented Mar 16, 2022

axelborja commented Sep 29, 2022 • edited Loading

Pgillets commented Nov 25, 2022

axelborja commented Dec 29, 2022

jtcohen6 commented Mar 7, 2023

github-actions bot commented Aug 13, 2024

github-actions bot commented Aug 21, 2024

ChenyuLInx commented Oct 3, 2024

[vars] [false negatives] Changes in used variables do not cause models to be selected by `state:modified` #4304

[vars] [false negatives] Changes in used variables do not cause models to be selected by `state:modified` #4304

axelborja commented Sep 29, 2022 •

edited

Loading