Evaluate data sources in plan when necessary #24904

jbardin · 2020-05-08T17:26:31Z

In order to support depends_on in for modules, it first needs to be supported on data sources that may be contained within modules. In order to do that, we need to be able to evaluate data sources during plan.

This does not provide a way to fully plan data source changes, as they may still only be read during refresh when possible. While this is a step towards implementing #17034, it does not yet fully merge the refresh and plan walks. The new handling of depends_on here will still be relevant for that change, and the refactoring of the eval nodes will make the code easier to update.

We start by removing the existing depends_on half-measures, and make it so that data sources can detect if they really have a depends_on dependency that effects their update. GraphNodeAttachDependsOn give us a method for adding all transitive resource dependencies found through depends_on references, so that data sources can determine if they can be read during plan. This will be done by inspecting the changes of all dependency resources, and delaying read until apply if any changes are planned.

The next step is to evaluate data sources during plan. The existing logic in the EvalReadData monolith wasn't maintainable, so that gets broken up into separate eval nodes for each phase, since the desired logic is subtly different in each case.

What we end up with is the following logic for data sources:

Refresh

If the config is not wholly known, or there are any depends_on references and the data source has never been read, it is defered to planning. The state status is set to ObjectPlanned, and it is removed from the state forcing a refresh during plan or apply.
Otherwise if the config is known, the data source is refreshed as usual and saved to state.

Plan

If the config is not wholly known, or there are changes in any transitive depends_on resources, we generate a change with plans.Read, and defer reading until apply. The proposed change, which will contain unknown values, is stored in change.After.
If the state value is nil (which may indicate it being deferred from refresh), or the config causes a change in the prior value, we read the data source and plan the change. The change is stored as a plans.Read to indicate that it only needs to be updated in state during apply.

Apply

Apply the stated plans.Read change for the data source. If the planned After value is not wholly-known, the config is re-evaluated and the data source is read into state. If the After value is wholly-known, it indicates the data source was read during plan, and the value is saved directly to state without reading the data source again.

Fixes #11806
Fixes #17173

GraphNodeAttachDependsOn give us a method for adding all transitive resource dependencies found through depends_on references, so that data source can determine if they can be read during plan. This will be done by inspecting the changes of all dependency resources, and delaying read until apply if any changes are planned.

Adding a transformer to attach any transitive DependsOn references to data sources during plan. Refactored the ReferenceMap from the ReferenceTransformer so it can be reused for both.

In order to find any changes related to a particular configuration address, we need a new method to get changes to all possible instances.

codecov · 2020-05-08T22:54:43Z

Codecov Report

Merging #24904 into master will increase coverage by 0.08%.
The diff coverage is 76.10%.

Impacted Files	Coverage Δ
plans/changes.go	`0.00% <0.00%> (ø)`
plans/changes_sync.go	`0.00% <0.00%> (ø)`
terraform/node_resource_apply_instance.go	`81.14% <ø> (-0.16%)`	⬇️
terraform/eval_read_data_apply.go	`61.22% <61.22%> (ø)`
terraform/node_resource_abstract.go	`86.18% <76.92%> (+0.66%)`	⬆️
terraform/eval_read_data.go	`62.67% <78.65%> (+12.02%)`	⬆️
terraform/eval_read_data_plan.go	`81.70% <81.70%> (ø)`
terraform/transform_reference.go	`87.43% <83.63%> (-3.25%)`	⬇️
terraform/eval_state.go	`59.63% <100.00%> (+0.75%)`	⬆️
terraform/graph_builder_plan.go	`100.00% <100.00%> (ø)`
... and 13 more

apparentlymart

Thanks for working on this! I can tell it was a tricky thing to get working, given all of the moving parts here.

I left some thoughts and questions inline. There's a lot to take in here so I'm sorry if I missed some things as I tried to load the relevant context back into my head to understand the implications.

plans/changes.go

plans/changes_sync.go

apparentlymart · 2020-05-12T22:51:47Z

terraform/context_plan_test.go

@@ -4992,8 +4993,10 @@ func TestContext2Plan_createBeforeDestroy_depends_datasource(t *testing.T) {
 		}
 	}
 	p.ReadDataSourceFn = func(req providers.ReadDataSourceRequest) providers.ReadDataSourceResponse {
+		cfg := req.Config.AsValueMap()


This now being called during planning makes me wonder what implication this has for the -refresh=false option. Would we now expect that it isn't possible to create a plan without reading data resources, and that -refresh=false is now only for disabling drift detection?

At first glance that feels okay to me -- being able to disable data resource reads was only really a side-effect of it being done during refresh, not an explicit requirement -- but it is something I expect will feel like a breaking change in some edge-cases, so I think we'd need to be call it out explicitly in the upgrade guide if so.

Yes, we will need to call this out. I'm sure we'll find someone why relies on using stale data sources, but since we want to eventually move this completely into planning, it's something we need to deal with.

Having had some time away to reflect on this a bit more, I'm feeling like this half-way position where we read data resources during refresh and during plan is likely to be confusing, because it will still be subject to the usual quirks/bugs where data sources get resolved against the "wrong" values (the ones in the prior state, rather than the ones in the config) but it will also no longer be disableable with -refresh=false and so the user model here is kinda odd.

I realize I'm taking a different posture here than my initial reaction to this changeset, but out of curiosity: do you have a sense of the size of the delta from this PR to having data sources not update during refresh at all? I was previously feeling nervous about squeezing that in so late before 0.13, but the effect of this change feels potentially heavy too and so now I'm wondering if we shouldn't just go all-in on the new behavior and get some more benefit to justify the risk. What do you think? 🤔

(I'm also now considering that this would be released at the same time as an initial iteration of #15419, which wasn't true when we originally discussed this, and so that capability could help mitigate the fact that terraform refresh would no longer work to get data sources updated because folks could use terraform apply with no other changes to get there instead.)

The benefit with this PR is that while refresh can be using "old" values for data source config, plan will check again if the data source needs to be read, and update the data source there if needed. While overall it is still not optimal, it should reduce the confusing instances to only those where either the evaluation fails entirely (which is the same situation we have now requiring -refresh=false), or when providers require a change in the data source via a dependency (which again we have today).

Overall there shouldn't be much more work to remove data source reevaluation from a separate refresh cycle, but that's not what we need in the end. The bulk of the remaining work is removing the separate refresh cycle entirely, which requires the merging of the refresh and plan walks for all resources at once, which did not look trivial (I only briefly ventured down that path). I also toyed with both not having data sources refresh, and having data sources refresh from state, but that always fails with the cases around using data sources for provider configuration, especially when the data source is involved in authentication.

Hmm indeed... the abuse of data sources for issuing credentials is definitely a wrench in the works here. Reading data sources twice during a plan is not great for that situation either, because it will presumably then issue two sets of credentials, though that's better than it failing altogether.

This particular data source misuse is going to be problematic for #15419 too (terraform plan will never succeed without generating a new plan to be applied) so I expect we're going to need to address it somehow either way... 😖

The good news is that the plan-time read is conditional, so it only gets issued twice if the configuration has visible changes (or if the data source is doing something incorrectly, which also isn't out of the question with the relaxed legacy SDK constraints).

terraform/eval_read_data.go

terraform/eval_read_data_apply.go

terraform/node_resource_destroy.go

apparentlymart · 2020-05-12T23:23:42Z

terraform/transform_reference.go

+// "perpetual diff"
+type GraphNodeAttachDependsOn interface {
+	GraphNodeConfigResource
+	AttachDependsOn([]addrs.ConfigResource)


Given that "depends on" already has a meaning in our language, might we name this something like AttachResourceDependencies instead, to make it a bit clearer that it's getting something derived from depends_on rather than getting depends_on itself?

We already have AttachDependencies for actual dependencies though, which is why I ended up on this sort of awkward name. I do want this to be clear that it is only derived from depends_on in some way, and not the result of normal dependencies with visible changes, but am open to a better name.

Since this is specifically about resource dependencies rather than arbitrary dependencies, I'd suggest AttachResourceDependencies as a subtle (admittedly) way to differentiate this.

apparentlymart · 2020-05-12T23:27:06Z

terraform/transform_reference.go

+
+		// Only data need to attach depends_on, so they can determine if they
+		// are eligible to be read during plan.
+		if selfAddr.Resource.Mode != addrs.DataResourceMode {


Would it hurt to just do this unconditionally for every implementer of GraphNodeAttachDependsOn? It feels a little confusing to me that this is named as if it's a generic mechanism but then has this suprising restriction hidden in it, and my intuition from reading the rest of this PR is that other resource modes would just accept it and ignore it. 🤔

No, it wouldn't hurt. I did try to document in each location that these mechanisms only apply to data source, so I didn't want to attach them to managed resources and have something start using them from another code path (an occurrence which is unfortunately all too common)

Fair enough! Perhaps we could ram that point home in a crude way by naming the interface GraphNodeDataResourceAttachDependencies, leaving no doubt at all that someone should think twice before using it for something other than a data resource, even if they don't read the comments.

(For similar reasons, I might also call this transformer AttachDataResourceDependenciesTransformer, so it's easy to see scanning the graph builder instantiation that this is something specifically for data resources.)

This transformer is what will provider the data sources with the transitive dependencies needed to determine if they can read during plan or must be deferred.

We need to load the state during refresh, so that even if the data source can't be read due to `depends_on`, the state can be saved back again to prevent it from being lost altogether. This is a step towards having data sources refresh like resources, which will be from their saved state value.

In order to udpate data sources correctly when their configuration changes, they need to be evaluated during plan. Since the plan working state isn't saved, store any data source reads as plan changes to be applied later. This is currently abusing the Update plan action to indicate that the data source was read and needs to be applied to state. We can possibly add a Store action for data sources if this approach works out. The Read action still indicates that the data source was deferred to the Apply phase. We also fully handle any data source depends_on changes. Now that all the transitive resource dependencies are known at the time of evaluation, we can check the plan to determine if there are any changes in the dependencies and selectively defer reading the data source.

Start fixing plan tests that don't expect data sources to be in the plan. A few were just checking that Read was never called, and some expected the data source to be nil.

The data source was never read, so the schema was never verified.

A few test with had data sources that were never read before, and needed to get valid responses for the tests.

EvalReadDataApply was all dead code, as it was only using during delete and simply set the state to nil.

Remove extra fields, remove the depends_on logic from NodePlannableResourceInstnace, and start breaking up the massive Eval method.

The logic for refresh, plan and apply are all subtly different, so rather than trying to manage that complex flow through a giant 300 line method, break it up somewhat into 3 different types that can share the types and a few helpers.

Rather than re-read the data source during every plan cycle, apply the config to the prior state, and skip reading if there is no change. Remove the TODOs, as we're going to accept that data-only changes will still not be plan-able for the time being. Fix the null data source test resource, as it had no computed fields at all, even the id.

The state was not being set, so the change was not evaluated correctly for dependant resources. Remove use of cty.NilVal in readDataSource, only one place was using it, so the code could just be moved out. Fix a bunch of places where warnings would be lost.

Ensure that a data source with depends_on not only plans to update during refresh, but evaluates correctly in the plan ensuring dependencies are planned accordingly.

The new data source planning logic no longer needs a separate action, and the apply status can be determined from whether the After value is complete or not.

apparentlymart

This looks good to me!

I still have a little uncertainty about what including this in our 0.13 release will look like, but I think that uncertainty is better handled as merging this good implementation and then taking some time to experiment with this in conjunction with the forthcoming change to the way terraform plan will handle refresh-time changes and see how those things feel together. I expect that if anything feels a little "off" then it'd end up just being a relatively minor tweak to either one of our changes, rather than a drastic change in direction.

ghost · 2020-06-20T01:50:17Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

jbardin added 3 commits May 8, 2020 12:25

add AttachDependsOnTransformer

f8907b9

Adding a transformer to attach any transitive DependsOn references to data sources during plan. Refactored the ReferenceMap from the ReferenceTransformer so it can be reused for both.

GetConfigResourceChanges from plans

18ca98a

In order to find any changes related to a particular configuration address, we need a new method to get changes to all possible instances.

jbardin requested a review from a team May 8, 2020 23:05

jbardin force-pushed the jbardin/plan-data-sources branch from 0eb1f12 to 22d8d94 Compare May 8, 2020 23:14

jbardin marked this pull request as ready for review May 8, 2020 23:16

jbardin force-pushed the jbardin/plan-data-sources branch 3 times, most recently from e557160 to 48e7b96 Compare May 9, 2020 01:10

apparentlymart reviewed May 12, 2020

View reviewed changes

jbardin added 17 commits May 13, 2020 13:58

add AttachDependsOnTransformer to plan

23e259a

This transformer is what will provider the data sources with the transitive dependencies needed to determine if they can read during plan or must be deferred.

Fix some plan tests with planned data

9241629

Start fixing plan tests that don't expect data sources to be in the plan. A few were just checking that Read was never called, and some expected the data source to be nil.

missing id in schema in plan test

047c9b3

The data source was never read, so the schema was never verified.

add data responses where they were missing

cf2dd43

A few test with had data sources that were never read before, and needed to get valid responses for the tests.

remove EvalReadDataApply

be20a79

EvalReadDataApply was all dead code, as it was only using during delete and simply set the state to nil.

start to refactor EvalReadData

4a92b78

Remove extra fields, remove the depends_on logic from NodePlannableResourceInstnace, and start breaking up the massive Eval method.

cleanup refresh test

96be76e

refactor EvalReadData

6ca252f

The logic for refresh, plan and apply are all subtly different, so rather than trying to manage that complex flow through a giant 300 line method, break it up somewhat into 3 different types that can share the types and a few helpers.

add test for using a data source with depends_on

c6c851e

Ensure that a data source with depends_on not only plans to update during refresh, but evaluates correctly in the plan ensuring dependencies are planned accordingly.

rename methods for ConfigResource changes

8e3728a

un-export new data eval nodes

7b8f138

add evalWriteEmptyState for data source removal

8850d78

Don't use plans.Update for data sources

291110f

The new data source planning logic no longer needs a separate action, and the apply status can be determined from whether the After value is complete or not.

jbardin force-pushed the jbardin/plan-data-sources branch from 0d93351 to 291110f Compare May 13, 2020 17:59

jbardin requested a review from apparentlymart May 13, 2020 18:08

attach dependency type name changes

a8e0914

apparentlymart approved these changes May 19, 2020

View reviewed changes

apparentlymart mentioned this pull request May 19, 2020

lang/funcs: add template function #24978

Open

jbardin merged commit e690fa1 into master May 20, 2020

jbardin deleted the jbardin/plan-data-sources branch May 20, 2020 14:00

jbardin mentioned this pull request May 20, 2020

Module depends_on #25005

Merged

danieldreier added this to the v0.13.0 milestone May 21, 2020

jbardin mentioned this pull request May 22, 2020

Computed module output is not forcing new on dependent resource #25015

Closed

ghost locked and limited conversation to collaborators Jun 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate data sources in plan when necessary #24904

Evaluate data sources in plan when necessary #24904

jbardin commented May 8, 2020 •

edited

Loading

codecov bot commented May 8, 2020 •

edited

Loading

apparentlymart left a comment

apparentlymart May 12, 2020

jbardin May 13, 2020

apparentlymart May 19, 2020

jbardin May 19, 2020

apparentlymart May 19, 2020

jbardin May 19, 2020 •

edited

Loading

apparentlymart May 12, 2020

jbardin May 13, 2020 •

edited

Loading

apparentlymart May 18, 2020

apparentlymart May 12, 2020

jbardin May 13, 2020

apparentlymart May 18, 2020

apparentlymart left a comment

ghost commented Jun 20, 2020

Evaluate data sources in plan when necessary #24904

Evaluate data sources in plan when necessary #24904

Conversation

jbardin commented May 8, 2020 • edited Loading

Refresh

Plan

Apply

codecov bot commented May 8, 2020 • edited Loading

Codecov Report

apparentlymart left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbardin May 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbardin May 13, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

apparentlymart left a comment

Choose a reason for hiding this comment

ghost commented Jun 20, 2020

jbardin commented May 8, 2020 •

edited

Loading

codecov bot commented May 8, 2020 •

edited

Loading

jbardin May 19, 2020 •

edited

Loading

jbardin May 13, 2020 •

edited

Loading