Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computed, Optional, RequiresReplace behaviour: state of play #189

Closed
kmoe opened this issue Sep 30, 2021 · 7 comments
Closed

Computed, Optional, RequiresReplace behaviour: state of play #189

kmoe opened this issue Sep 30, 2021 · 7 comments
Labels

Comments

@kmoe
Copy link
Member

kmoe commented Sep 30, 2021

As of 4b382e2, terraform plan will output the following in its diff for an attribute with these combinations of properties:

Case Computed Optional RequiresReplace Old State Config Actual terraform plan output Desired terraform plan output
1 No No Yes "a" "a" No diff, no replacement No diff, no replacement
2 No No Yes "a" "b" a -> b # forces replacement a -> b # forces replacement
3 Yes No No "a" null a -> (known after apply) ?
4 Yes No Yes "a" null a -> (known after apply) # forces replacement ?
5 Yes Yes Yes "a" "b" a -> b # forces replacement a -> b # forces replacement
6 Yes Yes Yes "a" null a -> (known after apply) # forces replacement ?

Problematic cases are discussed below.

Case 3: Computed attribute, null in config

In the framework's initial releases, updating a Computed attribute would lead to a "Provider produced inconsistent result after apply" error (see #175), because the framework would return "a" from PlanResourceChange, only to set the value to something else during apply.

We changed this behaviour in #176 and instead returned Unknown from PlanResourceChange in such cases.

Any Unknown value in the plan is represented in the CLI output as "(known after apply)". This gives the user the impression (see #187) that the value is changing, even the provider does not actually change the value.

SDKv2 would return the old state value instead, so no diff would show in the CLI.

In this case, both the old and the new behaviours are potentially problematic, since the user may get a false impression that the value is changing, or a false impression that the value is changing.

Case 4: Computed attribute with RequiresReplace, null in config

As in case 3, the planned value of this attribute is set to Unknown, leading to "(known after apply)" being printed in the CLI.

In addition, including this attribute's path in the RequiresReplace field of the PlanResourceChange response triggers Core logic which checks whether the old state value is equal to the provider's planned state. If they are not equal, it marks the resource as requiring replacement and prints # forces replacement in the CLI. Since Unknown does not equal "a", the resource is marked as requiring replacement.

This leads to problems such as #187. The resource will be destroyed and recreated even though the value of the Computed attribute would not have changed.

Again, it is unclear what the desired CLI output is here, since we cannot assume that the provider either will or will not change the value of the Computed attribute.

Conclusion

The fundamental problem these cases demonstrate is that it is not possible to know the value of a Computed attribute at plan time: it is therefore not possible to know whether a resource whose attribute is marked RequiresReplace needs to be destroyed and recreated.

@kmoe kmoe added the thinking label Sep 30, 2021
@bflad
Copy link
Contributor

bflad commented Sep 30, 2021

To enumerate additional cases:

Case Computed Optional RequiresReplace Old State Config Actual terraform plan output Desired terraform plan output
7 Yes Yes No null "a" + a + a
8 Yes Yes Yes null "a" ? + a # forces replacement
9 Yes Yes No "a" "a" No diff, no replacement No diff, no replacement
10 Yes Yes Yes "a" "a" No diff, no replacement No diff, no replacement
11 Yes Yes No "a" "b" a -> b a -> b
12 Yes Yes No "a" null a -> (known after apply) No diff, no replacement
13 No Yes No null "a" + a + a
14 No Yes Yes null "a" + a # forces replacement + a # forces replacement
15 No Yes No "a" "a" No diff, no replacement No diff, no replacement
16 No Yes Yes "a" "a" No diff, no replacement No diff, no replacement
17 No Yes No "a" "b" a -> b a -> b
18 No Yes Yes "a" "b" a -> b # forces replacement a -> b # forces replacement
19 No Yes No "a" null - a - a
20 No Yes Yes "a" null - a # forces replacement - a # forces replacement

I don't think any of these highlight any additional problematic cases (except 12 which must be handled by providers if they do not want to show that difference), but in terms of completely representing our expectations, it might be good to note (or test) all of them.

@apparentlymart
Copy link

apparentlymart commented Sep 30, 2021

Thanks for enumerating all of these permutations for easy scanning! Puts the whole thing into perspective.

My sense here though is that the question isn't really answerable, because of the reason you identified: the framework itself can't possibly know what the provider is going to do during the apply phase. "Unknown" is a coarse way for the framework to be honest about that: "the provider hasn't told me what it intends to do, so I don't know" is technically correct (the best kind of correct?! 🙃), but also pretty disruptive in cases where the provider isn't being explicit about what it intends to do.

For me then, this seems to come down to a robustness vs. usability tradeoff:

  • On the one hand, we can decide that we expect the provider to be totally explicit about its intentions, directly writing down in the planning function exactly how all of the computed attributes will change in response to the changes to non-computed arguments.

    This conforms to our usual goal of being conservative and giving feedback about risk earlier (during plan) rather than later (during apply), but requires a lot of additional code in the provider compared to the old SDK and can cause anything from uncertainty to outright blocking the user from moving forward if the provider logic does miss something and thus PlanResourceChange makes an overly-conservative prediction.

  • On the other hand, we could conclude (based on our prior experience with providers written against the old SDK) that the most common situation is that computed attributes typically don't change during an update, and so it's helpful to provider developers to make that the default and then they only have to write out logic for the exceptions.

    This means less monotonous copying inside provider code, but it also means that the consequences for omissions move to the apply phase instead of the plan phase, when Terraform detects that the final result of apply didn't match the plan after all.

It seems pretty hard to develop a sense for how often provider logic was failing to update computed attributes in the plan before, because we retained the legacy SDK opt-out and so those problems appear only as warnings rather than as explicit errors under the old SDK. I was initially tempted to say that obviously the second option above was better, but I reconsidered after realizing that the legacy SDK opt-out may be giving me a misleading read on the situation.

Do you think there's some reasonable way we could survey existing practical providers and see how commonly the existing logic falls foul of the post-apply consistency safety check? That seems like it would help test if "most of the time computed attributes don't change during update" is really true, or if it's just an illusion due to that problem generally turning up only downstream today. If computed attributes often tend to change during an update, then this seems more like necessary complexity to me and thus I'd lean more towards the first option above (which I believe matches the current framework behavior) in the interests of getting quicker feedback when the necessary logic is missing, so we can address it more proactively.

Mostly just thinking aloud here... not proposing anything in particular.

@apparentlymart
Copy link

apparentlymart commented Sep 30, 2021

(I suppose there are some possible compromises between those two, such as defaulting to the "robust" option but having a per-attribute declarative way to opt it in to the "usability" option, which I suppose is approximately what #180 was about.)

@paddycarver
Copy link
Contributor

I think an important wrinkle to understanding this problem and its impacts is to complicate our understanding of what "computed" means here.

It's true that it includes relatively-rarely updated computed-only fields, but it's also true that (I believe) more frequently used patterns are impacted. For example, in any list of nested attributes (in SDKv2, would've been a list with an Elem of schema.Resource) that has any computed attributes, replacing any item in the list triggers this, because the computed attributes change. So, for example, if you have a list of disks and then change one of the disks or the disk gets recreated and gets, e.g., a new ID, you get a plan-didn't-match-apply error unless you explicitly allow that scenario.

My expectation is that, in light of that information, the situation is more widespread than we'd anticipate in SDKv2 and provider developers are more likely to run into it. I think my concern here about the possibility of requiring provider developers to explicitly say it's going to happen is that the learning curve for provider development when working with computed attributes then becomes much steeper: we need to introduce the plan modification concept much earlier than we would otherwise need to.

@paddycarver
Copy link
Contributor

Do you think there's some reasonable way we could survey existing practical providers and see how commonly the existing logic falls foul of the post-apply consistency safety check? That seems like it would help test if "most of the time computed attributes don't change during update" is really true, or if it's just an illusion due to that problem generally turning up only downstream today.

I believe this is something we can directionally approximate, and am poking at some leads there we can talk about privately to see if we can work with a less theoretical understanding here.

@paddycarver
Copy link
Contributor

I believe between #213 and #204 this has been resolved.

@github-actions
Copy link

github-actions bot commented Dec 4, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants