Clarify state requirements when ApplyResourceChange returns errors. #27751

paddycarver · 2021-02-11T21:08:55Z

According to the Resource Instance Lifecycle Change document:

The new state object returned from the provider must meet the following
constraints:

Any attribute that had a known value in the planned new state must have an
identical value in the new state.

Any attribute that had an unknown value in the planned new state must take
on a known value of the expected type in the new state. No unknown values
are allowed in the new state.

This description doesn't really mention anything about what's required in the error case, however. In #20295, Terraform was updated to continue processing the returned state as though there were no error, with the explanation that this resolves "plan doesn't match apply" errors. It does, however, leave the state in an incorrect configuration. See hashicorp/terraform-plugin-sdk#476.

This is mostly mitigated by refresh, which restores the state to the proper values, but not every field in state can be updated in refresh, leaving some fields with a confused view of the world.

My question here is: does/should the protocol allow values from either prior or planned states (or any value of the expected type if unknown in the planned state) if an error diagnostic is returned as part of ApplyResourceChange? This would allow the provider to set the values that have been updated, but leave the values unchanged that haven't. This technically violates the plan, but the plan wasn't properly executed, anyways.

I'm unclear if this is the intended behavior or not, given the context of #20295 and the specter of "plan doesn't match apply" errors.

apparentlymart · 2021-02-11T21:52:55Z

I believe that our intent here was that if the provider returns any new state object in response to ApplyResourceChange then it would be validated and saved in exactly the same way regardless of whether there are errors, and the presence of errors only influences whether Terraform aborts at that point or whether it continues to walk the graph.

The aim here is to walk the fine line between saving the most accurate representation of the current state we have, which is the result the provider returned, but also halting before we try to take any other actions downstream because the assumptions we made during planning no longer hold and thus we need to create a fresh plan based on what actually happened.

(and, as an extension of that, if the validation of the returned object itself causes errors — like if the final data doesn't agree with what was planned — those should hopefully have the same effect as errors the provider itself reported: the object still gets saved, but the walk still aborts, for the same reasons.)

Whether that's actually what it's doing in practice I can't say with 100% certainty, because as you've pointed out we ended up fiddling with the behavior a few times in the meantime to deal with a few different edge-cases that cropped up with the old SDK behaviors that are themselves not entirely self-consistent. In particular, I remember we had some specific ambiguity about whether the SDK itself would return a valid new value for all of the various different error cases it might encounter; I'm fuzzy on the details but I recall us finding that for some errors the SDK would return a new object but for others it would not.

The LegacyTypeSystem field in responses also creates an extra dimension of possible behavior here that the Resource Instance Lifecycle Change document doesn't attempt to describe; those behaviors are entirely defined by implementation rather than specification, and so can't say with any certainty at all how error handling might behave in different scenarios in that special mode.

jbardin · 2021-02-12T18:52:52Z

The assertion that the returned value matches the plan is only made if there are no errors. If there are errors, the returned state value is still saved, but execution is halted preventing the interpolation of the returned state into other resources. It is up to the provider (the SDK in the most common case) to determine what the current "correct" value is and return that to terraform, which the provider has more knowledge about than terraform itself. The Partial feature was how this concept was implemented in the legacy SDK, which is mapped best it can be to the new API.

Since this seems to match what you've described, would the following be adequate as an additional bullet under ApplyResourceChange?

* In the case of an error, the provider is expected to return the state it
  determines most accurately reflects the current resource state, which may
  contain any combination of values from the prior or planned states.

apparentlymart · 2021-02-12T19:39:30Z

Since we're talking through all the different edge cases here, another one that came to my mind that seems worth noting (although hopefully doesn't arise with a correctly-implemented SDK): if the "new state" object from the provider doesn't match the schema of the resource type then that's treated in a sense like a syntax error rather than a semantic error: Terraform can't physically save it in that case, because the state file encoding/decoding relies on the schema. I believe in that case we'll retain the old value that was in the state, and discard the syntactically-invalid one returned by the provider.

I don't think this intersects explicitly with the original line of questioning here, but while we're loading the context back into our brains I thought it was good to tease out any other specifics that the current documentation might not be capturing.

jbardin · 2022-06-22T18:19:54Z

The resource lifecycle document has been re-written, along with a new planning behaviors doc.

github-actions · 2022-07-23T02:35:29Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

paddycarver added documentation new new issue not yet triaged labels Feb 11, 2021

jbardin added waiting-response An issue/pull request is waiting for a response from the community and removed new new issue not yet triaged labels Mar 10, 2021

bflad added the providers/protocol Potentially affecting the Providers Protocol and SDKs label Dec 1, 2021

jbardin closed this as completed Jun 22, 2022

github-actions bot locked as resolved and limited conversation to collaborators Jul 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify state requirements when ApplyResourceChange returns errors. #27751

Clarify state requirements when ApplyResourceChange returns errors. #27751

paddycarver commented Feb 11, 2021

apparentlymart commented Feb 11, 2021 •

edited

Loading

jbardin commented Feb 12, 2021

apparentlymart commented Feb 12, 2021

jbardin commented Jun 22, 2022

github-actions bot commented Jul 23, 2022

Clarify state requirements when ApplyResourceChange returns errors. #27751

Clarify state requirements when ApplyResourceChange returns errors. #27751

Comments

paddycarver commented Feb 11, 2021

apparentlymart commented Feb 11, 2021 • edited Loading

jbardin commented Feb 12, 2021

apparentlymart commented Feb 12, 2021

jbardin commented Jun 22, 2022

github-actions bot commented Jul 23, 2022

apparentlymart commented Feb 11, 2021 •

edited

Loading