-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
helper/schema: Schema.DiffSuppressOnRefresh #882
Conversation
This lint rule about not naming a variable |
I have actually experienced this issue (#801) myself, with the same kind of resource: AWS policies. So, a way for the provider to deal with this would be lovely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from me asking questions that I then self-answered (and please, let me know if I'm missing something or some nuance), I think this looks good (thank you for the long description - that context helped me a lot).
I guess the only thing to deal with are the linter issues, but they should be very easy to address.
continue // no schema? weird, but not our responsibility to handle | ||
} | ||
schema := schemaList[len(schemaList)-1] | ||
if !schema.DiffSuppressOnRefresh || schema.DiffSuppressFunc == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: why isn't the branching if DiffSuppressOnRefresh
handled here instead of, say, at the beginning of this method OR before this method is even called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I think I can answer my own question: this is to ensure this logic is applied selectively to sub-attributes of a schema. Even if most of it doesn't use it, if there is even just 1 attribute that does, we want to check it for each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, unfortunately I think we do need to visit everything here to make sure none of them are in need of this work.
An additional resource-level opt-in could potentially mitigate that but it seems like an annoying and unprecedented extra step, and in practice most (though certainly not all) resource types have only tens of attributes and so I don't expect this to be prohibitively expensive in the common case.
b01d95c
to
98e4260
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for the delayed review on my part since my attention has been elsewhere the last few days.
In general, I think this is likely the best approach to take here given the current compatibility promises we need to make, while offering the most flexibility. Creating a separate ResourceData
"Set with DiffSuppressFunc" method or "Set with Options" method and having this as an available option is likely a non-starter given the level of effort and potential provider developer confusion.
Most of my comments here are more around making the functionality more debuggable and understandable for provider developers, who may not be familiar with certain terminology that the SDK has hidden away.
We should likely setup a separate terraform-provider-corner case that verifies this new functionality in the unlikely event there are missing nuances in the SDK logic that would prevent it from working with real world providers. Worst case, it could be used just to ensure we maintain backwards compatibility should we need to change this area of functionality in the future. That can be done outside this pull request though, since it involves a separate repository and a dance of merges to properly become a usable integration test.
helper/schema/resource.go
Outdated
@@ -638,6 +638,7 @@ func (r *Resource) RefreshWithoutUpgrade( | |||
state = nil | |||
} | |||
|
|||
schemaMap(r.Schema).handleDiffSuppressOnRefresh(s, state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: It may be beneficial to offer this as a ResourceData
method on top of the schema field implementation, so provider developers can opt into setting this once per resource (or potentially across the provider, if we did some other implementation work). Not saying anything different needs to be done here now, unless we think that is something we want to consider upfront in terms of implementation and doing it outside of schemaMap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did originally consider making this a whole-resource-type-level settable, but decided to go more surgical so that it would be less risky to enable for a long-standing existing resource type implementation.
I'm not sure if I follow exactly what you mean by a ResourceData
method here, but perhaps my comment on the other question above addresses why it's considerably harder to get a good result for this at the ResourceData
/FieldWriter
layer and why I ultimately elected to make it a postprocessing step. If you're thinking of something different than what I described there then I'm happy to consider another approach!
98e4260
to
b6d348f
Compare
(I'm hoisting this design note out of one of the resolved review conversation threads from above, so it's easier to see if we look back on this later to understand why it's implemented this way.) I originally intended to make There were two big challenges with doing this on
While doing it after the fact against the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for working on this, @apparentlymart, looks good to me from a code perspective (just one small note about the terraform-plugin-log usage). Super excited that we can get a opt-in solution in place for this framework. 🚀 If you could also add a changelog entry, that would be great.
e.g. Adding a .changelog/882.txt
file with something like:
```release-note:note
helper/schema: The `Schema` type `DiffSuppressOnRefresh` field can be enabled to check `DiffSuppressFunc` for normalization changes during refresh, similar to planning. This can prevent unintended "Values changed outside of Terraform" messages. For many attributes, this is likely desirable and expected behavior when implementing `DiffSuppressFunc`, however it is opt-in for backwards compatibility reasons.
```
```release-note:enhancement
helper/schema: Added the `DiffSuppressOnRefresh` field to the `Schema` type
```
@@ -985,6 +1024,7 @@ func (m schemaMap) diff( | |||
continue | |||
} | |||
|
|||
log.Printf("[DEBUG] ignoring change of %q due to DiffSuppressFunc", attrK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for adding this.
helper/schema/schema.go
Outdated
} | ||
|
||
if schema.DiffSuppressFunc(k, oldV, newV, d) { | ||
tfsdklog.Debug(ctx, "ignoring change of %q due to DiffSuppressFunc", k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quirk of the current terraform-plugin-log implementation and its logging functions such as tfsdklog.Debug()
is that the final variadic argument is expected as key-value pairs rather than operating like the f
suffix of other string format functions. There is nothing to catch an errant unpaired key implementation at compile-time. 😖 We'll either need to fmt.Sprintf()
the value into the message parameter or switch it to a structured key-value pair:
tfsdklog.Debug(ctx, fmt.Sprintf("ignoring change of %q due to DiffSuppressFunc", k))
// or
tfsdklog.Debug(ctx, "ignoring change due to DiffSuppressFunc", "tf_attribute", k)
I'll take this as real world feedback to create an issue in terraform-plugin-log to enforce this at compile time (e.g. convert those key-value expectations into a ...map[string]interface{}
function parameter instead of the variadic ...interface{}
) or otherwise create static analysis tooling to warn of this type of issue. I think we'd shy away from a Debugf()
variant as it would discourage the structured key-value pairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, okay! Sorry... should've looked more closely for other examples before jumping in here.
I think I'm going to follow your first example here just because it'll then match with the other log
-based one I added in the Diff
function, and because it doesn't seem super valuable to be able to filter by this key and thus structured logging is not super important here.
Unfortunately in the original design of DiffSuppressFunc we made it work only during planning, and so the normalization vs. drift classification it does isn't honored in any other situation. Unilaterally enabling it for both refresh and apply would be too risky at this late stage where so many existing provider behaviors rely on these subtle details, but here we introduce a more surgical _opt-in_ mechanism whereby specific attributes can be set to also check the Refresh result against DiffSuppressFunc, and so avoid reporting any normalization done by the remote system as if it were drift. This behavior will be primarily useful for strings containing complex data serialized into some particular microsyntax which allows expressing the same information multiple ways. In particular, existing situations where we classify whitespace changes and other such immaterial changes in JSON values as normalization vs. drift would be a good candidate to enable this flag, so that we can get the same results during refreshing and thus avoid churning any downstream resources that refer to the unexpectedly-updated result.
b6d348f
to
84f804f
Compare
Oh whoops 🤦♂️ I just noticed that I didn't actually push up my updated commit after I addressed the previous round of feedback. Sorry about that! I think I've now belatedly addressed everything we discussed so far. 🤞 |
For what it's worth, I have made a small attempt to add a What I've attempted is to add some normalization to the domain part of the email address in the fake "user" object, to observe that normalization being misclassified as drift and then cascading downstream during refresh. I can see it fail before I add the I've run out of time to spend on that for now, so I'm afraid I won't be able to contribute a corner-provider test for this, for now at least. 😖 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good to me and going to pull this in. Thanks so much, @apparentlymart 🚀
Aside: A design goal of our next generation of provider-level acceptance testing would be to offer multiple layers of abstraction during testing. Theoretically, this would include low level functionality for choosing individual Terraform CLI commands to execute and inspecting any available Terraform CLI data, if necessary. Once that's available, we can convert terraform-provider-corner over to it and hopefully be able to test many more of these types of scenarios that the current acceptance testing framework cannot handle due to its solely high level abstractions.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
Unfortunately in the original design of
DiffSuppressFunc
we made it work only during planning, and so the normalization vs. drift classification it does isn't honored in any other situation.Unilaterally enabling it for both refresh and apply would be too risky at this late stage where so many existing provider behaviors rely on these subtle details, but here we introduce a more surgical opt-in mechanism whereby specific attributes can be set to also check the Refresh result against
DiffSuppressFunc
, and so avoid reporting any normalization done by the remote system as if it were drift.This behavior will be primarily useful for strings containing complex data serialized into some particular microsyntax which allows expressing the same information multiple ways. In particular, existing situations where we classify whitespace changes and other such immaterial changes in JSON values as normalization vs. drift would be a good candidate to enable this flag, so that we can get the same results during refreshing and thus avoid churning any downstream resources that refer to the
unexpectedly-updated result.
This deals with only the most visible problem previously discussed over in hashicorp/terraform-plugin-framework#70 -- creating value churn during the refresh phase -- because the risk of a more systematic change to the legacy SDK at this point is too high. Hopefully eventually the new framework will address this more completely (also handling it during the apply step, for example), but I intend this PR as a pragmatic way to help provider developers more easily resolve bug reports related to incorrect classification of normalization by relying on the
DiffSuppressFunc
implementations already written against this old SDK, so they can be addressed with only minimal schema changes and in particular without switching to the new framework.My primary goal here is that this should cause absolutely no observable difference in result for any existing schema that doesn't set this new flag, and so the presence of this new code would have no effect on any existing provider unless they also opt in a particular attribute to it. I tried by best to code defensively to avoid e.g. introducing new panics in weird edge cases, but this legacy SDK does have a fair number of weird edge cases and so if you can spot one I didn't think of please let me know!
Although not exactly what was requested there, I think this closes #801 by providing a way to explicitly opt in to the requested behavior, even though it can't be the default behavior.