Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't crash when populating NewExtra. #686

Merged
merged 3 commits into from
Jan 27, 2021
Merged

Conversation

paddycarver
Copy link
Contributor

When populating NewExtra, we iterate over attributes and inspect their
NewExtra property. In the case of null attributes, this results in a
crash, as we're dereferencing a nil pointer.

I'm still not clear on why this is happening, where it's coming from, or
what it means. The current understanding is it has something to do with
usage of CustomizeDiff. I'm not sure why yet. But this commit at least
surfaces which attribute was null, by logging it, and prevents the
crash, even if it doesn't resolve the problem.

See #548, hashicorp/terraform-provider-google#7934

When populating NewExtra, we iterate over attributes and inspect their
NewExtra property. In the case of null attributes, this results in a
crash, as we're dereferencing a nil pointer.

I'm still not clear on why this is happening, where it's coming from, or
what it means. The current understanding is it has something to do with
usage of CustomizeDiff. I'm not sure why yet. But this commit at least
surfaces which attribute was null, by logging it, and prevents the
crash, even if it doesn't resolve the problem.
Copy link
Member

@kmoe kmoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wise - my only concern is that by preventing the crash without understanding why it occurred, we might be allowing incorrect behaviour. Would it be better to log and still crash?

Approving anyway because based on hashicorp/terraform-provider-google#7934, the risk of incorrect behaviour seems small when compared to the effect of the crash on users right now.

Add logging that will catch the bug red-handed, hopefully, or at least
give us more details if our understanding of the bug is off. Also,
restore the crash behavior so we can see where it fits in with the logs.
Previously, we'd assign the result of finalizeDiff to the resource diff
without checking its return. This caused problems because a "finalized"
diff for any given attribute could, in fact, be no diff at all. Which we
represent as `nil`. But some consumers of the resource diff expect every
attribute in the map to be non-`nil`, and so crash on these attributes
that have diff entries but no diffs. See for example
hashicorp/terraform-provider-google#7934, which would crash when a
config had an explicit empty string as the value for a field that a
CustomizeDiff function set to ForceNew. Technically, there was a diff,
but finalizeDiff decided it wasn't a "real" diff, because the SDK still
interprets empty strings as "unset" for computed fields to align with
legacy behavior. But that meant a nil in the resource's map of attribute
diffs, which then was dereferenced when populating the response to
PlanResourceChange. This caused a crash.

This commit fixes that issue by updating all our usages of finalizeDiff
to check for a nil diff _before_ writing it to the resource's map of
attribute diffs. This was easier than tracking down all the usages of a
ResourceAttributeDiff and trying to ensure they were ignoring nil
values.
@paddycarver paddycarver requested a review from kmoe January 27, 2021 11:21
@paddycarver
Copy link
Contributor Author

@kmoe I successfully got to the bottom of the problem, and a customer that could reproduce it got me logs (see 99e4a70) that caught the bug red-handed and confirmed that helper/schema.Schema.finalizeDiff was returning a nil *terraform.ResourceAttrDiff. That was being put directly into the helper/schema.ResourceDiff, which our PlanResourceChange implementation didn't expect. The properties of the *terraform.ResourceAttrDiff were being accessed without checking if the *terraform.ResourceAttrDiff was nil, and that caused the crash.

I could've updated our usage of helper/schema.ResourceDiff to check for a nil *terraform.ResourceAttrDiff before using it, but given that helper/schema.ResourceDiff is exported, I felt it may be used in an unknown number of places, and so we would continue to play whack-a-mole in trying to squash every instance of this bug.

Instead, I opted to (in 570e128) update every usage of helper/schema.Schema.finalizeDiff, which can only be used in the helper/schema package, to check the return value before using it. If helper/schema.Schema.finalizeDiff returns nil, it just won't be included in the helper/schema.ResourceDiff, which fixes the bug for every usage of helper/schema.ResourceDiff. Or, at least, it should, unless something else is putting a nil value in there, in which case, I still think we may want to fix that at the source.

This took so long to come up in such an old and well-used resource because it only triggers when there's technically a diff and helper/schema.Schema.finalizeDiff decides that the diff is not a "real" diff and should be ignored. To make this true, the following situation needed to happen:

  1. The provider needed to use CustomizeDiff.
  2. The CustomizeDiff needed to update a field that was computed.
  3. The field needed to be a string
  4. The field needed to not be the child of another field that CustomizeDiff updated
  5. The field needed a value set in state.
  6. The field needed "" to be set explicitly in the config.

When those conditions are met, helper/schema.Schema.finalizeDiff interprets the "" as "unset" and cancels out the diff, returning a nil *terraform.ResourceAttrDiff. According to a code comment, this is to preserve the legacy behavior of interpreting "" as "unset", which for computed fields means "use what I have in state", meaning no diff.

I added a log line to explicitly call out when these conditions are met and we're squashing a diff for legacy reasons.

Mind taking another look?

Copy link
Member

@kmoe kmoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending test

@paddycarver paddycarver merged commit 1607a0a into master Jan 27, 2021
@paddycarver paddycarver deleted the paddy_newextra_crash branch January 27, 2021 18:07
@ghost
Copy link

ghost commented Feb 27, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants