Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Custom Diff Logic in Providers (WIP) #8769

Closed
apparentlymart opened this issue Sep 11, 2016 · 12 comments
Closed

Proposal: Custom Diff Logic in Providers (WIP) #8769

apparentlymart opened this issue Sep 11, 2016 · 12 comments

Comments

@apparentlymart
Copy link
Contributor

apparentlymart commented Sep 11, 2016

Currently the Diff step for all existing resources is implemented inside the helper/schema logic, focusing only on diffing the config against the state. There is currently no possibility for the provider's own code to do any of the following:

  • veto particular changes in the diff (e.g. if it knows a particular change is impossible)
  • customize the diff (e.g. if it knows that one change implies another that Terraform can't infer)
  • replace the diff behavior altogether (e.g. if the upstream service is diff/apply based rather than resource-action-based)

This idea is in the general theme of "Terraform should detect as much as possible during plan". Making Terraform more trustworthy and robust requires us to be able to detect problems and conflicts while producing the plan, rather than bailing out while we're halfway through applying.

Use-cases

Vetoing particular changes in the diff

Some AWS operations (not the RDS ones, notably) support a "dry run" flag which allows a client to verify that the given credentials have suitable access and the arguments are syntactically valid. If we gave resources veto power over their diffs, they could potentially do a dry run of certain operations to catch some problems during the plan phase.

Amazon RDS imposes some restrictions on what changes are possible to the allocated storage on database instances, and these restrictions depend on the storage engine in use.

For example, at the time of writing the size value for all of the storage engines must either remain constant or be at least 10% larger than the present value, except Microsoft SQL server which does not permit changes at all. If the aws_db_instance resource were to have "veto" power on the diff then it could check these constraints during plan.

(This RDS use-case could potentially be thought of as another example for the following section, if we were to decide that asking RDS to make the storage smaller should act as if ForceNew were set.)

Customizing the diff

Sometimes changing one attribute causes others to change. For example, updating an aws_s3_bucket_object when versioning is enabled will cause version_id to get updated.

Neither Terraform core nor helper/schema have enough information to infer this automatically. We have previously discussed a ComputedWhen attribute that can declare simplistic dependencies between attributes, but we can handle this more robustly across more cases by allowing a resource implementation to directly modify a diff produced by helper/schema, where it could add additional attribute diffs to represent changes it knows about due to its specific knowledge of a particular service.

Replacing the diff behavior altogether

Nomad has first-class support for diffing the existing state against a new configuration, and then safely applying that update without race conditions. Terraform does not yet have Nomad support, but once it does it would be awesome if Terraform could interact with Nomad's own plan/run actions to enable stronger guarantees that a plan will succeed. (As strong as Nomad can guarantee, which isn't 100% but is better than what Terraform can offer with its limited context on the Nomad cluster state.)

In this case, helper/schema would skip its own diffing logic altogether (or disregard what it created, at least) and instead call into the provider. The provider would call into the Nomad API's /job/{id}/plan endpoint to obtain the Diff and JobModifyIndex properties, which it can then translate into a Terraform-friendly diff.

When later asked to apply that diff, the provider's Update function can provide the JobModifyIndex argument when calling POST /v1/job/{id} to ensure that it will make exactly the change that was reflected in the plan, and thus it can ignore all of the other diff elements. (They're only included in the diff so they're shown in the plan output for the user's benefit.)

Potential Design

To support the above, helper/schema could support a new callback DiffFunc that takes the "default diff" produced by helper/schema and returns a new diff. The new diff might be either a mutation of the one given or an entirely fresh one, discarding what helper/schema came up with itself. DiffFunc may also return an error, effectively vetoing the diff.

This interface supports all of the above use-cases with a single function.

Rather than exposing the *terraform.Diff directly to this function, it's likely that we'd want some analog to schema.ResourceData that represents diffs in a first-class way, as opposed to burying them as an implementation detail as ResourceData does.

For example:

func resourceS3BucketObjectDiff(d *schema.ResourceDiff, meta interface{}) error {
    s3conn := meta.(*AWSClient).s3conn

    if d.HasChange("etag") {
        d.SetNewComputed("version_id")
    }

    if d.HasChange("bucket") {
        // Make sure the new bucket is valid
        // (this is tricky, since the user might have access to write to it but not to
        // read it... but this is just an example.)
        newBucket := d.Get("bucket").(string)
        _, err = s3conn.HeadBucket(&s3.HeadBucketInput{
            Bucket: aws.String(newBucket),
        })
        if err != nil {
            return fmt.Errorf("unable to verify bucket %q: %s", newBucket, err)
        }
    }

    return nil
}

Not sure yet exactly what schema.ResourceDiff would support, but I'd expect it to have helper functions for the steps needed for the use-cases above, like:

  • SetNewComputed(name), SetForcesNew(name) for simple tweaks
  • Clear() and SetDiff(name, oldValue, newValue) to discard the default diff and build an entirely new one based on an upstream API call
@apparentlymart
Copy link
Contributor Author

apparentlymart commented Sep 25, 2016

Did a little prototyping today. Just wanted to capture some assorted notes for future-me before I quit working on this for now:

  • The resource-specific diff callback should only be able to alter diff values for Computed attributes, and only able to do the following write operations:
    • Create an attribute diff with a synthesized "new value", if the resource can predict what will ultimately get populated here. (This is the Nomad use-case, where the generated diff would be for the "last change index" and for the Terraform-ized description of the changes to task groups and tasks.)
    • Create an attribute diff with the "new value" marked as computed, if the resource knows that a value will change as a side-effect of the diff but isn't able to predict what it will change to. (This is the S3 version_id use-case)
  • One further write operation we can support is to mark a particular attribute diff for a non-computed attribute as "forces new resource", if the implementation is only able to do in-place updates for certain cases. (This accommodates the idea that growing a MySQL RDS instance >10% can be done in-place but any other size change must force a new resource.)
  • Specifically we must not allow the function to create or modify diffs for non-computed attributes aside from the "forces new" flag, since that would allow the diff to produce a new value other than the one given in the config, which would be confusing, almost always non-convergent and could possibly even make Terraform crash.
  • We need to be able to read both the old and new values of each attribute in order to be able to veto certain changes. The HasChange and GetChange methods on ResourceData seem to have a suitable interface for this, which we can copy.
  • Once we allow the "Diff" operation to start making calls to the backing API for pre-flight purposes, it can potentially notice that the world has changed since the last time we ran Read (i.e. the last refresh), but we can't update the state here because that would violate user expectations especially if -refresh=false is in use. Seems like all we can really do here is fail and ask the user to refresh before planning again? This feels kinda weak, but in practice isn't much different than detecting an inconsistency during Apply and failing there, which we already do in several places.
    • For example, for a hypothetical nomad_job resource the Read would presumably refresh the LastChangeIndex value, but the plan endpoint will also return a LastChangeIndex that may have drifted since we last refreshed, if there are other users simultaneously interacting with Nomad. My gut feeling is that it's better to detect and return an error during plan rather than to silently plow ahead with possibly-unpredictable results.
  • Since the diff-checking is focused on special exceptions on specific attributes, it's not valuable to call the function in the case where we're producing a Destroy or DestroyTainted diff... in these cases there are no attribute diffs at all, and so the diff-checking function would end up needing to handle this as a very different case -- at minimum, skipping all of its logic if either destroy flag is set -- and that would add unnecessary complexity to all implementations.
    • If it turns out that I'm wrong about this and we later find use-cases for vetoing destroy diffs, I'd still suggest making that a separate callback since it's very unlikely that there would be any overlapping logic between checking create/update diffs and checking destroy diffs.

@blalor
Copy link
Contributor

blalor commented May 10, 2017

This would be super helpful with Nomad. I am suspect of the current Nomad provider implementation in Terraform: treating the jobspec as a text blob makes for awful diffs. If this idea were implemented in such a way that the Nomad diff response were available, that would be a huge step forward.

@vancluever
Copy link
Contributor

This might be useful in the ACME provider for TOS updates on registrations. Initially, TOS are agreed to automatically, but there are provisions for updating the TOS which may need to be signalled via a diff and an auto_agree flag of some sort to ensure that the user consents to an updated TOS without explicit approval. In this instance, since the agreement attribute is computed on registration, the user can't reasonably know what this needs to be defined as in config, and hence any updates to the TOS would need to be automatically calculated for diff purposes in a different fashion than normal. What I would see happening is that no update to the agreed TOS would happen on refresh, just the current TOS, with the diff callback allowing something like SetNew("agreed_tos", Get("current_tos")) to trigger the diff without the need for the attribute to be in configuration.

@vancluever
Copy link
Contributor

vancluever commented May 21, 2017

Hey @apparentlymart have you started on this yet? I was thinking of taking a crack at it :) I did check and couldn't see anything yet in PR form or branches that mentioned this kind of work.

From my own analysis, I was thinking the following:

  • Using the in-flight diff that gets calculated in schema.Resource.Diff, along with a schema.ResourceData based off the information that gets used in said diff, as the basis for diff customization. This would be rolled into a back-ended (not embedded) schema.ResourceData that read operations during the customization function would be handed off to (functions such as Get, GetOk, HasChange and GetChange). We can also skip the customization on Destroy or DestroyTainted InstanceDiffs as defined above.
  • As to not mess with state during the diff customization, only allow the setting of new values and flags as per notes above:
  • SetNew(key string, value interface{}) error would get the current old value (maybe handing off to GetChange), and then by using the standard MapFieldWriter would validate/normalize the field supplied by value. The resulting diff would be filtered through finalizeDiff to ensure that the new diff is added correctly and as generally expected to the existing InstanceDiff.
  • SetNewComputed(key string) would function similarly to SetNew, but just simply with a zero value for the new value and flagging the diff as NewComputed.
  • SetDiff(key string, old, new interface{}) error to add or replace completely keys in the diff with the ability to customize the old value. Both values are filtered thru different MapFieldWirters to ensure that values are correctly normalized and validated. To set a computed value, new would just be blank here and SetNewComputed would be called after, or we can add a computed bool to the parameter list.
  • Clear(key string) error to clear a specific key from the diff, returning an error if that key didn't exist in the diff yet at all.
  • ClearAll() to just wipe the diff (not too sure if this is 100% necessary).
  • ForceNew(key string) error would only flag RequiresNew in the referenced key. For this to succeed, the diff would need to already exist, to prevent abuse (the expected behaviour would be that the provider would first check to see if a diff existed by using either HasChange or GetChange).
  • SetNew, SetNewComputed, SetDiff, Clear, and ClearAll would have guards that keep them from being used on fields that are not marked as Computed in the schema.

If anything, this was a nice way to teach myself how diffs really work :)

@apparentlymart
Copy link
Contributor Author

Hi @vancluever! Thanks for the deep thought here.

It's been a while since I looked at this, so I'm a bit rusty.

Back in Sep 2016 when I posted my last comment I'd written some hacky code to explore the problem, but it was never in any shape to be submitted so I ended up discarding it.

I think I remember convincing myself that we'd need a new thing that is like ResourceData but with a different API, since the set of operations for modifying a diff directly are different than what you do when modifying state, which is the primary goal of ResourceData.

With regard to your SetDiff method, I think we need to be careful here since I'd learned in my prototyping that Terraform behaves weirdly if the "Old" side of the diff doesn't exactly match what's in the state... things like d.HasChange can misbehave, etc. It might be okay if we also update the state at the same time, and pack that updated state into the plan, but that would be the first time the plan walk itself would change state. (Currently we do it only via the implied "refresh" walk.)

My recommendation would be to start by trying to reproduce my findings before via your own prototype. That'll help you see how all the machinery fits together here, and what the constraints are. Perhaps also starting with just one use-case would be good... I would suggest maybe starting with: the function can return an error to veto the diff (because that's easy), and it can also provide a placeholder value for a Computed attribute. I think the rest of the stuff here builds on that, so we can iterate once the basic idea is in place.

Let me know if you want to bounce any more ideas! 😀

@vancluever
Copy link
Contributor

vancluever commented May 23, 2017

Hey @apparentlymart!

Check out https://github.com/vancluever/terraform/commit/34c7c54eaaf4beb924515591e93b1e2d634ea450. I think it's a good start!

I'm not 100% comfortable with the amount of unexported data I needed to access, but we are in the same package after all. The benefit is a very minimal implementation that leans heavily on existing functionality, hopefully meaning that things will behave as expected when I add tests 😉

I missed the bits in the last paragraph where you mentioned vetoing the diff - but I think that the implementation of CustomizeDiffFunc satisfies that pretty easy - once I hook this into the diff logic (thinking of doing it in schema.go right after the main diff here), it should be pretty easy to kick back the error to cause TF to fail and block a plan from happening (assuming that is what you meant by a veto).

Let me know what you think!

@vancluever
Copy link
Contributor

vancluever commented May 23, 2017

Some more notes - I think now that I understand how field readers and writers work a lot more, I think what I might do is get rid of the "duplicated" ResourceData completely, and build my own reader/writers that will sit on a new diff level, more than likely.

This will allow for a number of things:

  • A safer interface. catalog won't be necessary any more so there will be no risk or panics there, and all read functions like Get, GetOk, HasChange, and GetChange will be guaranteed to return values consistent with what has currently been set.
  • Much less dependence on other struct's unexported data, which should be a lot safer in general. In addition to the mentioned stuff above, as a part of this too, the private schemaMap.diff that happens after every SetNew, SetNewComputed, SetDiff, and ForceNew will be removed and the full diff will just be properly re-built in schemaMap.Diff after the CustomizeDiffFunc has returned.

The only challenge in this that I just thought of now is how to signal to schemaMap.Diff that a diff has been wiped, now that the diff won't be re-processed in ResourceDiff. I think it will still be safe to have ClearAll completely reset the diff to an empty InstanceDiff, which would be naturally rebuilt with new data on the next diff run. This new design also I think makes Clear a lot easier to realize as well as there are less moving parts to take into account, or at the least everything is in a place that really makes sense.

@vancluever
Copy link
Contributor

I've moved on to tests now for ResourceDiff! Will push soon.

Things are going well, but one thing I'm seeing here that I'm wondering if it will be an issue for the implementation as conceptualized:

Removed diffs for computed values do not register. I'm guessing this is maybe because computed values were not thought to be diffable things? The main place this is coming up now is in a diff of a complex set. Ultimately what is happening is the diff is showing no old removed values at all, just a change of the count (in the event that these have changed) along with the new values. Not too sure of the implications that would have with HasChange or GetChange in ResourceData - maybe none, as state should be the source of truth with old values.

@vancluever
Copy link
Contributor

@apparentlymart - ResoruceDiff is pretty much done and all that's really left is to hook it into schemaMap's own diff behaviour. Check it out when you have a chance and I'll be adding that final piece in soon!

@vancluever
Copy link
Contributor

Got some basic tests for the wired in customization behaviour now, will be committing/pushing soon

Just thought I'd paste this little bit here :)

	testing.go:280: Step 0 error: Error planning: 1 error(s) occurred:
		
		* test_resource_with_custom_diff.foo: 1 error(s) occurred:
		
		* test_resource_with_custom_diff.foo: veto is true, diff vetoed

😁

@apparentlymart
Copy link
Contributor Author

apparentlymart commented Dec 4, 2017

A more evolved version of this was merged in #14887.

It will take some time to update existing resources to make use of this mechanism, but the helper/schema feature is now available to do it.

@ghost
Copy link

ghost commented Apr 5, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants