
Subsequent applications with no changes cause state's serial to increment, causing stale plans #27827

Closed
andreykaipov opened this issue Feb 19, 2021 · 6 comments
Labels
bug, explained (a Terraform Core team member has described the root cause of this issue in code)

Comments

@andreykaipov

Terraform Version

Terraform v0.14.7

I'm running into some strange behaviour, and I'm not sure how I haven't run into it before. It's best demonstrated through an example. Thankfully the reproduction is quick and can be done locally:

Steps to Reproduce

  1. In a new directory, create main.tf with the following contents and run terraform init:

    data "null_data_source" "test" {}
  2. Then run through the following:

    terraform plan -out tf.plan
    No changes. Infrastructure is up-to-date.

    terraform apply tf.plan
    Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

    terraform apply tf.plan
    Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

    So far so good, but... do it again:

    terraform plan -out tf.plan
    No changes. Infrastructure is up-to-date.

    terraform apply tf.plan
    Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

    terraform apply tf.plan
    Error: Saved plan is stale
    
    The given plan file can no longer be applied because the state was changed by
    another operation after the plan was created.

What's going on? Well, if we take a look at our state before and after our second plan, we'll find the following diff. Thankfully we're working with just local state, so it's easy to see:

❯ diff terraform.tfstate*
4c4
<   "serial": 2,
---
>   "serial": 1,
21c21
<             "random": "7251718409320187719"
---
>             "random": "214740342923409923"

What's happening is that the null data source generates a random number on every apply and stores it in the state. Since it's a data source, there are no changes to report, but the state itself has changed, so the state's serial is rightly incremented. However, subsequent applies of a plan that looked fine are now stale.
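To make that concrete, here's a rough sketch in Python of how I understand the serial decision. This is a simplification, not Terraform's actual code, and the state snapshots below are trimmed-down, illustrative versions of the diff above:

```python
import json

def needs_serial_bump(old_state: dict, new_state: dict) -> bool:
    """Sketch: the serial is bumped whenever the marshalled state differs in
    any way -- including data source attributes -- not only when managed
    resources change."""
    def marshal(state):
        s = dict(state)
        s.pop("serial", None)  # the serial itself isn't part of the comparison
        return json.dumps(s, sort_keys=True)
    return marshal(old_state) != marshal(new_state)

# Trimmed-down snapshots mirroring the diff above:
old = {"serial": 1, "resources": [
    {"mode": "data", "attributes": {"random": "214740342923409923"}}]}
new = {"serial": 2, "resources": [
    {"mode": "data", "attributes": {"random": "7251718409320187719"}}]}

print(needs_serial_bump(old, new))  # → True: only a data source value moved
```

So even though no managed resource changed, the serialized state differs and the serial has to move.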

Expected behaviour

I expected the plan not to be stale if there were no changes to report. While technically there are state changes, are they actionable changes? Perhaps the serial should only be incremented on changes to actual resources and not data sources?

Context

Originally I ran into this issue with a data source from the GCP provider that stores an access token changing on every apply, causing the same behaviour seen above. I tested another module of ours and hit the same issue with MongoDB's Atlas provider. So it doesn't seem uncommon for data source attributes to change?

@andreykaipov andreykaipov added the bug and new (new issue not yet triaged) labels Feb 19, 2021
@andreykaipov

The latest version where the above produces the expected behaviour and does not error because of a stale plan is v0.11.14, which I suppose isn't too surprising, as Terraform 0.12 was a big change! It errors as early as v0.12.0-alpha1.

Quickly looking through the code, the condition for when the serial is incremented doesn't seem any different -- it's still just a direct comparison of the marshalled state -- but it's definitely ignoring the data sources somehow.

@jbardin

jbardin commented Feb 19, 2021

Hi @andreykaipov,

Thanks for filing the issue. The expected behavior here is that a plan file should not be able to be applied multiple times if it changes the state in any way. If the state serial is incremented, it means there was something written to the state that cannot be re-written. The case of a plan being applied to an empty state is covered in issue #24078.

I confirmed with the development version of Terraform that applying a truly empty plan can be done multiple times:

% tf apply plan
null_resource.a: Creating...
null_resource.a: Creation complete after 0s [id=947496681476652029]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

% tf plan -out plan
null_resource.a: Refreshing state... [id=947496681476652029]

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your configuration and the remote system(s). As a result, there are no actions to take.

% tf apply plan

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

% tf apply plan

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Your example, however, does change the state in the plan, because the data source is updated each time a plan is created. This isn't necessarily intrinsic to all data sources, but it is not uncommon either, and one may want to apply the updated value of the data source, as doing so should be equivalent to using the refresh command when there are no planned resource or output changes.

Since this is working as designed, I'm not sure what action we need to take here, can you explain more what workflow you have that was impacted by this?

@jbardin jbardin added the explained (a Terraform Core team member has described the root cause of this issue in code) label and removed the new (new issue not yet triaged) label Feb 19, 2021
@andreykaipov

andreykaipov commented Feb 20, 2021

Hi James - thanks for the quick response!

Since this is working as designed, I'm not sure what action we need to take here, can you explain more what workflow you have that was impacted by this?

I suppose when I first ran across this behaviour yesterday, it was just a bit strange to see a stale plan when the previous apply said nothing changed. Phrased differently, a truly idempotent plan and a plan whose data source silently changes both produce the following output:

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

I think that's a bit misleading if we're considering data sources too though. I guess it's more of a UX issue. Yeah the state changed, but when I see zero changes from the apply, I'm left super confused as a user why my plan is now stale all of a sudden. (I know now, but speaking for my past self from yesterday! 😄)


I think there are several ways to improve this:

  • Have Terraform print out the data sources in its final message:

    Apply complete!
    Resources: 0 added, 0 changed, 0 destroyed.
    Data sources: 0 added, 1 changed, 0 destroyed.
    

Maybe it's also worth asking why Terraform only shows resources as part of the plan diff? If Terraform generates data source attributes at plan time and keeps them within the plan's tfstate, why shouldn't it also show the diff of those data sources?

Or, at a minimum:

  • Terraform could show a diff of the stale plan against the remote state to make it easier for the user to debug. Maybe an entire diff isn't reasonable, as that'd potentially leak sensitive info, but the resource addresses would be a good start. I found this question on the forum, but couldn't find a similar GH issue.
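For context, my mental model of the staleness check itself is something like the following sketch. It is simplified and uses hypothetical field names; the real check lives in Terraform's own code:

```python
def plan_is_stale(plan_prior_state: dict, backend_state: dict) -> bool:
    """Sketch: a saved plan records the lineage and serial of the state it
    was created against. If the backend's state has a different lineage or
    a newer serial by apply time, the plan can no longer be applied."""
    if plan_prior_state["lineage"] != backend_state["lineage"]:
        return True
    return backend_state["serial"] > plan_prior_state["serial"]

# The repro above: tf.plan was created against serial 1, the first apply
# wrote serial 2, so the second apply of the same file is rejected.
print(plan_is_stale({"lineage": "abc", "serial": 1},
                    {"lineage": "abc", "serial": 2}))  # → True
```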

In any case, this really clarified a lot for me. Thanks again!

@apparentlymart

Hi @andreykaipov! Thanks for sharing that extra information.

The Resources: output here is something that's been with us for a long time (long before I started working on Terraform!), so it's easy for those of us who work on Terraform all the time to take it for granted. I appreciate you taking the time to point out that it's confusing when seen without that context.

Another situation I could see being similarly confusing to this is if you apply a plan that only includes a change to root module output values. Again in that case the "Resources:" summary would read all zeroes because indeed no managed resources changed as part of the apply, but the state itself would still have changed.

Over the last few major releases we've been gradually making progress towards something like I proposed earlier in #15419, where Terraform would both allow applying changes that affect only the state and not resources (which you saw here) and also be more explicit during planning that it's going to do that. However, in the design work that led to the UI mock I posted over there I must admit I was focused only on how things appear during the plan phase, and didn't propose any additions to the apply progress output or post-apply summary.

Applying the same set of tradeoffs to the apply output as I did to the plan output for #15419, I'd consider having Terraform go even further than what you proposed and to list in the log output the individual objects that changed, perhaps like this:

Applying changes:

- aws_instance.server: Destroying (id = i-abc123)
- aws_instance.server: Successfully destroyed after 20s
- aws_instance.server: Creating...
    ami = "ami-abc123"
    (and other stuff omitted from this mock for brevity)
- data.null_data_source.test: Saved new result to the Terraform state.
- output.example: Saved new value to the Terraform state.

That way, everything previously mentioned in the plan is accounted for in the apply log.

(Incidentally: you can see in the mock for #15419 what storing the data resource results in the state is hopefully leading towards: I'd like to show a diff for data resources in the plan output too, because it can give some additional context for why a particular change is being proposed, for which you'd otherwise need to go and look at an external system to see. I can understand that it feels weird to store them today, because indeed Terraform doesn't actually make use of that data in any user-visible way.)

@andreykaipov

The Resources: output here is something that's been with us for a long time

You're right! That output line feels like such a staple of Terraform that I admittedly felt a bit weird suggesting adding a new line for data sources!

The UI mock you linked in #15419 is really cool. Being able to see a summary of every change within the state would really improve the experience and help avoid surprises. Something like #24022 might also help as I imagine removals and changes of data sources would be handled similarly.

In any case, after rereading through the mentioned issues several times, I think I'm okay closing this one and subscribing to the others instead. I'll likely try mucking around with the state with jq and diff to see if I can find an easy-enough workaround to see a neat summary of my changes. Once again thank you for the quick replies @apparentlymart and @jbardin! I appreciate the team's insight! 😄
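For anyone else landing here, the kind of jq-and-diff workaround I had in mind could be sketched in Python like this. The helper name is mine, and it assumes the version-4 local state layout where each resource carries mode, type, name, and instances:

```python
import json, os, tempfile

def state_change_summary(old_path: str, new_path: str) -> list:
    """List the addresses of resources and data sources whose instances
    differ between two local state files. A workaround sketch, not a
    Terraform feature."""
    def index(path):
        with open(path) as f:
            state = json.load(f)
        out = {}
        for res in state.get("resources", []):
            prefix = "data." if res.get("mode") == "data" else ""
            out[f"{prefix}{res['type']}.{res['name']}"] = json.dumps(
                res.get("instances", []), sort_keys=True)
        return out
    old, new = index(old_path), index(new_path)
    return sorted(a for a in old.keys() | new.keys() if old.get(a) != new.get(a))

# Demo with two minimal version-4-style snapshots (values illustrative):
old_state = {"serial": 1, "resources": [
    {"mode": "data", "type": "null_data_source", "name": "test",
     "instances": [{"attributes": {"random": "214740342923409923"}}]}]}
new_state = {"serial": 2, "resources": [
    {"mode": "data", "type": "null_data_source", "name": "test",
     "instances": [{"attributes": {"random": "7251718409320187719"}}]}]}

with tempfile.TemporaryDirectory() as d:
    for name, snap in [("old.tfstate", old_state), ("new.tfstate", new_state)]:
        with open(os.path.join(d, name), "w") as f:
            json.dump(snap, f)
    print(state_change_summary(os.path.join(d, "old.tfstate"),
                               os.path.join(d, "new.tfstate")))
    # → ['data.null_data_source.test']
```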

@ghost

ghost commented Mar 23, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2021