Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with GCP Cloud SQL Instance disk_autoresize #14582

Merged
merged 3 commits into from
May 18, 2017

Conversation

catsby
Copy link
Contributor

@catsby catsby commented May 17, 2017

Hey @danawillow / @paddycarver !

I was looking into the following 3 failing tests as reported by our CI:

  • TestAccGoogleSqlDatabaseInstance_basic3
  • TestAccGoogleSqlDatabaseInstance_maintenance
  • TestAccGoogleSqlDatabaseInstance_slave

They’re all failing with the same error message:

testing.go:280: Step 0 error: Check failed: Check 2/4 error: Error settings.disk_autoresize mismatch, (true, false)

I started digging in and found that indeed the value returned from the API does not match our local value in the test check here. Looking in the resource, I noticed that we only seem to save the value from the API under certain conditions here:

	if v, ok := _settings["disk_autoresize"]; ok && v != nil {
		if v.(bool) {
			_settings["disk_autoresize"] = settings.StorageAutoResize
		}
	}

If I’m not mistaken, this block will only set the local disk_autoresize if that setting is found in our configuration first, and then only if that local value is true, correct? I believe this to be incorrect so I’ve patched it here. There’s also a similar issue in UPDATE where we only send a value if the new value is true, and I’ve fixed that here as well. In order to do this though, I have to add Default: false to the schema, so that we’re able to toggle from true to false (an old issue where Terraform and d.GetOk), or otherwise settings["disk_autoresize”] may not contain a value.

After fixing the read, I get the following error from the tests:

TF_ACC=1 go test ./builtin/providers/google -v -run=TestAccGoogleSqlDatabaseInstance_basic3 -timeout 120m
=== RUN   TestAccGoogleSqlDatabaseInstance_basic3
--- FAIL: TestAccGoogleSqlDatabaseInstance_basic3 (470.93s)
        testing.go:280: Step 0 error: After applying this step, the plan was not empty:

                DIFF:

                UPDATE: google_sql_database_instance.instance
                  settings.0.disk_autoresize: "true" => "false"

                STATE:

                google_sql_database_instance.instance:
                  ID = tf-lw-8275110025318672445
                  database_version = MYSQL_5_6
                  ip_address.# = 1
                  ip_address.0.ip_address = 35.184.210.221
                  ip_address.0.time_to_retire =
                  name = tf-lw-8275110025318672445
                  region = us-central
                  self_link = https://www.googleapis.com/sql/v1beta4/projects/hc-terraform-testing/instances/tf-lw-8275110025318672445
                  settings.# = 1
                  settings.0.activation_policy =
                  settings.0.authorized_gae_applications.# = 0
                  settings.0.backup_configuration.# = 0
                  settings.0.crash_safe_replication = false
                  settings.0.database_flags.# = 0
                  settings.0.disk_autoresize = true
                  settings.0.disk_size = 0
                  settings.0.disk_type =
                  settings.0.ip_configuration.# = 0
                  settings.0.location_preference.# = 0
                  settings.0.maintenance_window.# = 0
                  settings.0.pricing_plan =
                  settings.0.replication_type =
                  settings.0.tier = db-f1-micro
                  settings.0.version = 1
FAIL
exit status 1
FAIL    github.com/hashicorp/terraform/builtin/providers/google 470.951s
make: *** [testacc] Error 1

Which brings me to the API inconsistencies that maybe you could offer some insight into. The API docs here say that settings.storageAutoResize is default false:

However, due to the way our code was written, false was never being sent... indeed it can’t be sent right now without being in ForceSendFields, because the field has omitempty on it.

Even though settings.storageAutoResize is not included in the request, but future response and looking in the web console clearly shows that settings.storageAutoResize is true on the Server side. Can you poke someone on the API side about why this is ending up as true?

I see the SDK uses omitempty a lot. In order to address this, it looks like ForceSendFields is what I need here. I’ve included that as well.

I’m running the tests now, and may need an additional tweak after then run to ensure everything passes, but I wanted to open this now to get your feedback on the major points.

Let me know what you think!

@danawillow
Copy link
Contributor

Haven't looked at your code yet, but as far as I can tell, the default is changing to true but the documentation hasn't been updated yet.

Changing the default on our side to be true would be a breaking change in most circumstances, but in this case hasn't the breaking part of it already happened? If so, couldn't we just change the default to true on our side?

@paddycarver
Copy link
Contributor

Opened an issue with the Cloud SQL team to try and understand this new behaviour.

@catsby
Copy link
Contributor Author

catsby commented May 17, 2017

  1. Users who created an SQL Instance before the default changed should see no change with the default; it only affects people on create time.

  2. Adding our default to false would do nothing ATM; users expect false because it's documented that way

  3. Adding a default to true (and only this) would likely bring some unexpected diffs as that would trigger the current code to actually read the value and possibly display a difference. This would be especially bad with the code as is, because right now users cannot turn disk_autoresize off at all without the inclusion of ForceSendFields here

  4. Merging this PR would show diffs for people how have created a database since the API default change and they have:

    • omitted disk_autoresize
    • specified disk_autoresize = false
  5. Merging this PR should have no effect(?) on people who created a database before the API change

I don't see a way around # 4 there. The backend API changed and our code did not honor what became an implicit false here. We'll have to include a note in BACKWARDS INCOMPATIBILITIES / NOTES explaining they may see a diff and may need to restart the instance too if they fall in that special window time between the API change and the release of TF with this code. Assuming the API doesn't change back 😄

@catsby
Copy link
Contributor Author

catsby commented May 17, 2017

Ah, forgot one:

  1. Changing default to true here in this PR. Assuming that's the default going forward we'd need to document that, but it shouldn't have an effect on any one with an instance already created, I believe.

@catsby
Copy link
Contributor Author

catsby commented May 17, 2017

@danawillow / @paddycarver what do you think? I think the only question is regarding the default value, which the more I think the more I believe we should Default: true just to match the API. I don't think that will impact people, will it?

Either way, the READ and UPDATE method changes should go in, otherwise we can't toggle this feature or detect drift.

@catsby
Copy link
Contributor Author

catsby commented May 17, 2017

Here are the tests results with the code as is, and not the proposed Default: true:

TF_ACC=1 go test ./builtin/providers/google -v -run=TestAccGoogleSqlDatabaseInstance -timeout 120m
=== RUN   TestAccGoogleSqlDatabaseInstance_basic
--- PASS: TestAccGoogleSqlDatabaseInstance_basic (48.84s)
=== RUN   TestAccGoogleSqlDatabaseInstance_basic2
--- PASS: TestAccGoogleSqlDatabaseInstance_basic2 (47.90s)
=== RUN   TestAccGoogleSqlDatabaseInstance_basic3
--- PASS: TestAccGoogleSqlDatabaseInstance_basic3 (511.02s)
=== RUN   TestAccGoogleSqlDatabaseInstance_settings_basic
--- PASS: TestAccGoogleSqlDatabaseInstance_settings_basic (27.97s)
=== RUN   TestAccGoogleSqlDatabaseInstance_slave
--- PASS: TestAccGoogleSqlDatabaseInstance_slave (867.67s)
=== RUN   TestAccGoogleSqlDatabaseInstance_diskspecs
--- PASS: TestAccGoogleSqlDatabaseInstance_diskspecs (378.07s)
=== RUN   TestAccGoogleSqlDatabaseInstance_maintenance
--- PASS: TestAccGoogleSqlDatabaseInstance_maintenance (347.16s)
=== RUN   TestAccGoogleSqlDatabaseInstance_settings_upgrade
--- PASS: TestAccGoogleSqlDatabaseInstance_settings_upgrade (51.53s)
=== RUN   TestAccGoogleSqlDatabaseInstance_settings_downgrade
--- PASS: TestAccGoogleSqlDatabaseInstance_settings_downgrade (31.43s)
=== RUN   TestAccGoogleSqlDatabaseInstance_authNets
--- PASS: TestAccGoogleSqlDatabaseInstance_authNets (54.36s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/google 2365.966s

👍

@catsby catsby changed the title [WIP] Fix issue with GCP Cloud SQL Instance disk_autoresize Fix issue with GCP Cloud SQL Instance disk_autoresize May 17, 2017
@danawillow
Copy link
Contributor

I think your approach is right, and we definitely need this at the very least to allow people to set the value to false at all.

I'm leaning towards Default: true to match the API. I played around with it for a bit and it seems like if a field is unset in state then adding a default won't trigger an update, so this only would affect new instances going forward.

I found this line in the read function:
// Take care to only update attributes that the user has defined explicitly
Any idea why that might be?

@paddycarver
Copy link
Contributor

paddycarver commented May 18, 2017

I'll defer to you two on the question of Default. I'm nervous, because it matches the API implementation but not the API documentation, so the question in my mind is "was this an intentional change and the docs are out of date, or is this a bug and the docs are right"? If the former, Default: true is great. If not, we're going to have to do some work to fix it later. :(

[Update] I'm being told the change was intentional and believe it is not going to be reverted in the foreseeable future. Concern withdrawn. [/Update]

I found this line in the read function:
// Take care to only update attributes that the user has defined explicitly
Any idea why that might be?

The original author, IIRC, had a policy of not overwriting changes made outside Terraform, and so limited Terraform's authority to only the fields tracked in the config. We have moved away from this policy to match the rest of the Terraform providers, but the remnants of this haven't all been updated yet.

Copy link
Contributor

@paddycarver paddycarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @danawillow doesn't object, LGTM!

For sql_database_instance , to match the new API default.

Also adds diff suppression func for autoresize on 1st gen instances
@catsby
Copy link
Contributor Author

catsby commented May 18, 2017

@danawillow / @paddycarver I just pushed 0bf6ba1 , I'm rerunning the tests now.

  • changed default to true
  • added a DiffSuppressionFuc to only check the diff on this attribute if we're considering 2nd Gen+ instances; this attribute is invalid for 1st gen
  • Added MaxItems: 1 to the settings block and removed the len check. I'm sure we can use this other places too but I'm trying to keep this PR narrow scoped

Let me know what you think 👍

// Take care to only update attributes that the user has defined explicitly
Any idea why that might be?

I'm afraid I don't know why that is. It's counter to Terraforms declarative nature. I'm going to be poking around the GCP Provider and likely removing that behavior as I come across it. An all encompassing sweep of the entire Provider is not really feasible at this time, for me at least 😄

@catsby
Copy link
Contributor Author

catsby commented May 18, 2017

All green, everyone OK with this?

TF_ACC=1 go test ./builtin/providers/google -v -run=TestAccGoogleSqlDatabaseInstance -timeout 120m
=== RUN   TestAccGoogleSqlDatabaseInstance_basic
--- PASS: TestAccGoogleSqlDatabaseInstance_basic (49.15s)
=== RUN   TestAccGoogleSqlDatabaseInstance_basic2
--- PASS: TestAccGoogleSqlDatabaseInstance_basic2 (49.52s)
=== RUN   TestAccGoogleSqlDatabaseInstance_basic3
--- PASS: TestAccGoogleSqlDatabaseInstance_basic3 (348.40s)
=== RUN   TestAccGoogleSqlDatabaseInstance_settings_basic
--- PASS: TestAccGoogleSqlDatabaseInstance_settings_basic (29.48s)
=== RUN   TestAccGoogleSqlDatabaseInstance_slave
--- PASS: TestAccGoogleSqlDatabaseInstance_slave (826.61s)
=== RUN   TestAccGoogleSqlDatabaseInstance_diskspecs
--- PASS: TestAccGoogleSqlDatabaseInstance_diskspecs (408.46s)
=== RUN   TestAccGoogleSqlDatabaseInstance_maintenance
--- PASS: TestAccGoogleSqlDatabaseInstance_maintenance (3121.84s)
=== RUN   TestAccGoogleSqlDatabaseInstance_settings_upgrade
--- PASS: TestAccGoogleSqlDatabaseInstance_settings_upgrade (62.79s)
=== RUN   TestAccGoogleSqlDatabaseInstance_settings_downgrade
--- PASS: TestAccGoogleSqlDatabaseInstance_settings_downgrade (41.33s)
=== RUN   TestAccGoogleSqlDatabaseInstance_authNets
--- PASS: TestAccGoogleSqlDatabaseInstance_authNets (64.88s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/google 5002.473s

@paddycarver
Copy link
Contributor

Gonna trust @catsby's green test results, because I can't get past 500s right now, which I'm pretty sure is just flakiness in the upstream API. I'm still good with this being merged. @danawillow?

Copy link
Contributor

@danawillow danawillow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good aside from two very minor maybe-typos!

log.Printf("[ERR] error with regex in diff supression for data.autoresize: %s", err)
}
if !matched {
log.Printf("[DEBUG] suppressing diff on disk.autoresize due to 1st gen instance type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: disk_autoresize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

tier := settings["tier"].(string)
matched, err := regexp.MatchString("db*", tier)
if err != nil {
log.Printf("[ERR] error with regex in diff supression for data.autoresize: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should data.autoresize be disk_autoresize?

@catsby
Copy link
Contributor Author

catsby commented May 18, 2017

Fixed typos in 962ada8, pulling this in! Thanks all

@catsby catsby merged commit c3d2f6b into master May 18, 2017
@catsby catsby deleted the b-gcp-sql-autoresize branch May 18, 2017 20:09
@catsby
Copy link
Contributor Author

catsby commented May 18, 2017

Of course I forgot to update the docs here. I've done that in a8c2828

@ghost
Copy link

ghost commented Apr 12, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants