Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provider/google: use a mutex to prevent concurrent sql instance operations #14424

Merged
merged 5 commits into from
May 31, 2017

Conversation

danawillow
Copy link
Contributor

Fixes #9018.

A few implementation questions:

  • Is resource_sql_database_instance_test the best test file for these changes? It's not about instance creation, it's about other stuff creation, but it's also not user or database specific so instance felt more general.
  • Is locking on the instance name sufficient or should we have some sort of prefix to make sure we don't accidentally conflict with other resources? I would do the instance URL but we don't have it at that point and it seems silly to do a read just to get the URL so we can lock on it (and I also don't want to reconstruct the URL). We can get the instance URL from the operation, but at that point it's too late, I think.
TF_ACC=1 go test ./builtin/providers/google -v -run=TestAccGoogleSqlDatabaseInstance_multipleOperations -timeout 120m
=== RUN   TestAccGoogleSqlDatabaseInstance_multipleOperations
--- PASS: TestAccGoogleSqlDatabaseInstance_multipleOperations (51.86s)
PASS
ok  	github.com/hashicorp/terraform/builtin/providers/google	51.985s

@ctavan
Copy link
Contributor

ctavan commented May 15, 2017

Thanks a lot for attacking this issue!

While I don't feel 100% competent to comment on this I believe that in my case a mutex on the instance name would not be the ideal solution. While the default for google_sql_database_instance is to auto-generate a random instance name where using the instance name for the mutex should be fine, I have chosen instance names like main/default/master in practice and I have used the same name for many other resources as well since many of my google resources are grouped within terraform modules where I don't see a need for more specific naming (not sure if other google resources use the plain name for mutexes though).

I think prefixing would already improve the situation a lot even though a slightly uneasy feeling remains for me since there is still the hypothetical case where names could collide… I admit that it's probably not a problem in practice though.

@danawillow
Copy link
Contributor Author

That makes sense! I added a prefix, and included the project name in the key as well just in case the same database name is being used in multiple projects with the same tf config.

@ctavan
Copy link
Contributor

ctavan commented May 17, 2017

Great, that should be reasonably unique! Now another resource in the same project would have to choose that exact prefix to produce a clash. I still wonder if we should leave a small comment on the instanceMutexKey that we are aware that the way we determine the mutex key is not 100% safe but that we reason that in practice it should be very close to 100% safe… Should someone ever run into the issue that might help. But maybe it's not worth the effort.

Anyways, my concrete issue should be fixed by your patch, so thanks a lot for this!

Copy link
Contributor

@paddycarver paddycarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Mostly just looking for a comment update, and had a couple suggestions about the mutex strategy. If I'm misunderstanding, please feel free to correct me!

@@ -266,3 +267,6 @@ func handleNotFoundError(err error, d *schema.ResourceData, resource string) err

return fmt.Errorf("Error reading %s: %s", resource, err)
}

// This is a global MutexKV for use within this plugin.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cool to explain what it's locking on, and what limitation makes the lock necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, rather than having this be plugin-wide, what about making it just for this particular concept--e.g., locking insertion/removal of SQL instances? Then we don't need to worry about the prefix, we can just use something that uniquely identifies an SQL instance amongst all SQL instances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS plugin offers a similar Provider wide mutex, so the approach here seems consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, and good to know!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, and good to know!

@@ -67,6 +68,7 @@ func resourceSqlDatabaseCreate(d *schema.ResourceData, meta interface{}) error {
}

err = sqladminOperationWait(config, op, "Insert Database")
googleMutexKV.Unlock(instanceMutexKey(project, instance_name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see this after line 60 in a defer, to be honest, just to cover our bases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does defer here actually result in the Unlock not being called until resourceSqlDatabaseRead is called? If so, then I think the unlock here is probably correct, assuming the resourceSqlDatabaseRead is idempotent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, defer here will result in the lock persisting until resourceSqlDatabaseRead completes, so probably not what we want to do.

That said, we probably do want to add an additional unlock in the error check block, yeah?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I knew defer would block until resourceSqlDatabaseRead was finished; I checked to make sure a lock wasn't needed inside it. IMO, having the lock stuff work without the danger of missing a lock (like in the error check block that you noticed) outweighs the expanded lock time, but I'm willing to be outvoted on that by you and @danawillow.

Regardless, I think we do want to make sure the lock is released after that error check block, even though (IIUC) Terraform will exit anyways, meaning the lock would never be used again regardless.

Copy link
Contributor

@paddycarver paddycarver May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I stand corrected on the unlock in the error check block--this Unlock will fire before the error checking, so another Unlock in the error check block below is unnecessary.)

(EDIT: Ooops, we mean the error check block above. I stand incorrected.)

@@ -121,6 +124,7 @@ func resourceSqlDatabaseDelete(d *schema.ResourceData, meta interface{}) error {
}

err = sqladminOperationWait(config, op, "Delete Database")
googleMutexKV.Unlock(instanceMutexKey(project, instance_name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, would really like to see the defer approach used here, for completeness.

@@ -78,6 +79,7 @@ func resourceSqlUserCreate(d *schema.ResourceData, meta interface{}) error {
}

err = sqladminOperationWait(config, op, "Insert User")
googleMutexKV.Unlock(instanceMutexKey(project, instance))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer this?

@@ -154,6 +157,7 @@ func resourceSqlUserUpdate(d *schema.ResourceData, meta interface{}) error {
}

err = sqladminOperationWait(config, op, "Insert User")
googleMutexKV.Lock(instanceMutexKey(project, instance))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer this?

@@ -187,6 +192,7 @@ func resourceSqlUserDelete(d *schema.ResourceData, meta interface{}) error {
}

err = sqladminOperationWait(config, op, "Delete User")
googleMutexKV.Unlock(instanceMutexKey(project, instance))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer this?

Copy link
Contributor

@catsby catsby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me, recommend waiting for @paddycarver to review my comment on defer

@@ -266,3 +267,6 @@ func handleNotFoundError(err error, d *schema.ResourceData, resource string) err

return fmt.Errorf("Error reading %s: %s", resource, err)
}

// This is a global MutexKV for use within this plugin.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS plugin offers a similar Provider wide mutex, so the approach here seems consistent

@paddycarver
Copy link
Contributor

I'm fine with it, but if @danawillow is willing to update to release the lock in the error block, that'd be even better. :) Leaving to @danawillow.

@danawillow
Copy link
Contributor Author

I'm fine with the defer- releasing the lock after the read is done seems reasonable. We can always change it later. Updated to add it.

Copy link
Contributor

@paddycarver paddycarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thing--is it worth using the mutexKV we already define in provider.go instead of adding googleMutexKV?

@danawillow
Copy link
Contributor Author

Yes of course, just needed to merge with master. Done.

@danawillow danawillow merged commit 0d1718f into hashicorp:master May 31, 2017
@danawillow danawillow deleted the is-9018 branch May 31, 2017 19:20
@ghost
Copy link

ghost commented Apr 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

google_sql_database.users: operationInProgress error on terraform apply
5 participants