-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: stoppable provisioners, helper/schema for provisioners #10934
Conversation
This switches to the Go "context" package for cancellation and threads the context through all the way to evaluation to allow behavior based on stopping deep within graph execution. This also adds the Stop API to provisioners so they can quickly exit when stop is called.
This modifies local-exec to be stoppable with the new Stop API call that provisioners can listen to.
ffd58ce
to
a8f64cb
Compare
// and promote them to a list. For example "foo" would be promoted to | ||
// ["foo"] automatically. This is primarily for legacy reasons and the | ||
// ambiguity is not recommended for new usage. Promotion is only allowed | ||
// for primitive element types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea, but I don't understand why we run into this. Did we allow defining single elements for a list in the config previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the case of inline
parameter of remote-exec
provisioner, see what is removed in builtin/provisioners/remote-exec/resource_provisioner.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, backwards compatibility here is the reason this was necessary.
sorry, didn't mean to approve yet. still reviewing some, so please don't merge ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some race issue around the context fields.
@@ -634,30 +633,34 @@ func (c *Context) Refresh() (*State, error) { | |||
// | |||
// Stop will block until the task completes. | |||
func (c *Context) Stop() { | |||
log.Printf("[WARN] terraform: Stop called, initiating interrupt sequence") | |||
|
|||
c.l.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no special-case lock handling here, so Unlock should be defered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
c.sh.Stop() | ||
// Stop the context | ||
c.runContextCancel() | ||
c.runContextCancel = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set this to nil? This one is OK right now I think, but sets the code up for races in the future,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to to avoid double-cancels. I'm not sure if that actually has an effect on context though. See Stop()
// If this is called during a proper run operation, this will never | ||
// be nil. | ||
var stopCh <-chan struct{} | ||
if ctx := c.runContext; ctx != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely races with acquireRun
, which sets the context fields under a mutex. What we can do here is have walk
get the stopCh
and pass it in to avoid reading that field without the mutex:
func (c *Context) watchStop(walker *ContextGraphWalker, stopCh, doneCh <-chan struct{})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of special casing this during tests either; can we ensure that it's set for tests too and get rid of these nil checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its safe to access runContext because an entire Run is wrapped in a lock (acquireRun acquires a lock via cond vars). Fixing the special case though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but watchStop
doesn't take a lock, and can race with any method writing to the context fields. Just verified this shows up readily with the race detector.
This should fix it by removing the read of that field from watchStop entirely:
@@ -811,7 +811,9 @@ func (c *Context) walk(
// Watch for a stop so we can call the provider Stop() API.
doneCh := make(chan struct{})
- go c.watchStop(walker, doneCh)
+ stopCh := c.runContext.Done()
+
+ go c.watchStop(walker, stopCh, doneCh)
// Walk the real graph, this will block until it completes
realErr := graph.Walk(walker)
@@ -906,12 +908,7 @@ func (c *Context) walk(
return walker, realErr
}
-func (c *Context) watchStop(walker *ContextGraphWalker, doneCh <-chan struct{}) {
- // Get the stop channel. runContext will never be nil since this should
- // only be called within the context of an operation started with
- // acquireRun
- stopCh := c.runContext.Done()
-
+func (c *Context) watchStop(walker *ContextGraphWalker, stopCh, doneCh <-chan struct{}) {
// Wait for a stop or completion
select {
case <-stopCh:
Addressed feedback. Can you check the latest commits? All the |
case <-doneCh: | ||
case <-ctx.Done(): | ||
cmd.Process.Kill() | ||
err = cmd.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this err
assignment races with the one in the goroutine above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed!
All current feedback addressed. Please review again. :) |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Fast provisioner cancellation! 💃
This PR includes two things:
Create a new Stop API for provisioners that is called when terraform's Stop API (triggered by things like interrupts) is called.
Introduce a
helper/schema
framework for provisioners. This makes it easy to consume the Stop API also introduced in this PR.This PR uses
context
heavily to make it easy to consume the cancellation, and helper/schema handles managing this context. Terraform's core stop mechanism was also changed to usecontext
.