-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process deletions conservatively in parallel #1963
Conversation
This commit allows the engine to conservatively delete resources in parallel when it is sure that it is legal to do so. In the absence of a true data-flow oriented step scheduler, this approach provides a significant improvement over the existing serial deletion mechanism. Instead of processing deletes serially, this commit will partition the set of condemned resources into sets of resources that are known to be legally deletable in parallel. The step executor will then execute those independent lists of steps one-by-one until all steps are complete.
Fixes #1625 unless we think that this conservative approach isn't good enough for us |
pkg/resource/graph/resource_set.go
Outdated
import "github.com/pulumi/pulumi/pkg/resource" | ||
|
||
// ResourceSet represents a set of Resources. | ||
type ResourceSet struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like overkill--why not just use a map
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted ResourceSet.Intersect
and it's annoying to do that with just maps. I don't mind having this either way, I was just not a fan of doing set intersections of with map[*resource.State]bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that you can just do
type ResourceSet map[*resource.State]bool
func (rs ResourceSet) Intersect(other ResourceSet) ResourceSet {
i := make(ResourceSet)
for r := range rs {
if other[r] {
i[r] = true
}
}
return i
}
And then just interact with the set normally otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, cool, I wasn't aware that you could do this while also preserving map-like behavior.
pkg/resource/deploy/step_executor.go
Outdated
|
||
// A CompletionToken is a token returned by the step executor that is completed when the chain has completed execution. | ||
// Callers can use it to optionally wait synchronously on the completion of a chain. | ||
type CompletionToken struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, do any of the executor/token/etc. types need to be exported? IIRC they're now all internal to this package, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, these don't need to be exported.
if len(condemnedDependencies) == 0 && !condemned[dg.ParentOf(res)] { | ||
// If not, it's safe to delete res at this stage. | ||
logging.V(7).Infof("Planner scheduling deletion of '%v'", res.URN) | ||
if res.Delete && !pendingDeletesAreReplaces { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accidentally misnamed pendingDeletesAreReplaces
here (I renamed it just before opening the PR). The polarity of this bool should be reversed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix in the next commit.
One issue that I just encountered that will need to be fixed before this merges is that this optimization is not legal when the host language doesn't provide the engine with full dependency information (namely Python, most likely Go as well). For those languages we will have to resort to serial deletes as we do today. |
I so can't wait for this to land!
Go should faithfully record dependencies. You're right that Python absolutely does not, however. |
This PR is ready to merge, pending reviews. |
ping @pgavlin @lukehoban - I'd like to get this in so we can begin getting some mileage on it, if possible. |
// If we already deleted this resource due to some other DBR, don't do it again. | ||
if sg.deletes[urn] { | ||
continue | ||
if sg.opts.TrustDependencies { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, this returns us to the original behavior of delete-before-create when we're running a Python program, right? Are Python dependencies incomplete, or just wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, this returns us to the original behavior of delete-before-create when we're running a Python program, right?
Yes, that's correct.
Are Python dependencies incomplete, or just wrong?
They're incomplete. Python does not honor dependsOn
nor does it make any attempt to track implicit dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, there's most likely going to be no behavior change here when using Python. The Python dependency graph is always empty because no dependencies are reported. When adding TrustDependencies
, I figured that it should guard against any use of dependency graphs, since it's (in my mind) wrong to use one if we know that the information contained in it is incomplete and untrustworthy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, we could "improve" this situation by handling this in the language host and having it report all preceding resources as dependencies. I don't think that we should do that--there are few enough places where we use dependency information that I don't believe that it's necessary--but I think that it's worth mentioning for completeness.
@@ -399,75 +402,182 @@ func (sg *stepGenerator) GenerateSteps(event RegisterResourceEvent) ([]Step, *re | |||
return []Step{NewCreateStep(sg.plan, event, new)}, nil | |||
} | |||
|
|||
func (sg *stepGenerator) GenerateDeletes() []Step { | |||
func (sg *stepGenerator) GenerateDeletes() []antichain { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like it may be a bit cleaner to let this method generate the list of steps and then let the caller schedule that list (rather than splitting the responsibility for step generation across multiple methods). That would eliminate a fair bit of the conditional code here, and would allow scheduleDeletes
to omit the pendingDeletesAreReplaces
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this more concrete, I would change the signature of scheduleDeletes
s.t. it takes a []Step
. It can then decide whether to just return the slice as-is (in the case that dependency information is not conservatively correct) or to schedule using antichain decomposition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - will do.
@@ -69,6 +69,53 @@ func (dg *DependencyGraph) DependingOn(res *resource.State) []*resource.State { | |||
return dependents | |||
} | |||
|
|||
// DependenciesOf returns a ResourceSet of resources upon which the given resource depends. | |||
func (dg *DependencyGraph) DependenciesOf(res *resource.State) ResourceSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason not to include the parent here? In a very real sense it is a dependency--a resource R with a parent P cannot be registered prior to P.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, and it probably should - I have to explicitly add the parent to the set of dependencies in the step generator anyway for this reason.
|
||
cursorIndex, ok := dg.index[res] | ||
contract.Assert(ok) | ||
for i := cursorIndex; i >= 0; i-- { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be cursorIndex-1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. will fix.
@pgavlin The last two commits address your PR feedback. |
} | ||
} | ||
|
||
set[dg.ParentOf(res)] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just look for this resource in the loop above? That would also let us eliminate the ParentOf
function below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for bearing with me.
thanks for the review! |
This commit allows the engine to conservatively delete resources in
parallel when it is sure that it is legal to do so. In the absence of a
true data-flow oriented step scheduler, this approach provides a
significant improvement over the existing serial deletion mechanism.
Instead of processing deletes serially, this commit will partition the
set of condemned resources into sets of resources that are known to be
legally deletable in parallel. The step executor will then execute those
independent lists of steps one-by-one until all steps are complete.