Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process deletions conservatively in parallel #1963

Merged
merged 8 commits into from
Sep 27, 2018

Conversation

swgillespie
Copy link
Contributor

This commit allows the engine to conservatively delete resources in
parallel when it is sure that it is legal to do so. In the absence of a
true data-flow oriented step scheduler, this approach provides a
significant improvement over the existing serial deletion mechanism.

Instead of processing deletes serially, this commit will partition the
set of condemned resources into sets of resources that are known to be
legally deletable in parallel. The step executor will then execute those
independent lists of steps one-by-one until all steps are complete.

This commit allows the engine to conservatively delete resources in
parallel when it is sure that it is legal to do so. In the absence of a
true data-flow oriented step scheduler, this approach provides a
significant improvement over the existing serial deletion mechanism.

Instead of processing deletes serially, this commit will partition the
set of condemned resources into sets of resources that are known to be
legally deletable in parallel. The step executor will then execute those
independent lists of steps one-by-one until all steps are complete.
@swgillespie
Copy link
Contributor Author

Fixes #1625 unless we think that this conservative approach isn't good enough for us

import "github.com/pulumi/pulumi/pkg/resource"

// ResourceSet represents a set of Resources.
type ResourceSet struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like overkill--why not just use a map directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted ResourceSet.Intersect and it's annoying to do that with just maps. I don't mind having this either way, I was just not a fan of doing set intersections of with map[*resource.State]bool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you can just do

type ResourceSet map[*resource.State]bool

func (rs ResourceSet) Intersect(other ResourceSet) ResourceSet {
    i := make(ResourceSet)
    for r := range rs {
        if other[r] {
            i[r] = true
        }
    }
    return i
}

And then just interact with the set normally otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, cool, I wasn't aware that you could do this while also preserving map-like behavior.


// A CompletionToken is a token returned by the step executor that is completed when the chain has completed execution.
// Callers can use it to optionally wait synchronously on the completion of a chain.
type CompletionToken struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, do any of the executor/token/etc. types need to be exported? IIRC they're now all internal to this package, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, these don't need to be exported.

if len(condemnedDependencies) == 0 && !condemned[dg.ParentOf(res)] {
// If not, it's safe to delete res at this stage.
logging.V(7).Infof("Planner scheduling deletion of '%v'", res.URN)
if res.Delete && !pendingDeletesAreReplaces {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accidentally misnamed pendingDeletesAreReplaces here (I renamed it just before opening the PR). The polarity of this bool should be reversed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix in the next commit.

@swgillespie
Copy link
Contributor Author

One issue that I just encountered that will need to be fixed before this merges is that this optimization is not legal when the host language doesn't provide the engine with full dependency information (namely Python, most likely Go as well). For those languages we will have to resort to serial deletes as we do today.

@joeduffy
Copy link
Member

I so can't wait for this to land!

namely Python, most likely Go as well

Go should faithfully record dependencies. You're right that Python absolutely does not, however.

@swgillespie
Copy link
Contributor Author

This PR is ready to merge, pending reviews.

@swgillespie
Copy link
Contributor Author

ping @pgavlin @lukehoban - I'd like to get this in so we can begin getting some mileage on it, if possible.

// If we already deleted this resource due to some other DBR, don't do it again.
if sg.deletes[urn] {
continue
if sg.opts.TrustDependencies {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, this returns us to the original behavior of delete-before-create when we're running a Python program, right? Are Python dependencies incomplete, or just wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, this returns us to the original behavior of delete-before-create when we're running a Python program, right?

Yes, that's correct.

Are Python dependencies incomplete, or just wrong?

They're incomplete. Python does not honor dependsOn nor does it make any attempt to track implicit dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, there's most likely going to be no behavior change here when using Python. The Python dependency graph is always empty because no dependencies are reported. When adding TrustDependencies, I figured that it should guard against any use of dependency graphs, since it's (in my mind) wrong to use one if we know that the information contained in it is incomplete and untrustworthy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we could "improve" this situation by handling this in the language host and having it report all preceding resources as dependencies. I don't think that we should do that--there are few enough places where we use dependency information that I don't believe that it's necessary--but I think that it's worth mentioning for completeness.

@@ -399,75 +402,182 @@ func (sg *stepGenerator) GenerateSteps(event RegisterResourceEvent) ([]Step, *re
return []Step{NewCreateStep(sg.plan, event, new)}, nil
}

func (sg *stepGenerator) GenerateDeletes() []Step {
func (sg *stepGenerator) GenerateDeletes() []antichain {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like it may be a bit cleaner to let this method generate the list of steps and then let the caller schedule that list (rather than splitting the responsibility for step generation across multiple methods). That would eliminate a fair bit of the conditional code here, and would allow scheduleDeletes to omit the pendingDeletesAreReplaces parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this more concrete, I would change the signature of scheduleDeletes s.t. it takes a []Step. It can then decide whether to just return the slice as-is (in the case that dependency information is not conservatively correct) or to schedule using antichain decomposition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - will do.

@@ -69,6 +69,53 @@ func (dg *DependencyGraph) DependingOn(res *resource.State) []*resource.State {
return dependents
}

// DependenciesOf returns a ResourceSet of resources upon which the given resource depends.
func (dg *DependencyGraph) DependenciesOf(res *resource.State) ResourceSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to include the parent here? In a very real sense it is a dependency--a resource R with a parent P cannot be registered prior to P.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, and it probably should - I have to explicitly add the parent to the set of dependencies in the step generator anyway for this reason.


cursorIndex, ok := dg.index[res]
contract.Assert(ok)
for i := cursorIndex; i >= 0; i-- {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be cursorIndex-1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. will fix.

@swgillespie
Copy link
Contributor Author

@pgavlin The last two commits address your PR feedback.

}
}

set[dg.ParentOf(res)] = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just look for this resource in the loop above? That would also let us eliminate the ParentOf function below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

Copy link
Member

@pgavlin pgavlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thanks for bearing with me.

@swgillespie swgillespie merged commit ed0353e into master Sep 27, 2018
@pulumi-bot pulumi-bot deleted the swgillespie/parallel-delete branch September 27, 2018 22:49
@swgillespie
Copy link
Contributor Author

thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants