Skip to content

Commit

Permalink
Make loading/analysis phase a lot faster.
Browse files Browse the repository at this point in the history
There is a fundamental bottleneck in the way targets are currently configured:
The ConfiguredTargetFunction needs to load all dependent targets before it can
configure its own target. Now, outgoing deps of a target can be in three
states:
1. Package of dependency target not yet loaded
2. Dependency target not configured, but package already loaded
3. Dependency target configured

State bazelbuild#3 is good, we don't need to do any more work before we can configure the
current target. Now, most targets will have a combination of deps in states bazelbuild#1
and bazelbuild#2. The current behavior is to schedule the target configuration of deps in
state bazelbuild#2 and the package loading in state bazelbuild#1 (to get to state bazelbuild#3 and bazelbuild#2,
respectively). Only after *all* of these are computed will the corresponding
ConfiguredTargetFunction be rescheduled upon which the remaining deps that were
in state bazelbuild#1 before are now in state bazelbuild#2 and the corresponding target
configuration can be started.

This creates a substantial parallelization bottleneck by artifically sync'ing
the state transitions (bazelbuild#1->bazelbuild#2 and bazelbuild#2->bazelbuild#3) for all deps. In turn, this makes the
critical path artifically long. Instead, settle for just loading dependent
packages as long as there are any. This is a fast single-step operation.
Postpone the configuration of dependent targets to later as it can require long
dependency chains to be configured.

In theory, a better solution would be to split the target configuration in a
way so that it is possible to ask for configured targets without yet knowing
their package. However, that would require pulling config transitions apart (as
some of them happen because of the origin target/config and some of them happen
because of the destination target. Thus we'd need to change the
ConfiguredTargetKey so that the configuration stands for the configuration that
we request the target in, with target-caused transitions already applied. Now
that itself leads to inefficiencies. E.g. we would start requestiong InputFiles
in several configurations just to underneath always load them with a null
configuration.

RELNOTES: Faster analysis by improved parallelization.
PiperOrigin-RevId: 203725209
  • Loading branch information
Googler authored and George Gensure committed Aug 2, 2018
1 parent 365ccac commit e021a30
Showing 1 changed file with 8 additions and 6 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -480,12 +480,14 @@ static OrderedSetMultimap<Attribute, ConfiguredTargetAndData> computeDependencie
hostConfiguration,
ruleClassProvider,
defaultBuildOptions);
// It's important that we don't use "if (env.missingValues()) { return null }" here (or
// in the following lines). See the comments in getDynamicConfigurations' Skyframe call
// for explanation.
if (depValueNames == null) {
return null;
}
}

// Return early in case packages were not loaded yet. In theory, we could start configuring
// dependent targets in loaded packages. However, that creates an artificial sync boundary
// between loading all dependent packages (fast) and configuring some dependent targets (can
// have a long tail).
if (env.valuesMissing()) {
return null;
}

// Resolve configured target dependencies and handle errors.
Expand Down

0 comments on commit e021a30

Please sign in to comment.