
Build pipelining support #744

Merged
merged 12 commits into from
Jul 7, 2020

Conversation

@eed3si9n (Member) commented Mar 3, 2020

This is a work-in-progress, open-source port of the build pipelining support implemented by the Lightbend Scala team (scala/scala#7712) and Morgan Stanley (https://github.com/mspnf/zinc/blob/1.3.x-contrib/README-MORGANSTANLEY.MD).

  • Macro support
  • Java support
  • sbt support

Ref scala/scala#7712
Ref scala/scala-dev#643

This adds a new flag, incOptions.withPipelining(true), which enables incremental build pipelining.

Note: If incOptions pipelining is enabled, the build tool must call compileAllJava(in, logger) after both

  1. all upstream subproject Scala and Java compilation has completed, and
  2. the whole Scala *.class compilation has completed and hasModified is true.
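The two conditions above can be sketched as a predicate. Everything here (Subproject, canCompileAllJava, the boolean flags) is a hypothetical stand-in for the build tool's own bookkeeping, not actual Zinc API:

```scala
// Minimal sketch of the build-tool contract when pipelining is enabled.
// All names are hypothetical stand-ins, not Zinc API.
object PipelinedBuildSketch {
  final case class Subproject(name: String, upstream: List[Subproject]) {
    var scalaDone = false   // whole *.class Scala compilation finished
    var javaDone = false    // Java compilation finished
    var hasModified = false // Scala compilation produced changes
  }

  // compileAllJava may only run once both conditions from the note hold:
  // every upstream subproject is fully compiled, and this subproject's
  // whole Scala compilation completed with modifications.
  def canCompileAllJava(p: Subproject): Boolean = {
    val upstreamComplete = p.upstream.forall(u => u.scalaDone && u.javaDone)
    upstreamComplete && p.scalaDone && p.hasModified
  }
}
```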

@retronym (Member) commented May 4, 2020

It would be good if we could use PathFactory and -Ypickle-write to get the pickles out of the compiler, rather than having the new pickleData extension point in the compiler callback.

Here's what I had in mind:

https://github.com/scala/scala/blob/8538d98989b33d0090fdd142435320abae10a2fd/test/junit/scala/tools/nsc/PickleWriteTest.scala#L52-L97

This can be combined with -Ypickle-write-api-only which limits the written .sig files to non-private API. Such pickles aren't (currently) stored in a map like Run#symData, as I wanted to avoid a) another ad-hoc extension point and b) keeping public pickles in memory until the end of the Run.

We can stick with this Run.symData to get to a viable version of this PR though.
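As a sketch of how those flags might combine on a scalac command line (the paths here are made up for illustration):

```
# Write .sig pickles to a separate JAR, restricted to non-private API,
# alongside the normal classfile output (illustrative paths).
scalac -Ypickle-write target/early/core-pickles.jar \
       -Ypickle-write-api-only \
       -d target/classes \
       src/main/scala/A.scala
```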

@dwijnand dwijnand mentioned this pull request May 4, 2020
@eed3si9n (Member, Author) commented May 4, 2020

Here are some challenges:

  1. keeping *.class and *.sig in sync, i.e. propagating deletion of *.class to *.sig,
  2. backing out A.class and A.sig while compiling A.scala,
  3. supporting both compile-to-JAR and -to-directory,
  4. making sure it works for both Java and Scala, and
  5. making sure it works for subproject dependencies.

For example, given subprojects core (A.scala, ...) and app (App.scala) that depends on core, transactional ClassFileManager today backs out A.class when compiling A.scala.

  1. Compile both app and core subprojects.
  2. Introduce a change to A.scala so it no longer compiles.
  3. Try compiling app and core again. A.sig would still exist on the classpath, so the compilation of App.scala succeeds (to be clear, that's a problem: the compilation of App.scala should fail).
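One way to avoid this stale-A.sig scenario is to propagate every *.class back-out to its sibling *.sig. The sketch below uses hypothetical helpers (backOut, sigFor), not Zinc's actual ClassFileManager API:

```scala
import java.nio.file.{Files, Path}

// Sketch: whenever the transactional manager backs out A.class, also back
// out A.sig, so stale pickles cannot mask a broken A.scala downstream.
// `sigFor` and `backOut` are hypothetical helpers, not Zinc API.
def sigFor(classFile: Path): Path = {
  val name = classFile.getFileName.toString.stripSuffix(".class") + ".sig"
  classFile.resolveSibling(name)
}

def backOut(classFile: Path): Unit = {
  Files.deleteIfExists(classFile)
  Files.deleteIfExists(sigFor(classFile)) // keep *.sig in sync with *.class
}
```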

@smarter (Contributor) commented May 4, 2020

keeping *.class and *.sig in sync, i.e. propagating deletion of *.class to *.sig,

We have the exact same problem with .tasty, which is why we only emit .tasty in the backend even though they could be emitted at pickler (and even then we also need all of this wonderful stuff: https://github.com/lampepfl/dotty/blob/11760ea2b23ad7eb5b5749f489ee5395980c0daf/sbt-dotty/src/dotty/tools/sbtplugin/DottyPlugin.scala#L127-L152)

@eed3si9n (Member, Author) commented May 4, 2020

Also, to document the "making sure it works for both Java and Scala" part of the challenge... Here's MixedAnalyzingCompiler#compile as it stands:

/* `Mixed` order defaults to `ScalaThenJava` behaviour.
 * See https://github.com/sbt/zinc/issues/234. */
if (config.currentSetup.order == JavaThenScala) {
  compileJava(); compileScala()
} else {
  compileScala(); compileJava()
}

By default it calls compileScala() and immediately afterwards it calls compileJava().

delay Javac compilation?

Using the core and app subprojects as an example, let's say both subprojects contain Scala and Java sources. The question is when we can start the app subproject compile. Ideally we want to start as soon as early artifacts are available from core. compileScala() should go through fine. compileJava() would fail because Javac wants real *.class files on the classpath.

So should we wait till whole compilation is done for the core subproject? Doing so would undermine the pipelining whenever *.java files are included. If we want to avoid that, we need a way to queue Javac compilation and wait until the whole compilations of subproject dependencies are done.
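Queueing Javac behind the upstream whole-compilation signal can be modeled with an ordinary Future. PipelinedCompile and its methods are simplified, hypothetical stand-ins for the real Zinc steps:

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Sketch: run Scala compilation immediately (it can consume upstream
// pickles), but queue the Javac step until upstream whole compilation
// completes and real *.class files exist on the classpath.
// All names here are hypothetical stand-ins, not Zinc API.
class PipelinedCompile(upstreamWholeDone: Future[Unit]) {
  def compileScala(): String = "scala-done" // can start from upstream pickles
  def compileJava(): String = "java-done"   // needs real upstream *.class files

  def run(): Future[(String, String)] = {
    val scalaResult = compileScala()        // starts right away
    upstreamWholeDone.map(_ => (scalaResult, compileJava()))
  }
}
```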

*.class-based JavaAnalyze?

AnalyzingJavaCompiler (https://github.com/sbt/zinc/blob/v1.4.0-M5/zinc/src/main/scala/sbt/internal/inc/javac/AnalyzingJavaCompiler.scala) runs analysis on top of the *.class files created by javac.

If we try to pipeline mixed Java/Scala subprojects, the Analysis information created by JavaAnalyze will not be available to downstream subprojects. Instead they will use the Analysis information created via -Ypickle-java + the xsbt-dependency phase, etc. I'm not sure if we lose some accuracy because of this.

@eed3si9n (Member, Author) commented May 4, 2020

Here's a diagram to illustrate this.

[pipelining diagram]

@retronym (Member) commented May 5, 2020

So should we wait till whole compilation is done for core subproject?

The intention is that we use .sig files generated by -Ypickle-java as inputs to downstream scalac, and .class files as inputs to downstream javac. app is too coarse a node in this dependency graph; we need app:scalac and app:java as separate nodes.

Downstream projects that use .sig files will be presented with a partial ("early"?) Analysis that lacks some information. As you pointed out, AnalyzingJavaCompiler currently extracts the API, used libraries, and used internal classes from .class files.

AFAICT some of this information (e.g. external library dependencies) is irrelevant to the decision making of the downstream incremental compiler and won't be needed until subsequent compilation of this project. The API extraction is important, but should be safe to replace with extract-api processing the API extracted by scalac -Ypickle-java. The API hash won't be identical due to implementation differences, but that's not a problem if you're always using one strategy or the other.

I think it would be useful to validate our assumptions about what information is irrelevant to downstream incremental compilation; it would only be consulted as part of previousRelations. Could we validate the assumptions by modifying LookupImpl.analyses to filter out the information we believe is not needed from classpath analyses?

@eed3si9n (Member, Author) commented May 5, 2020

Could we validate the assumptions by modifying LookupImpl.analyses to filter out the information we believe is not needed from classpath analyses?

Good idea. I think all we need are analysis.relations.productClassName and analysis.apis.internalAPI, accessed by:

def findExternalAnalyzedClass(lookup: Lookup)(binaryClassName: String): AnalyzedClass = {
  val maybeInternalAPI = for {
    analysis0 <- lookup.lookupAnalysis(binaryClassName)
    analysis = analysis0 match { case a: Analysis => a }
    className <- analysis.relations.productClassName.reverse(binaryClassName).headOption
  } yield analysis.apis.internalAPI(className)
  maybeInternalAPI.getOrElse(APIs.emptyAnalyzedClass)
}

analysis.relations.productClassName is determined by the nonLocalClasses and localClasses collected in AnalysisCallback. In the API Extraction phase, registerGeneratedClasses registers non-local *.class files (https://github.com/sbt/zinc/blob/develop/internal/compiler-bridge/src/main/scala/xsbt/API.scala#L117), whereas the Java Analyzer phase registers local classes, which should not impact the downstream.
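The validation experiment could look roughly like this toy model, where ToyAnalysis and filterForDownstream are made-up stand-ins for Zinc's Analysis and the proposed LookupImpl.analyses filtering, keeping only the fields we believe matter downstream:

```scala
// Toy model of the validation experiment: present downstream compilation
// with classpath analyses stripped of the fields we believe it never reads.
// ToyAnalysis is a hypothetical stand-in for Zinc's Analysis, not its API.
final case class ToyAnalysis(
    internalAPIs: Map[String, String], // className -> API hash (kept)
    libraryDeps: Set[String]           // external library deps (dropped)
)

// If downstream incremental compilation still behaves correctly with the
// filtered view, the dropped information was indeed unused.
def filterForDownstream(a: ToyAnalysis): ToyAnalysis =
  a.copy(libraryDeps = Set.empty)
```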

@dwijnand dwijnand marked this pull request as ready for review June 4, 2020 14:54
@dwijnand dwijnand marked this pull request as draft June 4, 2020 15:09
@dwijnand dwijnand force-pushed the wip/pipelining branch 2 times, most recently from 0c47d86 to 4e392cf on July 6, 2020 20:53
eed3si9n and others added 12 commits July 6, 2020 22:17
For modular pipelining we need both early output (JAR file containing Scala sig files) and early Analysis (Zinc internal information).

This adds `IncOptions#earlyOutput` and `Lookup#storeEarlyAnalysis` so the early artifacts can be generated during compile phases.
Early analysis store is passed into the Setup.
CompileProgress is used to notify the early output timing.
The deletion part required some tweaking to make sure this works for dir-based output.
If a macro is found in any classes, use the regular output.
With only Scala pipelining implemented, this Java-only pipelining test
is pending.

The issue lies in the fact that while scalac can consume its own .sig
early output artifacts, javac can't.
mergeAndInvalidate will merge the partialAnalysis with the pruned
Analysis, so merging it twice will break it (duplicate Compilations).

Only record the generated products in the partialAnalysis; the
restore-classes scripted test fails otherwise.
This is to avoid the "The process cannot access the file because it is
being used by another process" issue, on Windows.
The entry point API of Zinc, xsbti.compile.IncrementalCompiler, is
defined in compiler-interface and is therefore written entirely in Java.

Therefore, in order to allow a user (e.g. an external build tool) to
define their own profiler using this interface, I extracted a Java-only
interface for InvalidationProfiler and its components.

Having done that I could add it to compiler-interface's
xsbti.compile.ExternalHooks.

I've kept the original Scala variant of the interface methods (e.g. using
Scala's Iterable instead of Array or java.util.Set) for backwards
compatibility, and I've created an AdaptedRunProfiler that knows how to
adapt the calls to the underlying XRunProfiler it contains.
@eed3si9n (Member, Author) commented Jul 7, 2020

The subproject-java scripted test demonstrates the need to coordinate Java compilation.
I'm fine with merging this as-is and following up on that in a later PR.

@eed3si9n eed3si9n merged commit db1039b into sbt:develop Jul 7, 2020
@eed3si9n eed3si9n deleted the wip/pipelining branch July 7, 2020 03:42
eed3si9n added a commit to eed3si9n/sbt that referenced this pull request Jul 27, 2020
Ref sbt/zinc#744

This implements `ThisBuild / usePipelining`, which configures subproject pipelining available from Zinc 1.4.0.

The basic idea is to start subproject compilation as soon as pickle JARs (early output) become available. This is in part enabled by the Scala compiler's new flags `-Ypickle-java` and `-Ypickle-write`.

The other part of the magic is the use of `Def.promise`:

```
earlyOutputPing := Def.promise[Boolean],
```

This notifies the `compileEarly` task, which looks like a normal task to the rest of the tasks but is in fact promise-blocked. In other words, without the full `compile` task also running, `compileEarly` will never return, forever waiting for the `earlyOutputPing`.
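The promise-blocking behavior can be illustrated with plain scala.concurrent.Promise. sbt's Def.promise is task-level rather than thread-level, and compileEarly/compileFull below are simplified, hypothetical stand-ins:

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

// Sketch of the ping: compileEarly is blocked on the promise that the
// full compile completes once the early output (pickle JAR) is written.
// These names are illustrative stand-ins, not sbt's actual tasks.
val earlyOutputPing = Promise[Boolean]()

def compileEarly(): Future[Boolean] =
  earlyOutputPing.future // never completes unless compileFull runs

def compileFull(): Unit = {
  // ... write pickle JAR (early output) ...
  earlyOutputPing.success(true) // unblock downstream compileEarly
  // ... finish whole compilation ...
}
```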
eed3si9n added further commits to eed3si9n/sbt referencing this pull request on Jul 27, Aug 4, and Aug 6, 2020, with the same message as above.