
Add API for spawning task-futures, use it for grouping and parallelization of test classes within a single module #3478

Merged: 46 commits merged into com-lihaoyi:main on Sep 30, 2024

Conversation

lihaoyi
Member

@lihaoyi lihaoyi commented Sep 7, 2024

This PR exposes a T.fork.async/.await/.awaitAll API for tasks to spawn futures that share the task evaluator's execution context, integrates it with the PromptLogger, and uses it to allow Mill to parallelize test suites on a per-test-class basis. This is only the first use case for this API, and I expect there will be many more (e.g. I've been wanting to parallelize the Sonatype uploader).

  • The spawning of task-futures doesn't provide any additional flexibility we don't already have today: tasks like the Coursier downloader already spawn ad-hoc thread pools to parallelize work. What it does provide is a way for such tasks to parallelize work onto threads that cooperates with the existing evaluator thread pools, --jobs configuration, folder sandboxing, and terminal UI/logging, rather than the status quo where such thread pools run invisibly in the background doing god-knows-what.
    • Task-futures still do not allow dynamic changes to the evaluator task graph: they only run within the context of a single task, and the top-level task graph remains static once resolved and planned.
    • Task-futures are integrated into the PromptLogger, such that the prompt shows them grouped under their parent task, with prefixes like [106-0] or [106-1] associated with the parent task's [106]. To do this, the PromptLogger now treats keys as key: Seq[String] rather than key: String.
    • The API for spawning task-futures, def async[T](dest: Path, key: String, message: String)(t: => T): Future[T] (naming could be better?), is designed to fit into Mill: it makes you provide a place to put filesystem logs, a filesystem sandbox folder, a TUI log prefix, and a prompt label. That means such Futures are as observable and sandboxed as normal Mill tasks, and have most of the same properties (see the sketch after this list).
(Screenshot, 2024-09-26: the terminal prompt showing task-futures grouped under their parent task)
  • The test grouping is roughly a port of the SBT testGrouping feature (https://www.scala-sbt.org/1.x/docs/Testing.html#Forking+tests) and serves the same purpose. It is most useful for codebases with large modules, each containing many test classes, and the flexible nature of def testForkGrouping gives the user room to tune exactly how the tests are grouped for maximal performance:

    • If groups are too small, the JVM startup overhead dominates
    • If groups are too large, you don't get enough parallelism
    • Some tests may also not be amenable to running in the same forked JVM as others, due to conflicting reads/writes to the filesystem or manipulation of JVM-global stateful variables
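For illustration, here is a minimal sketch of how a task might use the new API (the value names and the work being done are made up; only the async/awaitAll shape follows the description above):

def myTask = T {
  val inputs = Seq("a", "b", "c")
  val futures = inputs.zipWithIndex.map { case (input, i) =>
    // dest: per-future sandbox folder, key: TUI log prefix, message: prompt label
    T.fork.async(T.dest / input, i.toString, s"compute $input") {
      input.toUpperCase // stand-in for real work done on the shared thread pool
    }
  }
  T.fork.awaitAll(futures) // blocks cooperatively until all the futures complete
}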

To measure the effect of testForkGrouping, I ran time ./mill -i scalalib.test with and without test grouping on my 10-core MacBook Pro (after breaking up HelloWorldTests.scala for better granularity); we see about a 3x speedup.

Without Test Grouping, all test classes in 1 JVM (default)

581.83s user 48.25s system 181% cpu 5:47.11 total

With Test Grouping, 3 test classes per JVM (def testForkGrouping = discoveredTestClasses().grouped(3).toSeq)

656.06s user 40.93s system 577% cpu 2:00.68 total

With Test Grouping, 1 test class per JVM (def testForkGrouping = discoveredTestClasses().grouped(1).toSeq)

707.30s user 45.21s system 509% cpu 2:27.72 total

The limited speedup is likely because Mill's tests are heavyweight, so even when run sequentially they already use multiple cores; I would expect a greater speedup for most projects, whose tests tend to be more lightweight. We can also see that 1-test-class-per-JVM is somewhat slower than 3-test-classes-per-JVM in this case, likely because JVM startup overhead becomes significant.

This feature is opt-in, e.g. via def testForkGrouping = discoveredTestClasses().map(Seq(_)). The default behavior of running all tests in a single JVM is unchanged.
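As an example, here is a hedged sketch of what opting in might look like in a build.sc (the module layout, test framework, and dependency versions are illustrative, not taken from this PR):

object foo extends ScalaModule {
  def scalaVersion = "2.13.14"
  object test extends ScalaTests with TestModule.Utest {
    def ivyDeps = Agg(ivy"com.lihaoyi::utest:0.8.3")
    // fork one JVM per discovered test class
    def testForkGrouping = discoveredTestClasses().map(Seq(_))
  }
}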

Implementation Notes

  • We re-use the same ExecutionContext that Mill uses internally for scheduling its targets, allowing the scheduling to be cooperative.

    • For example, this means that test classes running in parallel and other tasks use the same pool of threads, keeping the total number of threads constant globally
  • We convert the default FixedThreadPool into a ForkJoinPool and provide a blocking{...} operation, allowing the ForkJoinPool to spawn an additional thread when an existing thread is blocked waiting.

    • This is necessary when we wait for the Futures spawned for each test class: the task-level thread is idle, and we want to continue making use of the available CPUs.
    • This is basically the same thing scala.concurrent.ExecutionContext.global does, but global's implementation is private and not re-usable, so I have to duplicate the small amount of code wiring it up (see the sketch after this list)
    • The ForkJoinPool concurrency and thread-management model is pretty battle-tested in Java land, so although it's new here I'm not too worried about its performance and robustness; in any case Mill is a pretty low-concurrency system overall, so it shouldn't be pushing any limits
  • Each test class runs in a subprocess in a separate JVM with a separate sandbox folder, and their outputs are then all read and consolidated back into the combined output for the original test task.

    • There is some overhead to spawning JVMs, but from my experience doing the same in Bazel that overhead is manageable and the benefits of class-level parallelism win out
  • This is controlled by the target def testForkGrouping: Seq[Seq[String]], which defaults to Seq(discoveredTestClasses()) to put them all in one group, but can be customized in arbitrary ways
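To make the thread-pool wiring above concrete, here is a hedged sketch adapted from the approach scala.concurrent.ExecutionContext.global takes (it is not the exact code in this PR): the worker threads implement BlockContext, so blocking{...} and Await can ask the ForkJoinPool to compensate with an extra thread while one is parked waiting.

import java.util.concurrent.{ForkJoinPool, ForkJoinWorkerThread}
import java.util.concurrent.ForkJoinPool.ForkJoinWorkerThreadFactory
import scala.concurrent.{BlockContext, CanAwait, ExecutionContext}

val factory = new ForkJoinWorkerThreadFactory {
  def newThread(pool: ForkJoinPool): ForkJoinWorkerThread =
    new ForkJoinWorkerThread(pool) with BlockContext {
      // Delegate blocking sections to the pool so it can spawn a compensating thread
      def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = {
        var result: Option[T] = None
        ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker {
          def block(): Boolean = { result = Some(thunk); true }
          def isReleasable(): Boolean = result.isDefined
        })
        result.get
      }
    }
}

val pool = new ForkJoinPool(
  Runtime.getRuntime.availableProcessors(),
  factory,
  null, // default uncaught-exception handler
  true  // async mode
)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)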

This is covered by additional unit tests and by Java/Scala/Kotlin example tests included in the docsite.

@lihaoyi lihaoyi changed the title Parallelize test targets on a per-test-class basis Allow grouping and parallelization of test targets within a single module Sep 17, 2024
@lihaoyi lihaoyi marked this pull request as ready for review September 26, 2024 13:02
@lihaoyi lihaoyi changed the title Allow grouping and parallelization of test targets within a single module Add API for spawning sub-tasks, use for grouping and parallelization of test targets within a single module Sep 26, 2024
@lihaoyi lihaoyi changed the title Add API for spawning sub-tasks, use for grouping and parallelization of test targets within a single module Add API for spawning sub-tasks, use it for grouping and parallelization of test targets within a single module Sep 26, 2024
@@ -404,6 +404,7 @@ trait MillScalaModule extends ScalaModule with MillJavaModule with ScalafixModul
def moduleDeps = outer.testModuleDeps
def ivyDeps = super.ivyDeps() ++ outer.testIvyDeps()
def forkEnv = super.forkEnv() ++ outer.forkEnv()
// override def testForkGrouping = discoveredTestClasses().grouped(1).toSeq
Member Author


We can turn this on properly once we re-bootstrap

@lihaoyi lihaoyi changed the title Add API for spawning sub-tasks, use it for grouping and parallelization of test targets within a single module Add API for spawning task-futures, use it for grouping and parallelization of test targets within a single module Sep 26, 2024
@lihaoyi lihaoyi changed the title Add API for spawning task-futures, use it for grouping and parallelization of test targets within a single module Add API for spawning task-futures, use it for grouping and parallelization of test classes within a single module Sep 26, 2024
@lolgab
Member

lolgab commented Sep 26, 2024

def testForkGrouping = discoveredTestClasses().grouped(3).toSeq

is not a great grouping strategy because it will spawn too many JVMs.
While it's nice as a one-liner, it's not a great suggestion for the examples.
We could provide a balancing function ourselves or suggest something that splits all the classes into a few groups like:

def testForkGrouping = T {
  val classes = discoveredTestClasses()
  classes.grouped(classes.size / 3).toSeq
}

@lihaoyi
Member Author

lihaoyi commented Sep 27, 2024

Mill does not have the information to do balancing "properly" though: classes.grouped(classes.size / 3) may be too many groups when the test module is small, and too few groups when the test module is large. In general Mill does not know how long each test class takes to run, which is what would be required to do this properly

It's possible we could do something like dynamic work stealing, or to store historical test run times and group based on that, but for an initial pass I didn't want to be too sophisticated. At my previous job we did per-class forking and despite the per-JVM overhead it generally worked pretty well, and this is essentially the API that SBT provides, so IMO it's a reasonable first pass and we can try to be more sophisticated in a follow up
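For concreteness, here is one hedged sketch of the kind of "few groups" strategy being discussed, capping the number of groups at the available parallelism instead of fixing the group size (illustrative only, and maxGroups is a made-up name; neither comment proposes exactly this):

def testForkGrouping = T {
  val classes = discoveredTestClasses()
  val maxGroups = Runtime.getRuntime.availableProcessors()
  val groupSize = math.max(1, math.ceil(classes.size.toDouble / maxGroups).toInt)
  classes.grouped(groupSize).toSeq
}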

@lihaoyi lihaoyi merged commit 05bef7e into com-lihaoyi:main Sep 30, 2024
24 checks passed
@lefou lefou added this to the 0.12.0-RC3 milestone Sep 30, 2024