
Add API for spawning task-futures, use it for grouping and parallelization of test classes within a single module #3478

Merged: 46 commits merged into com-lihaoyi:main on Sep 30, 2024

Conversation

lihaoyi
Member

@lihaoyi lihaoyi commented Sep 7, 2024

This PR exposes a T.fork.async/.await/.awaitAll API for tasks to spawn futures that share the task evaluator's execution context, integrates it with the PromptLogger, and uses it to allow Mill to parallelize test suites on a per-test-class basis. This is only the first use case for this API, and I expect there will be many more (e.g. I've been wanting to parallelize the Sonatype uploader).

  • The spawning of task-futures doesn't provide any additional flexibility we don't already have today: tasks like the Coursier downloader already spawn ad-hoc thread pools to parallelize work. What it does provide is a way for such tasks to parallelize work onto threads that cooperates with the existing evaluator thread pools, --jobs configuration, folder sandboxing, and terminal UI/logging, rather than the status quo where such thread pools run invisibly in the background doing god-knows-what.
    • Task-futures still do not allow dynamic changes to the evaluator task graph: they only run within the context of a single task, and the top-level task graph remains static once resolved and planned.
    • Task-futures are integrated into the PromptLogger, such that the prompt shows them grouped under their parent task, with prefixes like [106-0] or [106-1] associated with the parent task's [106]. To do this, the PromptLogger now treats keys as key: Seq[String] rather than key: String.
    • The API for spawning task-futures, def async[T](dest: Path, key: String, message: String)(t: => T): Future[T] (naming could be better?), is designed to fit into Mill: it makes you provide a place to put filesystem logs, a filesystem sandbox folder, a TUI log prefix, and a prompt label. That means such Futures are as observable and sandboxed as normal Mill tasks, and have most of the same properties (see the sketch after this list).
(Screenshot, 2024-09-26: the terminal prompt showing task-futures grouped under their parent task)
  • The test grouping is roughly a port of the SBT testGrouping feature (https://www.scala-sbt.org/1.x/docs/Testing.html#Forking+tests) and serves the same purpose. It is most useful for codebases with large modules, each containing many test classes, and the flexible nature of def testForkGrouping gives the user room to tune exactly how the tests are grouped for maximal performance:

    • If groups are too small, the JVM startup overhead dominates
    • If groups are too large, you don't get enough parallelism
    • Some tests may also not be amenable to running in the same forked JVM as others, due to conflicting reads/writes to the filesystem or manipulation of JVM-global stateful variables
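For illustration, here is a minimal sketch of how a task might use the new API (the value names and the work being done are made up; only the async/awaitAll shape follows the description above):

def myTask = T {
  val inputs = Seq("a", "b", "c")
  val futures = inputs.zipWithIndex.map { case (input, i) =>
    // dest: per-future sandbox folder, key: TUI log prefix, message: prompt label
    T.fork.async(T.dest / input, i.toString, s"compute $input") {
      input.toUpperCase // stand-in for real work done on the shared thread pool
    }
  }
  T.fork.awaitAll(futures) // blocks cooperatively until all the futures complete
}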

To measure the effect of testForkGrouping, I ran time ./mill -i scalalib.test with and without test grouping on my 10-core MacBook Pro (after breaking up HelloWorldTests.scala for better granularity); we see about a 3x speedup.

Without Test Grouping, all test classes in 1 JVM (default)

581.83s user 48.25s system 181% cpu 5:47.11 total

With Test Grouping, 3 test classes per JVM (def testForkGrouping = discoveredTestClasses().grouped(3).toSeq)

656.06s user 40.93s system 577% cpu 2:00.68 total

With Test Grouping, 1 test class per JVM (def testForkGrouping = discoveredTestClasses().grouped(1).toSeq)

707.30s user 45.21s system 509% cpu 2:27.72 total

The limited speedup is likely because Mill's tests are heavyweight, so even when run sequentially they already use multiple cores; I would expect a greater speedup for most projects, whose tests tend to be more lightweight. We can also see that 1-test-class-per-JVM is somewhat slower than 3-test-classes-per-JVM in this case, likely because JVM startup overhead becomes significant.

This feature is opt-in, e.g. via def testForkGrouping = discoveredTestClasses().map(Seq(_)). The default behavior of running all tests in a single JVM is unchanged.
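As an example, here is a hedged sketch of what opting in might look like in a build.sc (the module layout, test framework, and dependency versions are illustrative, not taken from this PR):

object foo extends ScalaModule {
  def scalaVersion = "2.13.14"
  object test extends ScalaTests with TestModule.Utest {
    def ivyDeps = Agg(ivy"com.lihaoyi::utest:0.8.3")
    // fork one JVM per discovered test class
    def testForkGrouping = discoveredTestClasses().map(Seq(_))
  }
}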

Implementation Notes

  • We re-use the same ExecutionContext that Mill uses internally for scheduling its targets, allowing the scheduling to be cooperative.

    • For example, this means that test classes running in parallel and other tasks use the same pool of threads, keeping the total number of threads constant globally
  • We convert the default FixedThreadPool into a ForkJoinPool and provide a blocking{...} operation, allowing the ForkJoinPool to spawn an additional thread when an existing thread is blocked waiting.

    • This is necessary when we wait for the Futures spawned for each test class: the task-level thread is idle, and we want to continue making use of the available CPUs.
    • This is basically the same thing scala.concurrent.ExecutionContext.global does, but global's implementation is private and not re-usable, so I have to duplicate the small amount of code wiring it up (see the sketch after this list)
    • The ForkJoinPool concurrency and thread-management model is pretty battle-tested in Java land, so although it's new here I'm not too worried about its performance and robustness; in any case Mill is a pretty low-concurrency system overall, so it shouldn't be pushing any limits
  • Each test class runs in a subprocess in a separate JVM with a separate sandbox folder, and their outputs are then all read and consolidated back into the combined output for the original test task.

    • There is some overhead to spawning JVMs, but from my experience doing the same in Bazel that overhead is manageable and the benefits of class-level parallelism win out
  • This is controlled by the target def testForkGrouping: Seq[Seq[String]], which defaults to Seq(discoveredTestClasses()) to put them all in one group, but can be customized in arbitrary ways
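To make the thread-pool wiring above concrete, here is a hedged sketch adapted from the approach scala.concurrent.ExecutionContext.global takes (it is not the exact code in this PR): the worker threads implement BlockContext, so blocking{...} and Await can ask the ForkJoinPool to compensate with an extra thread while one is parked waiting.

import java.util.concurrent.{ForkJoinPool, ForkJoinWorkerThread}
import java.util.concurrent.ForkJoinPool.ForkJoinWorkerThreadFactory
import scala.concurrent.{BlockContext, CanAwait, ExecutionContext}

val factory = new ForkJoinWorkerThreadFactory {
  def newThread(pool: ForkJoinPool): ForkJoinWorkerThread =
    new ForkJoinWorkerThread(pool) with BlockContext {
      // Delegate blocking sections to the pool so it can spawn a compensating thread
      def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = {
        var result: Option[T] = None
        ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker {
          def block(): Boolean = { result = Some(thunk); true }
          def isReleasable(): Boolean = result.isDefined
        })
        result.get
      }
    }
}

val pool = new ForkJoinPool(
  Runtime.getRuntime.availableProcessors(),
  factory,
  null, // default uncaught-exception handler
  true  // async mode
)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)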

This is covered by additional unit tests and by Java/Scala/Kotlin example tests included in the docsite.

@lihaoyi lihaoyi changed the title Parallelize test targets on a per-test-class basis Allow grouping and parallelization of test targets within a single module Sep 17, 2024
@lihaoyi lihaoyi marked this pull request as ready for review September 26, 2024 13:02
@lihaoyi lihaoyi changed the title Allow grouping and parallelization of test targets within a single module Add API for spawning sub-tasks, use for grouping and parallelization of test targets within a single module Sep 26, 2024
@lihaoyi lihaoyi changed the title Add API for spawning sub-tasks, use for grouping and parallelization of test targets within a single module Add API for spawning sub-tasks, use it for grouping and parallelization of test targets within a single module Sep 26, 2024
@@ -404,6 +404,7 @@ trait MillScalaModule extends ScalaModule with MillJavaModule with ScalafixModul
def moduleDeps = outer.testModuleDeps
def ivyDeps = super.ivyDeps() ++ outer.testIvyDeps()
def forkEnv = super.forkEnv() ++ outer.forkEnv()
// override def testForkGrouping = discoveredTestClasses().grouped(1).toSeq
Member Author


We can turn this on properly once we re-bootstrap

@lihaoyi lihaoyi changed the title Add API for spawning sub-tasks, use it for grouping and parallelization of test targets within a single module Add API for spawning task-futures, use it for grouping and parallelization of test targets within a single module Sep 26, 2024
@lihaoyi lihaoyi changed the title Add API for spawning task-futures, use it for grouping and parallelization of test targets within a single module Add API for spawning task-futures, use it for grouping and parallelization of test classes within a single module Sep 26, 2024
@lolgab
Member

lolgab commented Sep 26, 2024

def testForkGrouping = discoveredTestClasses().grouped(3).toSeq

is not a great grouping strategy because it will spawn too many JVMs.
While it's nice as a one-liner, it's not a great suggestion for the examples.
We could provide a balancing function ourselves or suggest something that splits all the classes into a few groups like:

def testForkGrouping = T {
  val classes = discoveredTestClasses()
  classes.grouped(classes.size / 3).toSeq
}

@lihaoyi
Member Author

lihaoyi commented Sep 27, 2024

Mill does not have the information to do balancing "properly" though: classes.grouped(classes.size / 3) may be too many groups when the test module is small, and too few groups when the test module is large. In general Mill does not know how long each test class takes to run, which is what would be required to do this properly

It's possible we could do something like dynamic work stealing, or to store historical test run times and group based on that, but for an initial pass I didn't want to be too sophisticated. At my previous job we did per-class forking and despite the per-JVM overhead it generally worked pretty well, and this is essentially the API that SBT provides, so IMO it's a reasonable first pass and we can try to be more sophisticated in a follow up
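For concreteness, here is one hedged sketch of the kind of "few groups" strategy being discussed, capping the number of groups at the available parallelism instead of fixing the group size (illustrative only, and maxGroups is a made-up name; neither comment proposes exactly this):

def testForkGrouping = T {
  val classes = discoveredTestClasses()
  val maxGroups = Runtime.getRuntime.availableProcessors()
  val groupSize = math.max(1, math.ceil(classes.size.toDouble / maxGroups).toInt)
  classes.grouped(groupSize).toSeq
}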

@lihaoyi lihaoyi merged commit 05bef7e into com-lihaoyi:main Sep 30, 2024
24 checks passed
@lefou lefou added this to the 0.12.0-RC3 milestone Sep 30, 2024