Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Parallel formatting for Maven plugin #1123

Closed
wants to merge 1 commit into from

Conversation

lutovich
Copy link
Contributor

@lutovich lutovich commented Feb 9, 2022

POC to gather feedback about parallelizing execution of Formatters in the Maven (and perhaps Gradle) plugin.

I tested this change on https://github.com/neo4j/neo4j codebase with the following simple configuration:

<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>2.20.2-SNAPSHOT</version>
  <configuration>
    <java>
      <googleJavaFormat/>
    </java>
  </configuration>
</plugin>

On my Mac with an SSD drive, command mvn spotless:apply takes:

  • ~54 seconds without parallelization
  • ~23 seconds with parallelization

The speedup looks promising, but it will likely vary significantly depending on the configured formatters and hard drive.

Implementation notes:

  • One small thread pool for reading files to limit loading the disk with random reads
  • One small thread pool for writing formatted files to limit loading the disk with random writes
  • One large thread pool for executing formatters because formatters are probably CPU/memory-bound
  • Daemon threads for all thread pools because Maven plugins do not seem to have lifecycle hooks or #close() method where to shutdown the executors
  • Only the main thread accesses UpToDateChecker because it is not thread-safe (for now)
  • FormattingParallelizer is a singleton to make sure the plugin does not create too many thread pools. E.g. avoid creating 3 pools for each Maven module
  • Thread pool sizes are hardcoded. They should be configurable

Perhaps the logic to read-format-write can be shared between Maven and Gradle plugins. Then, both would benefit from such parallelization.

What do you think about parallelizing formatters? Are formatter instances typically thread-safe?

Any input is much appreciated :)

@lutovich lutovich requested review from nedtwigg and fvgh February 9, 2022 23:29
@nedtwigg
Copy link
Member

nedtwigg commented Feb 9, 2022

Here is one fixable problem for thread-safety:

if (formatter == null) {
formatter = stateToFormatter.apply(state());
}

The other possible problem is inside the formatters themselves.

IMO, the Spotless workflow is:

  1. First run is expensive
  2. Every subsequent run is cheap because of incremental

That has always been true on Gradle, and now it is also true on Maven. IMO, parallel formatting as a first-class citizen is a bad tradeoff, because it adds complexity everywhere but the only place it brings real value is in the first run, which is not where developers spend most of their time.

I'd be fine with adding a parallelDangerousMode = true or something like that, but I'm very hesitant to make multithreading a first-class citizen.

@lutovich
Copy link
Contributor Author

I agree with your reasoning. Parallel formatting could significantly increase code complexity. It would only improve the first run and clean runs (e.g. mvn clean package). Thanks to #1055, there's a way to preserve Maven plugin's incremental state even across clean runs.

I'll close this experiment. Thanks for your input!

@lutovich lutovich closed this Feb 10, 2022
@lutovich lutovich deleted the parallel-formatting branch February 10, 2022 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants