Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to make the maven plugin faster? #927

Closed
seguri opened this issue Sep 2, 2021 · 21 comments
Closed

Is it possible to make the maven plugin faster? #927

seguri opened this issue Sep 2, 2021 · 21 comments

Comments

@seguri
Copy link

seguri commented Sep 2, 2021

Hi everyone,

The plugin normally takes around 12s on my machine to apply:

Benchmark #1: mvn --offline --quiet spotless:apply
  Time (mean ± σ):     12.635 s ±  0.656 s    [User: 18.042 s, System: 1.202 s]
  Range (min … max):   11.782 s … 13.716 s    10 runs

Is there a way to make it faster? I use spotless commands in git hooks and it really slows me down.

Thanks!

macOS 11.5.2
Java version: 11.0.12, vendor: Amazon.com Inc.
Apache Maven 3.6.3
spotless-maven-plugin:1.30.0

@nedtwigg
Copy link
Member

nedtwigg commented Sep 2, 2021

I know of two paths to make spotless-maven faster:

  1. more parallelism
  2. incremental up-to-date checking (only format files which have changed since the last run)

For both of these paths, Gradle has built-in mechanisms which we use. I don't know much about maven, maybe there are built-in mechanisms which we could use but aren't yet. But even if there aren't, we could build them ourselves. Happy to merge any performance-enhancing PRs, just be sure to include some test data in the PR which quantify the improvement.

The only thing you can do right now out of the box is to use ratchetFrom. That will drastically limit the amount of formatting to do, which should speed things up a lot. But even with ratchetFrom, it would be much faster still if we also had incremental up-to-date checking with the maven plugin.

I'll ping @lutovich in case he has more to add.

@lutovich
Copy link
Contributor

lutovich commented Sep 3, 2021

I agree with both points.

Maven supports parallel builds. Using them can give a good performance boost for multi-module projects. You could try mvn -T 1C --offline --quiet spotless:apply and see if it improves performance. Docs for parallel builds: link.

I'm not aware of any built-in feature in Maven that could help with incremental up-to-date checking. Spotless Maven plugin could perhaps maintain its own index of file -> checksum or file -> last modified date. It could use such an index to reduce the number of formatted files.

It could also be that some formatting step is just slow because the code is not optimal. It would be helpful to know the plugin configuration and what formatters are turned on. @seguri could you please share your plugin config, and maybe your repository, if it is open source? Alternatively, a Java Flight Recording would be very helpful to understand what are the potential bottlenecks.

@seguri
Copy link
Author

seguri commented Sep 3, 2021

@lutovich Unfortunately -T4, -T8, -T1C do not produce any benefit...
Spotless also takes the same amount of time when formatting an already formatted repository.
I'll try to share more information on Monday.

@seguri
Copy link
Author

seguri commented Sep 9, 2021

Here's our plugin configuration:

<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>1.30.0</version>
  <executions>
    <execution>
      <id>spotless</id>
      <phase>validate</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <java>
      <googleJavaFormat>
        <version>1.7</version>
        <style>GOOGLE</style>
      </googleJavaFormat>
    </java>
  </configuration>
</plugin>

I can't share the repository as it belongs to my company.

@nedtwigg
Copy link
Member

Just want to note that there is a WIP for this at #935.

@seguri
Copy link
Author

seguri commented Sep 23, 2021

Hi,
I just want to mention the great improvement I had after upgrading to Java 17:

Benchmark #1: mvn --offline --quiet spotless:apply
  Time (mean ± σ):      3.850 s ±  0.057 s    [User: 12.751 s, System: 0.734 s]
  Range (min … max):    3.764 s …  3.945 s    10 runs

@nedtwigg
Copy link
Member

@lutovich made a huge contribution in plugin-maven 2.18.0 to add incremental up-to-date to the maven plugin. See here for docs on how to enable it.

@lutovich
Copy link
Contributor

lutovich commented Dec 25, 2021

Hi @seguri! It would be great if you could try the new version with up-to-date checking turned on and let us know if performance has improved!

Based on the config provided above, the updated config would be:

<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>1.30.0</version>
  <executions>
    <execution>
      <id>spotless</id>
      <phase>validate</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <upToDateChecking>
      <enabled>true</enabled>
    </upToDateChecking>
    <java>
      <googleJavaFormat>
        <version>1.7</version>
        <style>GOOGLE</style>
      </googleJavaFormat>
    </java>
  </configuration>
</plugin>

@seguri
Copy link
Author

seguri commented Dec 25, 2021

Using Java 17.
Before:

Benchmark 1: mvn --offline --quiet spotless:apply
  Time (mean ± σ):      2.513 s ±  0.138 s    [User: 5.769 s, System: 0.367 s]
  Range (min … max):    2.423 s …  2.891 s    10 runs

After:

Benchmark 1: mvn --offline --quiet spotless:apply
  Time (mean ± σ):      1.100 s ±  0.014 s    [User: 2.260 s, System: 0.199 s]
  Range (min … max):    1.086 s …  1.128 s    10 runs

So, more than halved, good job :)

@aaime
Copy link

aaime commented Jan 15, 2022

I've been recently considering a switch from fmt-maven-plugin to spotless, on a large multi-module codebase (120 modules) where we normally run parallel builds (and often a "clean" one). We run formatting as part of the normal build, so it's important that it happens fast.

Turns out that doing the same formatting in spotless takes over 24 seconds (mvn spotless:apply -T6 -nsu -fae -Dall) whilst fmt-maven-plugin can do the same in around 14 seconds (mvn fmt:format -T6 -nsu -fae -Dall). This is on a old 8 core machine, the optimal parallelization level will depend on your hardware.

I've also tried running the formatting on the largest of the modules (modules/library/main), alone, spotless takes 8.8 seconds whilst fmt-maven-plugin takes 4.3.

Long story short, there seems to be some fundamental performance difference between the two plugins.

Also tried switching to Java 17, does not make a difference on this build.

@seguri
Copy link
Author

seguri commented Jan 15, 2022

@aaime which version of spotless are you running and with which configuration?

@nedtwigg
Copy link
Member

@aaime I bet you have not turned on up-to-date checking <upToDateChecking><enabled>true</enabled></upToDateChecking>. That should lower your runtime dramatically.

Spotless makes an idempotence guarantee, so I would expect Spotless to take ~2x as long as a formatter plugin which doesn't make an idempotence guarantee, which matches your reporting. Turning on up-to-date checking should bring the spotless time down to well under a second. We won't make idempotence checking optional, because it's quite common for formatters to have idempotence problems, and it sinks the whole point of autoformatting in the first place.

@aaime
Copy link

aaime commented Jan 17, 2022

@nedtwigg I see what you mean, idempotence checks defend against badly written formatting steps and make Spotless process each file twice. google-java-format is the only bit that we need to run, and we know it's not broken, hence the check is not useful (for us).

upToDateChecking seems to store the information in the build tree itself, right? As hinted above, the devs often do clean builds (to avoid stale classes or issues with IDE compilers not matching the maven command line one), which would wipe out that information.

Looks like there is a bit of workflow and culture clash here. For the time being, I'll table a switch to Spotless, but who knows , we might be back in the future. Thanks anyways for sharing your work as open source and for the quick follow up!

@lutovich
Copy link
Contributor

Hi @aaime, #1055 allows customizing the location of the index file used for upToDateChecking. By default, this index is stored in the target/ dir and removed with mvn clean. You could configure the plugin to store the index in the module root directory instead:

<upToDateChecking>
  <enabled>true</enabled>
  <indexFile>${project.base.dir}/spotless-index</indexFile>
</upToDateChecking>

and add spotless-index to .gitignore. This configuration will make sure mvn clean does not remove the upToDateChecking index file.

@aaime
Copy link

aaime commented Jan 17, 2022

Interesting, that would give me some leg to argue for a switch. Are these properties documented anywhere? Also, is it included in a release?

@lutovich
Copy link
Contributor

Documentation is here and this feature is available in the latest 2.20.0 release.

@nedtwigg
Copy link
Member

google-java-format is the only bit that we need to run, and we know it's not broken

It is broken right now and has been similarly broken frequently throughout its history.

Without idempotence-fixing at the plugin level, these glitches occur all the time, but they appear like user error. A dev blows a few minutes saying "I thought for sure I ran the script, huh" and shrugs it off.

@aaime
Copy link

aaime commented Jan 18, 2022

Duh, we've been using google-java-format on GeoTools and two other large projects for almost 4 years without noticing :-D

@diffplug diffplug deleted a comment from seguri Jan 18, 2022
@diffplug diffplug deleted a comment from aaime Jan 18, 2022
@ambition-consulting
Copy link

Hey guys, I'm struggling with parallel formatting on a maven multi-module build. As far as I can tell, spotless-maven-plugin respects reactor dependencies, e.g. if a -> b -> c, then it will run them sequentially, even though formatting itself won't depend on such build-time dependencies.

Can we make the maven spotless plugin ignore reactor dependencies for parallel efficiency?

@nedtwigg
Copy link
Member

I don't know, but there can be issues with generated code. e.g. if a generates code for b, and then spotless adds license headers to that generated code, then the reactor dependency is real.

@ambition-consulting
Copy link

So, legals is killing productivity again? :-( I never had any dependency on formatting, but understand if it's too much work dealing with such stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants