Skip to content

dsyer/spring-boot-startup-bench

Repository files navigation

Benchmarks for Spring Boot Start Up

TL;DR: if you do nothing else, then make sure you set -XX:TieredStopAtLevel=1 and -noverify as default JVM args in your IDE. They might not be recommended in a long-running, production process, but they have a really big effect on startup time. Get a fast disk (SSD). Also, don’t starve your app of CPU (you need 3 or 4 CPUs to take advantage of some of the parallelization in Spring Boot, 8 is better). And use an up-to-date JVM (although actually 8u131-zulu seems to to be faster than 8u141-openjdk).

Spring and Spring Boot are not slow (after many optimizations, some of which were motivated by data here). Neither component scanning, autoconfiguration, condition processing, nor anything else they do on their own is a significant source of startup time. There is some evidence that most of the cost is associated with JVM overheads, loading and parsing classes. So essentially, if you want more features (more classes) you get slower startup. See in particular the data here for more detail.

Note
if you are looking for more information on memory consumption, as opposed to startup time, please see https://github.com/dsyer/spring-boot-memory-blog.

This project uses JMH to test and report on various the effect of various parameters on Spring Boot startup times. The apps are launched in a forked JVM and the measurements below are all "seconds to start up", which includes the time from the app being executed to the point where it reports "Started" on the console.

Various combinations of things are tested:

  • Spring Boot version, 1.5.1, 1.4.2 and 1.3.8

  • Some JVM flags (e.g. -noverify has an effect -Xmx not so much)

  • Fat jar vs thin jar

  • Exploding the archive and running with JarLauncher or the application main

  • Shading vs Spring Boot fat jars

  • Different application features: a basic web app with actuator, the petclinic and a Spring Cloud config server

  • Hardware (running on bare metal vs AWS for instance)

  • The scaling of startup time with bean count is investigated in more detail in the static module

Note
There is a Spring Boot issue with some comments and more benchmark results. This might be useful to read if you have slow start up times and want some company. Very important: check your anti-virus software and make sure it isn’t slowing down access to jar files.

Spring Boot (and other) issues for follow up:

Running Locally

You need Java 8 and Docker to build all the code cleanly (you can build selectively the bits that don’t depend on Docker if you don’t have it). To run the JsaBenchmark you need an Oracle JVM.

Then do this:

$ ./mvnw clean install
$ (cd benchmarks/; java -jar target/benchmarks.jar)

If you see odd (longer than expected) results in the devtoolsRestart benchmarks, check that you don’t have a ⁠⁠⁠⁠~/.spring-boot-devtools.properties overriding the polling properties.

Running in EC2

Install vagrant and the AWS plugin (https://github.com/mitchellh/vagrant-aws). Then make sure you have an EC2 security group called "vagrant" that allows ssh (open it up for all traffic to be on the safe side). The way the Vagrantfile is configured you need a default VPC in us-east-2 (an account that was created recently will have one by default).

To provide your AWS credentials, define some environment variables:

$ export AWS_ACCESS_KEY=...
$ export AWS_SECRET_KEY=...
$ export AWS_KEYPAIR=...

where AWS_KEYPAIR is the AWS name of the key pair that you have already registered with Amazon. For the configuration to work without changes you also need the private key in a file with read-only (600) permissions called ~/.ssh/id_rsa.${AWS_KEYPAIR}. Then you can

$ ./mvnw clean install
$ cd vagrant
$ vagrant up --provider aws
$ vagrant ssh
> cd /build/benchmarks
> java -jar target/benchmarks.jar

Edit the Vagrantfile to change the instance type and repeat the process. You have to vagrant destroy in between changes of instance type.

If you need to update the benchmarks just mvnw clean install locally and then vagrant reload to sync the changes to the remote box.

Note
I had a lot of issues getting SSH to work. Had to create my own security group, and there was a hiccup with the ssh client wanting to prompt me to change my ~/.ssh/known_hosts. Good luck.

The Minimal Sample

The basic webapp also has a "minimal" version with a stripped down classpath and handpicked configuration classes (instead of @EnableAutoConfiguration). I noticed that logback showed up a lot in the -verbose:class logs so I took it out of the minimal sample. Ditto hibernate-validator (lots of classes initialized compared to the actual feature set I use).

I also tried a super-minimal sample, where I included, instead of a handpicked set of autoconfiguration, the whole set of configuration from the vanilla demo. That’s a bit slower (maybe 150ms) to start up, but not much in it. Also I copied the config files and manually stripped out the @Conditional annotations, so there is no condition processing. This made no appreciable difference (with Spring Boot 1.5.4, although I suspect it might have done with 1.3.8).

Benchmark Results

Summary:

  • It is possible to start a non-trivial app from cold in well under 2 seconds (well under 1 second in a warm JVM).

  • Spring Boot 1.4 and 1.3 are barely distinguishable (1.4 slightly slower but within the error bars). 1.5 is fast too. 2.0 a bit slower, but not much.

  • Devtools restart is much faster than a cold start.

  • Running the app’s main method from an exploded jar is marginally faster.

  • Hardware matters a lot, but affects all apps roughly the same.

  • Spring Boot (and Spring Data) slow things down (but only a bit).

  • Common Data Sharing (CDS) can improve startup a bit (see JsaBenchmark).

N.B. the times reported by the Spring Boot stop watch on the console are always shorter than those measured by the benchmark. The difference is a feature of the hardware used (200ms on faster platforms, 1700ms on slowest) and also the test ("thin" jars have a wider margin because they do a little bit of work checking caches before Spring Boot actually starts).

Spring Boot 2.x

Spring Boot is 2.0 is a bit slower and 2.1 is a bit faster. Early milestones were fine, but degradation continued through the 2.0 development cycle. A lot of progress was made between M7 and RC1, so I suppose it could be worse. Results (before RC2):

Benchmark                                        Mode  Cnt  Score   Error  Units
SpringBoot13xBenchmark.explodedJarLauncher       avgt   10  2.103 ± 0.260   s/op
SpringBoot13xBenchmark.explodedJarMain           avgt   10  1.447 ± 0.038   s/op
SpringBoot13xBenchmark.fatJar                    avgt   10  1.959 ± 0.081   s/op
SpringBoot14xBenchmark.explodedJarLauncher       avgt   10  2.155 ± 0.080   s/op
SpringBoot14xBenchmark.explodedJarMain           avgt   10  1.659 ± 0.053   s/op
SpringBoot14xBenchmark.fatJar                    avgt   10  2.097 ± 0.096   s/op
SpringBoot15xBenchmark.explodedJarLauncher       avgt   10  1.964 ± 0.060   s/op
SpringBoot15xBenchmark.explodedJarMain           avgt   10  1.495 ± 0.045   s/op
SpringBoot15xBenchmark.fatJar                    avgt   10  1.852 ± 0.055   s/op
SpringBoot20xBenchmark.explodedJarLauncher       avgt   10  2.371 ± 0.107   s/op
SpringBoot20xBenchmark.explodedJarMain           avgt   10  1.815 ± 0.053   s/op
SpringBoot20xBenchmark.fatJar                    avgt   10  2.248 ± 0.071   s/op

Spring Boot 2.0.4.RELEASE:

Benchmark                                 Mode  Cnt  Score   Error  Units
PetclinicLatestBenchmark.devtoolsRestart  avgt   10  1.210 ± 0.034   s/op
PetclinicLatestBenchmark.explodedJarMain  avgt   10  4.110 ± 0.092   s/op
PetclinicLatestBenchmark.fatJar           avgt   10  5.599 ± 0.057   s/op
PetclinicLatestBenchmark.noverify         avgt   10  4.892 ± 0.152   s/op

Spring Boot 2.1.0 (snaphots after M1):

Benchmark                                 Mode  Cnt  Score   Error  Units
PetclinicLatestBenchmark.devtoolsRestart  avgt   10  1.208 ± 0.070   s/op
PetclinicLatestBenchmark.explodedJarMain  avgt   10  3.897 ± 0.067   s/op
PetclinicLatestBenchmark.fatJar           avgt   10  4.996 ± 0.032   s/op
PetclinicLatestBenchmark.noverify         avgt   10  4.399 ± 0.029   s/op
PetclinicLatestBenchmark.explodedJarFlags avgt   10  3.325 ± 0.053   s/op

With the new Spring Data LAZY repositories feature (and JPA):

Benchmark                                  Mode  Cnt  Score   Error  Units
PetclinicLatestBenchmark.explodedJarMain   avgt   10  3.751 ± 0.021   s/op
PetclinicLatestBenchmark.fatJar            avgt   10  4.771 ± 0.039   s/op
PetclinicLatestBenchmark.noverify          avgt   10  4.063 ± 0.051   s/op
PetclinicLatestBenchmark.explodedJarFlags  avgt   10  3.095 ± 0.051   s/op

Spring Boot 1.5

There are a few improvements in Spring 1.5 that affect startup time (mainly to do with configuration processing). Here are some highlighted results, where "latest" means Pet Clinic with Boot 1.5:

Benchmark                                   Mode  Cnt  Score   Error  Units
ConfigServerBenchmark.fatJar138             avgt   10  2.959 ± 0.082   s/op
ConfigServerBenchmark.fatJar142             avgt   10  2.786 ± 0.078   s/op
ConfigServerBenchmark.fatJar150             avgt   10  2.668 ± 0.084   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10  2.202 ± 0.111   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10  1.516 ± 0.087   s/op
SpringBoot138Benchmark.fatJar               avgt   10  2.024 ± 0.068   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10  2.210 ± 0.061   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10  1.649 ± 0.089   s/op
SpringBoot142Benchmark.fatJar               avgt   10  2.138 ± 0.081   s/op
SpringBoot150Benchmark.explodedJarLauncher  avgt   10  2.210 ± 0.078   s/op
SpringBoot150Benchmark.explodedJarMain      avgt   10  1.664 ± 0.085   s/op
SpringBoot150Benchmark.fatJar               avgt   10  2.128 ± 0.112   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10  1.116 ± 0.070   s/op
PetclinicBenchmark.explodedJarMain          avgt   10  3.469 ± 0.069   s/op
PetclinicBenchmark.fatJar                   avgt   10  5.003 ± 0.350   s/op
PetclinicBenchmark.noverify                 avgt   10  4.358 ± 0.113   s/op
PetclinicLatestBenchmark.devtoolsRestart    avgt   10  1.056 ± 0.067   s/op
PetclinicLatestBenchmark.explodedJarMain    avgt   10  3.441 ± 0.106   s/op
PetclinicLatestBenchmark.fatJar             avgt   10  4.787 ± 0.112   s/op
PetclinicLatestBenchmark.noverify           avgt   10  4.289 ± 0.070   s/op
MinimalBenchmark.explodedJarMain            avgt   10  1.250 ± 0.029   s/op
MinimalBenchmark.fatJar                     avgt   10  1.579 ± 0.072   s/op

The configserver benefits by a few hundred millseconds. The vanilla demo (web plus actuator) interestingly does not. The Pet Clinic also gets a couple of hundred milliseconds boost, but not so much in the exploded jar case (which mimic what people get in the IDE). The minimal sample is quite a bit zippier.

Host: tower, i7, 3.4GHz, 32G RAM, SSD

New Dell desktop

 Benchmark                                   Mode  Cnt  Score   Error  Units
 ConfigServerBenchmark.devtoolsRestart       avgt   10  1.034 ± 0.201   s/op
 ConfigServerBenchmark.explodedJarMain       avgt   10  3.271 ± 0.234   s/op
 ConfigServerBenchmark.fatJar138             avgt   10  3.267 ± 0.173   s/op
 ConfigServerBenchmark.fatJar142             avgt   10  3.423 ± 0.160   s/op
 PetclinicBenchmark.devtoolsRestart          avgt   10  1.387 ± 0.399   s/op
 PetclinicBenchmark.explodedJarMain          avgt   10  5.491 ± 0.357   s/op
 PetclinicBenchmark.explodedShadedMain       avgt   10  5.266 ± 0.332   s/op
 PetclinicBenchmark.fatJar                   avgt   10  6.130 ± 0.271   s/op
 PetclinicBenchmark.noverify                 avgt   10  5.430 ± 0.217   s/op
 PetclinicBenchmark.shaded                   avgt   10  7.975 ± 0.307   s/op
 SpringBoot138Benchmark.explodedJarLauncher  avgt   10  2.764 ± 0.149   s/op
 SpringBoot138Benchmark.explodedJarMain      avgt   10  2.004 ± 0.074   s/op
 SpringBoot138Benchmark.fatJar               avgt   10  2.589 ± 0.107   s/op
 SpringBoot142Benchmark.explodedJarLauncher  avgt   10  2.873 ± 0.105   s/op
 SpringBoot142Benchmark.explodedJarMain      avgt   10  2.229 ± 0.083   s/op
 SpringBoot142Benchmark.fatJar               avgt   10  2.677 ± 0.209   s/op
 SpringBootThinBenchmark.basic138Thin        avgt   10  2.474 ± 0.134   s/op
 SpringBootThinBenchmark.basic142Thin        avgt   10  2.764 ± 0.160   s/op
 MinimalBenchmark.explodedJarMain            avgt   10  1.602 ± 0.120   s/op
 MinimalBenchmark.fatJar                     avgt   10  1.901 ± 0.089   s/op
 JsaBenchmark.basicThin                      avgt   10  5.725 ± 0.460   s/op
 JsaBenchmark.sharedClasses                  avgt   10  4.302 ± 0.203   s/op
 JsaBenchmark.thinMain                       avgt   10  4.808 ± 0.161   s/op

Unused Thymeleaf Features

Excluding groovy and thymeleaf layout (unused dependencies) from the Pet Clinic had a big effect on the startup time, but oddly only in the exploded main test:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-thymeleaf</artifactId>
	<exclusions>
		<exclusion>
			<groupId>org.codehaus.groovy</groupId>
			<artifactId>groovy</artifactId>
		</exclusion>
		<exclusion>
			<groupId>nz.net.ultraq.thymeleaf</groupId>
			<artifactId>thymeleaf-layout-dialect</artifactId>
		</exclusion>
	</exclusions>
</dependency>

Before:

PetclinicBenchmark.explodedJarMain  avgt   10  5.491 ± 0.357   s/op

and after:

PetclinicBenchmark.explodedJarMain  avgt   10  4.894 ± 0.186   s/op

More JVM Options

Setting -XX:TieredStopAtLevel=1 was quite a big boost, so we set that permanently in the benchmarks (the rest of the results pre-date this change):

Benchmark                                   Mode  Cnt  Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10  0.842 ± 0.047   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10  2.451 ± 0.100   s/op
ConfigServerBenchmark.fatJar138             avgt   10  3.002 ± 0.091   s/op
ConfigServerBenchmark.fatJar142             avgt   10  2.936 ± 0.101   s/op
MinimalBenchmark.explodedJarMain            avgt   10  1.268 ± 0.077   s/op
MinimalBenchmark.fatJar                     avgt   10  1.537 ± 0.078   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10  1.107 ± 0.054   s/op
PetclinicBenchmark.explodedJarMain          avgt   10  3.618 ± 0.149   s/op
PetclinicBenchmark.fatJar                   avgt   10  5.040 ± 0.229   s/op
PetclinicBenchmark.noverify                 avgt   10  4.402 ± 0.149   s/op
ShadedBenchmark.explodedShadedMain          avgt   10  3.764 ± 0.085   s/op
ShadedBenchmark.shaded                      avgt   10  5.871 ± 0.123   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10  2.224 ± 0.083   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10  1.542 ± 0.044   s/op
SpringBoot138Benchmark.fatJar               avgt   10  2.102 ± 0.126   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10  2.232 ± 0.067   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10  1.684 ± 0.087   s/op
SpringBoot142Benchmark.fatJar               avgt   10  2.146 ± 0.080   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10  1.908 ± 0.120   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10  2.081 ± 0.088   s/op
SpringBootThinBenchmark.petclinicThin       avgt   10  4.038 ± 0.267   s/op
JsaBenchmark.basicThin                      avgt   10  4.230 ± 0.216   s/op
JsaBenchmark.sharedClasses                  avgt   10  3.124 ± 0.125   s/op
JsaBenchmark.thinMain                       avgt   10  3.573 ± 0.229   s/op

This wouldn’t be a sensible setting for a long-running process in production, but it seems to have a positive effect on startup, so at dev time it’s a good idea.

Class Data Sharing (CDS)

Class Data Sharing is a JDK feature (since Java 5). Some people have reported success in improving startup time with simple apps (e.g. see this blog).

CDS is not a "standard" JVM option, but it works for core JDK classes with OpenJDK. You have to use the Oracle JDK for the JsaBenchmark which also uses application class data sharing. IBM and Open J9 have their own version of the same thing. Zulu works with the vanilla CdsBenchmark.

There are data for a JsaBenchmark that uses commercial features in the Oracle JDK, and also for a "vanilla" CdsBenchmark (not using commercial features).

With 8u151-oracle:

Benchmark                   Mode  Cnt  Score   Error  Units
CdsBenchmark.sharedClasses  avgt   10  2.871 ± 0.088   s/op
CdsBenchmark.thinMain       avgt   10  2.933 ± 0.154   s/op
JsaBenchmark.sharedClasses  avgt   10  2.893 ± 0.068   s/op
JsaBenchmark.thinMain       avgt   10  2.898 ± 0.085   s/op

With 8u152-zulu:

Benchmark                   Mode  Cnt  Score   Error  Units
CdsBenchmark.sharedClasses  avgt   10  2.854 ± 0.042   s/op
CdsBenchmark.thinMain       avgt   10  2.837 ± 0.072   s/op

So there is really nothing in it (which is what I remembered from the JDK9 results I looked at as well). The difference in the old results is probably entirely explained by the effect of -noverify, which I forgot to include in the non-shared classes execution.

This is a PetClinic with Boot 1.4.6. There’s no reason to put any effort into exploring the effect with other versions and other apps based on this data.

Java 10 has app class sharing as a new (non-commercial) feature. It is slower in general than Java 8, but the class sharing does improve startup times:

Benchmark                         (sample)  Mode  Cnt  Score   Error  Units
Java10CdsBenchmark.sharedClasses     older  avgt   10  2.780 ± 0.071   s/op
Java10CdsBenchmark.sharedClasses    latest  avgt   10  3.407 ± 0.223   s/op
Java10CdsBenchmark.thinMain          older  avgt   10  3.217 ± 0.783   s/op
Java10CdsBenchmark.thinMain         latest  avgt   10  3.747 ± 1.026   s/op

The "older" is Boot 1.5, and "latest" is 2.0, so the effect is more dramatic for Boot 1.5. Things are a bit better with Java 11 (apparently there was a bug in Java 10 that made it slower on startup overall). See the static module for more data.

Logback

By switching between Logback and other libraries in the minimal sample we can test the effect of the logging library. We did this by removing the logging starter completely, and adding in commons-logging (which drops down to plain JDK logging by default). The results show that logback slows down the fat jar compared to plain JDK logging, but not the exploded main. The effect is only 100ms. Here are the data (with all the rest of the test set up as per the previous section):

MinimalBenchmark.explodedJarMain            avgt   10  1.292 ± 0.105   s/op
MinimalBenchmark.fatJar                     avgt   10  1.620 ± 0.082   s/op

Log4j2 is actually slower:

MinimalBenchmark.explodedJarMain            avgt   10  1.425 ± 0.098   s/op
MinimalBenchmark.fatJar                     avgt   10  1.820 ± 0.069   s/op

Remember there are only about 40 lines of logging in these tests. It might well be that logback or log4j2 performs much better under load, and this test is just for initialization time really.

EhCache

Stripping everything else back reveals EhCache initialization as a hotspot in the Pet Clinic (it shows up in YourKit when you launch in "trace" mode). Removing it from the classpath boosts startup by a couple of hundred milliseconds:

PetclinicBenchmark.devtoolsRestart  avgt   10  1.095 ± 0.086   s/op
PetclinicBenchmark.explodedJarMain  avgt   10  3.465 ± 0.145   s/op
PetclinicBenchmark.fatJar           avgt   10  4.827 ± 0.241   s/op
PetclinicBenchmark.noverify         avgt   10  4.249 ± 0.116   s/op

If you add -noverify as well, you will see the startup time in an exploded jar (so in the IDE) at a little more than 3s (down from over 6s initially). Note that EhCache is not used by the app unless the "production" profile is enabled, so you can switch it off just by disabling the profile. The benchmarks allow you to do that by setting -Dbench.args, e.g:

$ cd benchmarks/
$ java -Dbench.args='-Dspring.profiles.active=default' -Ddebug \
  -jar target/benchmarks.jar PetclinicBenchmark

Actuator

Removing the actuator actually changes the features of the application, so in some ways it’s an unfair test, but on the other hand, if you only want to test the business features you might not care if the actuator was not there on startup. The results are promising (cumulative on top of disabling ehcache):

PetclinicBenchmark.devtoolsRestart  avgt   10  0.781 ± 0.056   s/op
PetclinicBenchmark.explodedJarMain  avgt   10  2.845 ± 0.089   s/op
PetclinicBenchmark.fatJar           avgt   10  4.083 ± 0.192   s/op
PetclinicBenchmark.noverify         avgt   10  3.566 ± 0.146   s/op

Hibernate Validator

What if we ditch the validator (also shows up as a hotspot in YourKit)? Again this is cutting into application features - you even have to edit the Pet Clinic quite heavily to get it to work without validation. Results are good though, so maybe we could make it lazy somehow:

Benchmark                           Mode  Cnt  Score   Error  Units
PetclinicBenchmark.devtoolsRestart  avgt   10  0.724 ± 0.047   s/op
PetclinicBenchmark.explodedJarMain  avgt   10  2.670 ± 0.136   s/op
PetclinicBenchmark.fatJar           avgt   10  3.682 ± 0.097   s/op
PetclinicBenchmark.noverify         avgt   10  3.213 ± 0.146   s/op

Thymeleaf and "Unused Beans"

Initializing a Jackson ObjectMapper is the next hotspot after EhCache is disabled. It turns out that this is a side effect of initializing Thymeleaf. It happens in a class initializer in SpringTemplateEngine, so there’s no way to switch it off with lazy beans in Spring, other than to remove Jackson from the classpath.

Even if we removed Thymeleaf, and replaced it with Mustache for instance, Jackson is needed by the actuator. So we would pay the startup cost anyway, despite the fact that it isn’t used. This raises the question of why we need to initialize these things at all. They aren’t going to be used until a request is processed.

Here’s the result of replacing Thymeleaf with Mustache (on top of all the other improvements above) and excluding Jackson:

Benchmark                           Mode  Cnt  Score   Error  Units
PetclinicBenchmark.devtoolsRestart  avgt   10  0.727 ± 0.052   s/op
PetclinicBenchmark.explodedJarMain  avgt   10  2.577 ± 0.091   s/op
PetclinicBenchmark.fatJar           avgt   10  3.514 ± 0.107   s/op
PetclinicBenchmark.noverify         avgt   10  3.046 ± 0.142   s/op

There is a small improvement, but it’s not clear that it has precisely the origin we attribute using the reasoning above. The reasoning behind that assertion is that if we do a background initialization of Thymeleaf in an ApplicationInitializer (same pattern as BackgroundPreinitializer from Spring Boot) then there is no improvement in startup time, despite the fact that Jackson and Thymeleaf are already initialized. It is interesting that removing Thymeleaf completely has an impact, but it must be something other than simply the static initializer. Also notable is that we saw an improvement above when we removed Hibernate Validator from the project, but this is also pre-initialized by Spring Boot, so the inefficiency is not in the initialization there either.

A lot of components in a Spring Boot app are in the category of "needed at some point, but maybe not even for all requests, and definitely not at startup". Another one that shows up in YourKit on startup is the Webjars Locator. You only get <100ms improvement in startup time by leaving it out (and it changes the behaviour of the application, so you want it in there at runtime), but you don’t need it until after startup.

Jackson

Removing Jackson means removing the actuators as well, as things stand. But it has quite a dramatic effect (probably Thymeleaf has no impact at all and we are just seeing Jackson in the results above). Results:

Benchmark                           Mode  Cnt  Score   Error  Units
PetclinicBenchmark.devtoolsRestart  avgt   10  0.851 ± 0.018   s/op
PetclinicBenchmark.explodedJarMain  avgt   10  2.823 ± 0.075   s/op
PetclinicBenchmark.fatJar           avgt   10  3.895 ± 0.039   s/op
PetclinicBenchmark.noverify         avgt   10  3.397 ± 0.060   s/op

These should be compared with the same thing but fully loaded (not the latest result in the previous paragraph). Here’s a reminder of what they said:

Benchmark                                   Mode  Cnt  Score   Error  Units
PetclinicBenchmark.devtoolsRestart          avgt   10  1.107 ± 0.054   s/op
PetclinicBenchmark.explodedJarMain          avgt   10  3.618 ± 0.149   s/op
PetclinicBenchmark.fatJar                   avgt   10  5.040 ± 0.229   s/op
PetclinicBenchmark.noverify                 avgt   10  4.402 ± 0.149   s/op

That’s pretty huge!

Spring 5.0 Component Index

Spring 5.0 has an annotation processor that creates an index of components on startup. It is easy to set up (you just put spring-context-indexer on the classpath), but since it only indexes a component scan, which is a pretty small part of a trivial Spring Boot application like the ones we are testing, it has no discernable effect at runtime.

Host: xerus, Core2 (32bit), 2.66GHz, 4G RAM, HDD

>5 year old Dell desktop

Benchmark                                   Mode  Cnt   Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10   2.041 ± 0.173   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10   6.930 ± 0.228   s/op
ConfigServerBenchmark.fatJar138             avgt   10   7.395 ± 0.203   s/op
ConfigServerBenchmark.fatJar142             avgt   10   7.439 ± 0.247   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10   2.868 ± 0.195   s/op
PetclinicBenchmark.fatJar                   avgt   10  13.579 ± 0.222   s/op
PetclinicBenchmark.noverify                 avgt   10  12.186 ± 0.373   s/op
JsaBenchmark.basicThin                      avgt   10  10.830 ± 0.443   s/op
JsaBenchmark.sharedClasses                  avgt   10   8.158 ± 0.217   s/op
JsaBenchmark.thinMain                       avgt   10   9.568 ± 0.488   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10   5.821 ± 0.191   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10   4.365 ± 0.117   s/op
SpringBoot138Benchmark.fatJar               avgt   10   5.550 ± 0.132   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10   6.238 ± 0.225   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10   4.840 ± 0.137   s/op
SpringBoot142Benchmark.fatJar               avgt   10   6.238 ± 0.560   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10   5.248 ± 0.130   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10   5.715 ± 0.115   s/op

Host: carbon, i7, 2.6GHz, 4G RAM, SSD

One year old Lenovo X1 laptop.

Benchmark                                   Mode  Cnt  Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10  1.458 ± 0.256   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10  5.217 ± 0.280   s/op
ConfigServerBenchmark.fatJar138             avgt   10  5.651 ± 0.221   s/op
ConfigServerBenchmark.fatJar142             avgt   10  5.812 ± 0.163   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10  1.952 ± 0.249   s/op
PetclinicBenchmark.fatJar                   avgt   10  9.973 ± 0.637   s/op
PetclinicBenchmark.noverify                 avgt   10  9.312 ± 0.350   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10  4.696 ± 0.242   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10  3.599 ± 0.192   s/op
SpringBoot138Benchmark.fatJar               avgt   10  4.427 ± 0.249   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10  4.926 ± 0.313   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10  3.824 ± 0.150   s/op
SpringBoot142Benchmark.fatJar               avgt   10  4.814 ± 0.199   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10  4.370 ± 0.200   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10  4.796 ± 0.165   s/op

Host: dsyer, i7, 2.7GHz, 4G RAM, SSD

Slightly knackered old Lenovo X220 laptop.

Benchmark                                   Mode  Cnt   Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10   1.568 ± 0.253   s/op
ConfigServerBenchmark.fatJar138             avgt   10   6.267 ± 0.274   s/op
ConfigServerBenchmark.fatJar142             avgt   10   6.426 ± 0.398   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10   5.880 ± 0.326   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10   2.053 ± 0.226   s/op
PetclinicBenchmark.fatJar                   avgt   10  11.241 ± 0.365   s/op
PetclinicBenchmark.noverify                 avgt   10  10.726 ± 0.526   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10   5.526 ± 0.341   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10   4.190 ± 0.201   s/op
SpringBoot138Benchmark.fatJar               avgt   10   5.325 ± 0.206   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10   5.958 ± 0.190   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10   4.589 ± 0.258   s/op
SpringBoot142Benchmark.fatJar               avgt   10   5.638 ± 0.269   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10   5.082 ± 0.307   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10   5.731 ± 0.211   s/op

Host: t2.small, 1CPU, 1G RAM, SSD

Benchmark                                   Mode  Cnt   Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10   1.760 ± 0.236   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10   8.597 ± 0.218   s/op
ConfigServerBenchmark.fatJar138             avgt   10   9.114 ± 0.387   s/op
ConfigServerBenchmark.fatJar142             avgt   10   9.092 ± 0.358   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10   2.709 ± 0.180   s/op
PetclinicBenchmark.fatJar                   avgt   10  15.277 ± 0.414   s/op
PetclinicBenchmark.noverify                 avgt   10  14.011 ± 0.340   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10   7.327 ± 0.223   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10   5.684 ± 0.099   s/op
SpringBoot138Benchmark.fatJar               avgt   10   7.017 ± 0.218   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10   7.825 ± 0.193   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10   6.150 ± 0.160   s/op
SpringBoot142Benchmark.fatJar               avgt   10   7.372 ± 0.204   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10   7.055 ± 0.175   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10   7.553 ± 0.255   s/op

Host: t2.large, 2CPU, 8G RAM, SSD

Benchmark                                   Mode  Cnt  Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10  1.071 ± 0.170   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10  4.504 ± 0.250   s/op
ConfigServerBenchmark.fatJar138             avgt   10  4.706 ± 0.110   s/op
ConfigServerBenchmark.fatJar142             avgt   10  4.730 ± 0.159   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10  1.555 ± 0.198   s/op
PetclinicBenchmark.fatJar                   avgt   10  8.181 ± 0.396   s/op
PetclinicBenchmark.noverify                 avgt   10  7.393 ± 0.304   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10  3.857 ± 0.141   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10  3.113 ± 0.145   s/op
SpringBoot138Benchmark.fatJar               avgt   10  3.729 ± 0.088   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10  4.074 ± 0.170   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10  3.468 ± 0.159   s/op
SpringBoot142Benchmark.fatJar               avgt   10  3.916 ± 0.207   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10  3.935 ± 0.176   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10  4.250 ± 0.208   s/op

Host: m4.large, 2CPU, 8G RAM, SSD

Benchmark                                   Mode  Cnt   Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10   1.638 ± 0.255   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10   7.222 ± 0.153   s/op
ConfigServerBenchmark.fatJar138             avgt   10   7.641 ± 0.212   s/op
ConfigServerBenchmark.fatJar142             avgt   10   7.616 ± 0.182   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10   2.502 ± 0.314   s/op
PetclinicBenchmark.explodedJarMain          avgt   10  11.447 ± 0.209   s/op
PetclinicBenchmark.fatJar                   avgt   10  12.835 ± 0.341   s/op
PetclinicBenchmark.noverify                 avgt   10  11.833 ± 0.438   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10   6.195 ± 0.193   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10   4.939 ± 0.217   s/op
SpringBoot138Benchmark.fatJar               avgt   10   5.893 ± 0.190   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10   6.471 ± 0.126   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10   5.360 ± 0.209   s/op
SpringBoot142Benchmark.fatJar               avgt   10   6.302 ± 0.199   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10   6.013 ± 0.159   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10   6.437 ± 0.301   s/op

Host: r3.large, 2CPU, 15G RAM, SSD

Benchmark                                   Mode  Cnt   Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10   1.874 ± 0.344   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10   7.967 ± 0.270   s/op
ConfigServerBenchmark.fatJar138             avgt   10   8.375 ± 0.229   s/op
ConfigServerBenchmark.fatJar142             avgt   10   8.541 ± 0.317   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10   2.707 ± 0.333   s/op
PetclinicBenchmark.explodedJarMain          avgt   10  12.403 ± 0.385   s/op
PetclinicBenchmark.fatJar                   avgt   10  13.956 ± 0.439   s/op
PetclinicBenchmark.noverify                 avgt   10  13.023 ± 0.382   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10   6.802 ± 0.272   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10   5.370 ± 0.341   s/op
SpringBoot138Benchmark.fatJar               avgt   10   6.609 ± 0.328   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10   7.157 ± 0.233   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10   5.946 ± 0.133   s/op
SpringBoot142Benchmark.fatJar               avgt   10   6.855 ± 0.173   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10   6.527 ± 0.262   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10   7.175 ± 0.286   s/op

Host: aw-rmbp, i7, 2.3GHz, 16GB RAM, SSD

MacBook Pro (Retina, 15-inch, Late 2013) running OS X Yosemite (10.10.5) and Java 1.8.0_102

Benchmark                                   Mode  Cnt  Score   Error  Units
ConfigServerBenchmark.devtoolsRestart       avgt   10  0.999 ± 0.078   s/op
ConfigServerBenchmark.explodedJarMain       avgt   10  3.747 ± 0.177   s/op
ConfigServerBenchmark.fatJar138             avgt   10  3.810 ± 0.158   s/op
ConfigServerBenchmark.fatJar142             avgt   10  3.846 ± 0.049   s/op
PetclinicBenchmark.devtoolsRestart          avgt   10  1.431 ± 0.171   s/op
PetclinicBenchmark.fatJar                   avgt   10  6.995 ± 0.299   s/op
PetclinicBenchmark.noverify                 avgt   10  6.292 ± 0.308   s/op
SpringBoot138Benchmark.explodedJarLauncher  avgt   10  3.256 ± 0.104   s/op
SpringBoot138Benchmark.explodedJarMain      avgt   10  2.518 ± 0.074   s/op
SpringBoot138Benchmark.fatJar               avgt   10  3.012 ± 0.046   s/op
SpringBoot142Benchmark.explodedJarLauncher  avgt   10  3.374 ± 0.082   s/op
SpringBoot142Benchmark.explodedJarMain      avgt   10  2.738 ± 0.040   s/op
SpringBoot142Benchmark.fatJar               avgt   10  3.247 ± 0.109   s/op
SpringBootThinBenchmark.basic138Thin        avgt   10  3.035 ± 0.083   s/op
SpringBootThinBenchmark.basic142Thin        avgt   10  3.271 ± 0.093   s/op

Finer Grained Timing

There’s a tool in the JDK called jcmd that prints out a load of useful stuff about a running process. E.g:

$ jcmd <PID> PerfCounter.print

where <PID> is the process id. The petclinic (on "tower") has these times in ns (highlights):

...
sun.classloader.findClassTime=2051631382
sun.classloader.parentDelegationTime=220795236
sun.cls.appClassLoadTime=2315218900
sun.cls.classInitTime=1058084598
sun.cls.classLinkedTime=1142874583
sun.cls.classVerifyTime=1037315846
sun.cls.defineAppClassTime=946831715
sun.cls.lookupSysClassTime=28253463
sun.cls.parseClassTime=1055187254
sun.cls.sharedClassLoadTime=581133
sun.cls.sysClassLoadTime=223416957
sun.urlClassLoader.readClassBytesTime=299665148
...

So you can immediately save up to a second by using -noverify (probably not advisable in production, but the jury is out on that). The benchmark results confirm this the saving is actually only 700ms, but that’s a start.

Other things that stick out are

  • findClassTime - time spent working out where the damn things are, loading the bytes, parsing the bytes and defining them to build Class instances (2052ms).

  • readClassBytesTime - time from when you have a reference to where the bytes are to the time you have read them in (300ms).

Also interesting (not shown above) is java.cls.loadedClasses, at 10660. Perhaps this is over optimistic but maybe some chunk of them is being touched by accident that aren’t necessary and they can be skipped, or deferred.

Log Analysis