In this module we study the effect of Spring and Spring Boot on startup time by first stripping out as much of it as we can, and then piling on more features, adding more dependencies and letting Spring Boot figure out the desired configuration. In the table below, the "demo" sample is the canonical, empty Spring Boot web app. All the smaller and faster apps are stripped down versions of that where we remove various things and finally get rid of even using reflection in the BeanFactory
(using the new "functional bean registration").
Results:
-
With a couple of outliers we discuss below, the startup time is directly proportional to the number of classes loaded. This usually correlates with the number of beans in the application context, but things like Hibernate and Zuul add more classes than beans.
-
Reflection is not a bottleneck - more work would have to be done with more complex scenarios, but the sample that uses functional bean registration for everything fits the curve of the more heavyweight applications, so it isn’t intrinsically much faster, it just has fewer beans (and therefore fewer features).
-
Condition processing in Spring Boot is not expensive - when we remove conditions partially ("slim", "thin") and completely ("lite", "func") the startup time is right on the curve. The same is true of component scanning, except possibly in the very largest of applications.
-
There is some evidence that most of the cost is associated with JVM overheads, loading and parsing classes (this is supported by other data, for instance on devtools restarts, which are much quicker than cold starts).
Since more beans mean more features, you are paying at startup for actual functionality, so in some ways it should be an acceptable cost. On the other hand, there might be features that you end up never using, or don’t use until later and you would be willing to defer the cost. Spring doesn’t allow you to do that easily. It has some features that might defer the cost of creating beans, but that might not even help if the bean definitions still have to be created and most of the cost is actually to do with loading and parsing classes.
class method sample beans classes heap memory median mean range
MainBenchmark main empt 25.000 3268.000 5.158 39.014 0.555 0.564 0.015
MainBenchmark main flux 95.000 5091.000 7.592 52.693 0.809 0.837 0.028
MainBenchmark main jlog 105.000 4375.000 9.348 65.787 0.821 0.844 0.034
MainBenchmark main demo 123.000 5525.000 9.511 72.140 0.910 0.934 0.028
MainBenchmark main jdbc 158.000 5696.000 10.836 74.504 0.994 1.016 0.035
MainBenchmark main actr 212.000 5983.000 11.543 76.910 1.098 1.112 0.020
MainBenchmark main actj 252.000 6265.000 12.356 78.859 1.206 1.225 0.023
MainBenchmark main jpae 193.000 6870.000 15.344 97.361 1.335 1.359 0.031
MainBenchmark main conf 282.000 6564.000 13.154 81.778 1.453 1.520 0.074
MainBenchmark main erka 348.000 8203.000 17.541 101.212 2.166 2.233 0.073
MainBenchmark main busr 443.000 7626.000 16.032 91.780 2.127 2.178 0.062
MainBenchmark main zuul 478.000 7845.000 16.462 93.298 2.262 2.345 0.120
MainBenchmark main erkb 515.000 9487.000 21.122 113.213 2.921 2.976 0.052
StripBenchmark strip slim 105.000 5343.000 10.108 71.763 0.847 0.871 0.032
StripBenchmark strip thin 64.000 5120.000 8.825 69.182 0.778 0.806 0.069
StripBenchmark strip lite 30.000 4897.000 7.816 66.857 0.699 0.719 0.027
StripBenchmark strip func 26.000 4831.000 7.738 66.411 0.671 0.684 0.029
Legend:
-
Zuul: same as "busr" sample but with Zuul proxy
-
Jpad: same as "demo" sample but with 1 JPA entity (and 0 repositories)
-
Jpae: same as "demo" sample but with 1 JPA entity (and 1 repository)
-
Jpaf: same as "demo" sample but with 2 JPA entities (and 2 repositories)
-
Jpag: same as "demo" sample but with 3 JPA entities (and 3 repositories)
-
Jpaa: same as "actr" sample but with 3 JPA entities (and 3 repositories)
-
Erkb: same as "busr" sample but with Eureka client
-
Busr: same as "conf" but adds Spring Cloud Bus and Rabbit
-
Erko: same as "actr" sample but with Eureka client (disabled)
-
Conf: same as "actr" sample plus config client
-
Actj: same as "actr" sample plus JDBC
-
Actr: same as "demo" sample plus Actuator
-
Jdbc: same as "demo" sample plus JDBC
-
Demo: vanilla Spring Boot MVC app with one endpoint (no Actuator)
-
Slim: same thing but explicitly
@Imports
all configuration -
Thin: reduce the
@Imports
down to a set of 4 that are needed for the endpoint -
Lite: copy the imports from "thin" and make them into hard-coded, unconditional configuration
-
Func: extract the configuration methods from "lite" and register bits of it using the function bean API
N.B. The "thin" sample has @EnableWebMvc
(implicitly), but "lite"
and "func" pulled the relevant features of that out into a separate
class (so a few beans were dropped).
Older data (Spring Boot 2.0):
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
MainBenchmark zuul avgt 10 4.510 ± 0.095 s/op 516 9656
MainBenchmark erkb avgt 10 3.137 ± 0.049 s/op 494 7654
MainBenchmark busr avgt 10 2.525 ± 0.038 s/op 392 7129
MainBenchmark erko avgt 10 2.237 ± 0.071 s/op 313 6392
MainBenchmark jpaa avgt 10 3.232 ± 0.057 s/op 257 8297
MainBenchmark jpag avgt 10 2.897 ± 0.023 s/op 178 8294
MainBenchmark jpaf avgt 10 2.865 ± 0.043 s/op 177 8291
MainBenchmark jpae avgt 10 2.829 ± 0.038 s/op 176 8284
MainBenchmark jpad avgt 10 2.739 ± 0.073 s/op 175 7946
MainBenchmark conf avgt 10 1.636 ± 0.029 s/op 250 6232
MainBenchmark actj avgt 10 1.540 ± 0.049 s/op 226 6033
MainBenchmark actr avgt 10 1.316 ± 0.060 s/op 186 5666
MainBenchmark jdbc avgt 10 1.237 ± 0.050 s/op 147 5625
MainBenchmark demo avgt 10 1.056 ± 0.040 s/op 111 5266
MainBenchmark slim avgt 10 1.003 ± 0.011 s/op 105 5208
MainBenchmark thin avgt 10 0.855 ± 0.028 s/op 60 4892
MainBenchmark lite avgt 10 0.694 ± 0.015 s/op 30 4580
MainBenchmark func avgt 10 0.652 ± 0.017 s/op 25 4378
The rest of the results in this document are from the same (older) version. They are useful for comparison purposes and to follow the analysis. Spring Boot 2.2 will be faster for all of them.
Only 2 samples didn’t fit the trend for classes vs. startup at first. One starts up slower (Sleuth) and one faster (Erka). The JPA and Zuul samples are outliers for the beans vs. startup correlation, so we include those here again.
Benchmark (sample) Mode Cnt Score Error Units Beans Trend Delta Classes
MainBenchmark slth avgt 10 5.110 ± 0.065 s/op 453 2762 2403 7674
MainBenchmark erka avgt 10 2.183 ± 0.076 s/op 287 2760 -577 7893
MainBenchmark jpad avgt 10 2.739 ± 0.073 s/op 175 1385 1354 7946
MainBenchmark jpae avgt 10 2.829 ± 0.038 s/op 176 1390 1439 8284
MainBenchmark jpaf avgt 10 2.865 ± 0.043 s/op 177 1396 1469 8291
MainBenchmark jpag avgt 10 2.897 ± 0.023 s/op 178 1401 1496 8294
MainBenchmark jpaa avgt 10 3.232 ± 0.057 s/op 257 1814 1418 8297
MainBenchmark zuul avgt 10 4.510 ± 0.095 s/op 516 3080 1430 9596
Legend:
-
Slth: same as "busr" sample but with Sleuth
-
Erka: same as "actr" sample but with Eureka client
The "Trend" number is the best fit prediction of the startup time from the number of classes (or beans for the JPA samples), taken from the non-outlier data. "Delta" is the difference between the actual startup time and the trend value (so it is the extra cost of the features being added).
The "erka" sample started up faster then predicted, but it also has a suspiciously large number of loaded classes (even more classes than with Eureka and Bus). The loaded classes measurements are not stable - you get different answers from run to run - but they don’t usually fluctuate by enough to explain the difference here.
Here’s an explanation for the "slth" result. Spring processes @annotation
matchers in @Pointcuts
extremely inefficiently, so the startup time scales with the number of pointcuts with @annotations
, not so much the number of beans. If the pointcuts are driving it (as suggested by results in these aspectj benchmarks), then the 4 pointcuts with @annotation
matchers would be costing 2403ms or around 600ms each, which is horrendous but consistent with the aspectj benchmarks.
With AspectJ 1.8.13
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
MainBenchmark.main slth ss 10 4.002 ± 0.113 s/op 450 8358
(Makes a huge difference, but still slower than the trend.)
Hibernate fixed startup cost is about 1300ms (the "delta" on "jpad"), which more or less doubles the startup time for a JPA app compared to the vanilla "demo". Spring Data JPA repository creation seems to have a fixed cost of about 90ms, which isn’t nothing but isn’t very large in comparison. Adding repositories and entities might cost something, but it isn’t a lot - the best estimate would be about 30ms per entity from these data (these were very basic, vanilla JpaRepositories
, so maybe it would be more for more complex requirements). The JPA samples (and even Zuul) are a pretty good fit for number of classes loaded versus startup time, so Hibernate isn’t doing a lot of intensive stuff beyond forcing a lot of classes to be loaded.
We can’t easily exclude Jackson from all the sample, but anything that doesn’t use the Actuator can be run with and without to see the difference. Here’s the vanilla "demo" sample
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.snap demo ss 10 1.150 ± 0.076 s/op
and with exclusions.spring-boot-jackson=org.springframework.boot:spring-boot-starter-json
in thin.properties
:
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.snap demo ss 10 1.069 ± 0.036 s/op
So that’s probably worth having.
You need Java 8. You can run the benchmarks in your IDE (they are JUnit 5 tests). Also set -DpublishTo=csv:target/file.csv
(the location of a CSV output file) to get the nicely formatted output data.
There are 4 groups of benchmarks:
-
MainBenchmark
- add features to the "main" demo by manipulating the classpath -
StripBenchmark
- "slim", "thin", "lite", "func" - stripping away from the "main" demo by hardcoding config -
DevtoolsBenchmark
- same asMainBenchmarks
but with Spring Boot Devtools -
CdsBenchmark
- if the JVM you use supports CDS this benchmark will try to enabled it
Also to get decent results from the erk*
samples you need Eureka running locally on port 8761. You can do that with the Spring Boot CLI (for example):
$ spring install org.springframework.cloud:spring-cloud-cli:1.3.4.RELEASE
$ spring cloud eureka
J9 is the IBM JVM, which they open sourced and is now available also as Eclipse J9. The benchmarks are tuned to use different command line optimizations depending on the JVM in use. Here’s a comparison between the regular OpenJDK Hotspot and the OpenJDK Eclipse J9 (still JDK 1.8) build:
Benchmark (sample) Mode Cnt Score Error Units JVM
MainBenchmark.main demo ss 10 1.171 ± 0.044 s/op 8u152-zulu
MainBenchmark.main demo ss 10 1.015 ± 0.116 s/op 8u152-openj9
Eclipse J9 is about 10% faster than HotSpot, probably owing to the ability to cache class data between runs (which is switched on by default in the benchmarks but not in general).
Java 11 is quite a bit slower than Java 8, but you can get back most or all of the difference by switching on Class Data Sharing (CDS). Some results comparing the same samples with and without CDS:
class method sample beans classes heap memory median mean range
MainBenchmark main demo 118.000 6080.000 10.350 86.797 1.332 1.386 0.083
MainBenchmark main jdbc 153.000 6255.000 11.107 88.934 1.398 1.466 0.050
MainBenchmark main actr 207.000 6746.000 12.685 93.063 1.645 1.736 0.113
CdsBenchmark main demo 118.000 6052.000 9.931 52.772 0.863 0.931 0.094
CdsBenchmark main jdbc 153.000 6229.000 10.672 54.221 0.967 1.051 0.088
CdsBenchmark main actr 207.000 6722.000 11.299 57.186 1.157 1.307 0.138
The non-CDS results do not include -noverify
(which would get you
back most or all of the difference). But -noverify
is going to be
deprecated soon, and it’s good to see that you can get close to the
Java 8 results with CDS, even without that (it’s redundant if all the
classes are cached). The improvement with CDS above is not
proportional to the size of the app (classes loaded), which is odd, so
maybe there are more improvements to come if we can understand
that. It’s disappointing though, that the CDS cache doesn’t get
anywhere close to devtools (warm JVM) speeds.
Interestingly the memory usage is much lower with CDS, both heap and non-heap, but mostly "Compressed Class Space" (part of non-heap).
Benchmark (sample) Mode Cnt Score Error Units JVM
MainBenchmark.main demo ss 10 1.171 ± 0.044 s/op 8u152-zulu
MainBenchmark.main demo ss 10 1.015 ± 0.116 s/op 8u152-openj9
MainBenchmark.main demo ss 10 1.253 ± 0.076 s/op OpenJDK10
MainBenchmark.main demo ss 10 1.280 ± 0.066 s/op 9.0.4-zulu
There’s a bean factory post processor in Spring Boot issue 9685 that makes all beans lazy by default. It’s quite interesting to see what happens if we add that to our sample applications:
Benchmark (sample) Mode Cnt Score Error Units Classes
MainBenchmark.main empt ss 10 0.477 ± 0.018 s/op 3233
MainBenchmark.main jlog ss 10 0.811 ± 0.016 s/op 4374
MainBenchmark.main demo ss 10 0.913 ± 0.035 s/op 5463
MainBenchmark.main flux ss 10 0.885 ± 0.030 s/op 5325
MainBenchmark.main actr ss 10 1.241 ± 0.030 s/op 6225
MainBenchmark.main jdbc ss 10 1.001 ± 0.033 s/op 5618
MainBenchmark.main actj ss 10 1.388 ± 0.062 s/op 6432
MainBenchmark.main jpae ss 10 1.994 ± 0.055 s/op 8824
MainBenchmark.main conf ss 10 1.599 ± 0.118 s/op 6711
MainBenchmark.main erka ss 10 1.819 ± 0.045 s/op 6804
MainBenchmark.main busr ss 10 2.431 ± 0.068 s/op 7721
MainBenchmark.main zuul ss 10 3.029 ± 0.086 s/op 8348
MainBenchmark.main erkb ss 10 2.886 ± 0.107 s/op 8083
MainBenchmark.main slth ss 10 3.127 ± 0.041 s/op 8314
c.f. the non-lazy results for the same samples:
Benchmark (sample) Mode Cnt Score Error Units Classes Lazy Premium
MainBenchmark.main empt ss 10 0.578 ± 0.028 s/op 3703 17.47%
MainBenchmark.main jlog ss 10 0.953 ± 0.012 s/op 4647 14.90%
MainBenchmark.main demo ss 10 1.105 ± 0.049 s/op 5808 17.38%
MainBenchmark.main flux ss 10 0.988 ± 0.038 s/op 5726 10.43%
MainBenchmark.main actr ss 10 1.542 ± 0.043 s/op 6692 19.52%
MainBenchmark.main jdbc ss 10 1.281 ± 0.045 s/op 6068 21.86%
MainBenchmark.main actj ss 10 1.819 ± 0.191 s/op 6953 23.69%
MainBenchmark.main jpae ss 10 2.003 ± 0.053 s/op 8824 0.45%
MainBenchmark.main conf ss 10 1.948 ± 0.097 s/op 7216 17.92%
MainBenchmark.main erka ss 10 2.703 ± 0.106 s/op 8909 32.70%
MainBenchmark.main busr ss 10 3.111 ± 0.157 s/op 8282 21.86%
MainBenchmark.main zuul ss 10 3.834 ± 0.086 s/op 9325 21.00%
MainBenchmark.main erkb ss 10 4.026 ± 0.119 s/op 10176 28.32%
MainBenchmark.main slth ss 10 4.066 ± 0.073 s/op 8901 23.09%
and the same thing for the Petclinic:
Benchmark (sample) Mode Cnt Score Error Units Classes Lazy Premium
PetclinicLatestBenchmark.noverify (lazy) avgt 10 3.495 ± 0.059 s/op 9687 25.80%
PetclinicLatestBenchmark.explodedJarMain (lazy) avgt 10 3.023 ± 0.092 s/op 10644 27.40%
PetclinicLatestBenchmark.noverify none avgt 10 4.710 ± 0.053 s/op 11099
PetclinicLatestBenchmark.explodedJarMain none avgt 10 4.164 ± 0.068 s/op 12132
With 2.1.0 snapshots after M4:
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.endp N/A ss 10 1.447 ± 0.038 s/op
SnapBenchmark.snap empt ss 10 0.572 ± 0.012 s/op
SnapBenchmark.snap demo ss 10 1.063 ± 0.023 s/op
SnapBenchmark.snap actr ss 10 1.447 ± 0.019 s/op
SnapBenchmark.snap jdbc ss 10 1.207 ± 0.023 s/op
SnapBenchmark.snap actj ss 10 1.616 ± 0.030 s/op
SnapBenchmark.snap jpae ss 10 2.062 ± 0.093 s/op
SnapBenchmark.snap conf ss 10 1.979 ± 0.183 s/op
MainBenchmark.main demo ss 10 1.147 ± 0.023 s/op
StripBenchmark.strip slim ss 10 1.083 ± 0.031 s/op
StripBenchmark.strip thin ss 10 0.919 ± 0.019 s/op
And before M1:
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.endp N/A ss 10 1.478 ± 0.110 s/op
SnapBenchmark.snap empt ss 10 0.604 ± 0.032 s/op
SnapBenchmark.snap demo ss 10 1.067 ± 0.050 s/op
SnapBenchmark.snap actr ss 10 1.417 ± 0.036 s/op
SnapBenchmark.snap jdbc ss 10 1.245 ± 0.144 s/op
SnapBenchmark.snap actj ss 10 1.606 ± 0.074 s/op
SnapBenchmark.snap jpae ss 10 2.013 ± 0.062 s/op
SnapBenchmark.snap conf ss 10 1.792 ± 0.033 s/op
after M1:
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.endp ss 10 1.425 ± 0.036 s/op
SnapBenchmark.snap empt ss 10 0.570 ± 0.013 s/op
SnapBenchmark.snap demo ss 10 1.039 ± 0.014 s/op
SnapBenchmark.snap actr ss 10 1.417 ± 0.024 s/op
SnapBenchmark.snap jdbc ss 10 1.191 ± 0.028 s/op
SnapBenchmark.snap actj ss 10 1.601 ± 0.052 s/op
SnapBenchmark.snap jpae ss 10 2.017 ± 0.055 s/op
SnapBenchmark.snap conf ss 10 1.874 ± 0.071 s/op
Benchmark (sample) Mode Cnt Score Error Units
AutoBenchmark.auto empt ss 10 0.485 ± 0.024 s/op
AutoBenchmark.auto demo ss 10 0.864 ± 0.033 s/op
AutoBenchmark.auto actr ss 10 1.156 ± 0.060 s/op
AutoBenchmark.auto jdbc ss 10 0.953 ± 0.063 s/op
AutoBenchmark.auto actj ss 10 1.253 ± 0.036 s/op
AutoBenchmark.auto conf ss 10 1.552 ± 0.044 s/op
Spring Boot 2.0.0 snapshots (before RC2):
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
StripBenchmark.strip slim ss 10 1.102 ± 0.041 s/op 107 5754
StripBenchmark.strip thin ss 10 0.941 ± 0.034 s/op 62 5444
StripBenchmark.strip lite ss 10 0.767 ± 0.021 s/op 30 5094
StripBenchmark.strip func ss 10 0.718 ± 0.010 s/op 26 5030
Even in the "lite" and "func" samples, where all the beans are hard coded (no scanning, no autoconfig, no condition evaluation), Boot 2.0 loads way more classes.
(Boot 1.5.4 without -noverify
)
sample | configs | beans | startup(millis) |
---|---|---|---|
slth |
176 |
460 |
5366 |
zuul |
181 |
495 |
4336 |
busr |
151 |
389 |
2758 |
erka |
127 |
310 |
2423 |
conf |
100 |
245 |
1779 |
actr |
72 |
183 |
1430 |
demo |
32 |
108 |
1154 |
slim |
31 |
103 |
1112 |
thin |
14 |
60 |
968 |
lite |
4 |
30 |
813 |
func |
1 |
25 |
742 |
(Boot 1.5.6, 2.0.0.M3 and 2.0.0.BUILD-SNAPSHOT)
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
OldBenchmark.old empt avgt 10 0.738 ± 0.031 s/op 23 3031
OldBenchmark.old demo avgt 10 1.623 ± 0.069 s/op 109 4965
OldBenchmark.old actr avgt 10 2.098 ± 0.093 s/op 187 5384
OldBenchmark.old jdbc avgt 10 1.920 ± 0.083 s/op 140 5280
OldBenchmark.old actj avgt 10 2.417 ± 0.123 s/op 222 5715
OldBenchmark.old jpae avgt 10 2.536 ± 0.124 s/op 165 6841
OldBenchmark.old conf avgt 10 2.639 ± 0.146 s/op 251 5906
OldBenchmark.old erka avgt 10 2.960 ± 0.101 s/op 294 6077
OldBenchmark.old busr avgt 10 3.555 ± 0.125 s/op 370 6443
OldBenchmark.old zuul avgt 10 4.736 ± 0.507 s/op 433 6922
OldBenchmark.old erkb avgt 10 4.519 ± 0.365 s/op 434 6889
OldBenchmark.old slth avgt 10 7.331 ± 0.186 s/op 444 7058
MainBenchmark.main empt avgt 10 0.848 ± 0.059 s/op 22 3271
MainBenchmark.main demo avgt 10 1.773 ± 0.074 s/op 112 5360
MainBenchmark.main actr avgt 10 2.204 ± 0.121 s/op 187 5756
MainBenchmark.main jdbc avgt 10 2.081 ± 0.082 s/op 147 5625
MainBenchmark.main actj avgt 10 2.508 ± 0.091 s/op 226 6033
MainBenchmark.main jpae avgt 10 2.807 ± 0.100 s/op 176 8284
MainBenchmark.main conf avgt 10 2.781 ± 0.159 s/op 350 6232
MainBenchmark.main erka avgt 10 3.311 ± 0.407 s/op 294 6491
MainBenchmark.main busr avgt 10 3.777 ± 0.102 s/op 392 7129
MainBenchmark.main zuul avgt 10 4.758 ± 0.113 s/op 516 9656
MainBenchmark.main erkb avgt 10 4.773 ± 0.105 s/op 494 7654
MainBenchmark.main slth avgt 10 7.926 ± 0.197 s/op 453 7674
StripBenchmark.strip func avgt 10 1.112 ± 0.032 s/op 25 4378
StripBenchmark.strip lite avgt 10 1.205 ± 0.076 s/op 30 4580
StripBenchmark.strip slim avgt 10 1.743 ± 0.099 s/op 105 5208
StripBenchmark.strip thin avgt 10 1.501 ± 0.071 s/op 60 4892
SnapBenchmark.endp N/A avgt 10 2.515 ± 0.509 s/op 199 5838
SnapBenchmark.snap empt avgt 10 0.969 ± 0.123 s/op 22 3269
SnapBenchmark.snap demo avgt 10 1.880 ± 0.205 s/op 112 5356
SnapBenchmark.snap actr avgt 10 2.296 ± 0.101 s/op 198 5833
SnapBenchmark.snap jdbc avgt 10 2.136 ± 0.117 s/op 148 5716