Entity instantiation standard / optimized benchmark #13

mbladel · 2024-10-22T14:44:19Z

Comparison between Hibernate's standard POJO instantiators using reflection (i.e. java.lang.reflect.Constructor#newInstance) vs the bytecode-enhanced optimization which enables direct usage of the class' no-arg constructor. The benchmark tests a very simple "find all" query which instantiates a lot of entities.
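To make the two paths concrete, here is a minimal sketch of the difference being measured. It is not Hibernate's actual code: `Book`, `InstantiationSketch`, `reflective` and `direct` are hypothetical names made up for illustration, and the "optimized" side is shown as a plain constructor reference rather than generated bytecode.

```java
import java.lang.reflect.Constructor;
import java.util.function.Supplier;

class InstantiationSketch {
    // "Standard" path: resolve the no-arg constructor once, then call it reflectively.
    static <T> Supplier<T> reflective(Class<T> type) {
        final Constructor<T> ctor;
        try {
            ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("No no-arg constructor on " + type.getName(), e);
        }
        return () -> {
            try {
                return ctor.newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not instantiate " + type.getName(), e);
            }
        };
    }

    // "Optimized" path: a direct constructor call, with no reflection involved.
    static Supplier<Book> direct() {
        return Book::new;
    }

    public static void main(String[] args) {
        Book viaReflection = reflective(Book.class).get();
        Book viaConstructor = direct().get();
        System.out.println(viaReflection.getClass() == viaConstructor.getClass());
    }
}

// Hypothetical entity used only for illustration.
class Book {
    Book() {}
}
```

Both suppliers produce the same kind of object; the benchmark is about how much the call path to the constructor costs.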

Here are the results of a JMH run as an example:

Benchmark                                            Mode  Cnt        Score       Error   Units
EntityInstantiators.optimized                       thrpt   10    43295.224 ±   354.341   ops/s
EntityInstantiators.optimized:instances             thrpt   10  4329522.444 ± 35434.072   ops/s
EntityInstantiators.optimized:·async                thrpt               NaN                 ---
EntityInstantiators.optimized:·gc.alloc.rate        thrpt   10     1223.515 ±     9.758  MB/sec
EntityInstantiators.optimized:·gc.alloc.rate.norm   thrpt   10    29655.284 ±    19.124    B/op
EntityInstantiators.optimized:·gc.count             thrpt   10      505.000              counts
EntityInstantiators.optimized:·gc.time              thrpt   10      655.000                  ms
EntityInstantiators.standard                        thrpt   10    42191.893 ±   204.744   ops/s
EntityInstantiators.standard:instances              thrpt   10  4219189.309 ± 20474.378   ops/s
EntityInstantiators.standard:·async                 thrpt               NaN                 ---
EntityInstantiators.standard:·gc.alloc.rate         thrpt   10     1190.467 ±     7.342  MB/sec
EntityInstantiators.standard:·gc.alloc.rate.norm    thrpt   10    29611.285 ±    38.247    B/op
EntityInstantiators.standard:·gc.count              thrpt   10      647.000              counts
EntityInstantiators.standard:·gc.time               thrpt   10      702.000                  ms

Showing a ~2.6% increase in ops/s for the optimized case.
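For reference, the ~2.6% figure follows directly from the two headline scores above. A small sketch of the arithmetic (the class and method names are made up for illustration; the numbers are the ops/s values from the run):

```java
class RelativeGain {
    // Relative throughput gain of `optimized` over `baseline`, in percent.
    static double gainPercent(double baseline, double optimized) {
        return (optimized - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double standard = 42191.893;   // EntityInstantiators.standard, ops/s
        double optimized = 43295.224;  // EntityInstantiators.optimized, ops/s
        System.out.printf("%.2f%%%n", gainPercent(standard, optimized));
    }
}
```

Note that this only compares the mean scores and ignores the reported error margins.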

The impact of the reflective Constructor.newInstance call is clearly visible in the CPU flamegraphs for the standard case:

It disappears when using the instantiation optimizer:

franz1981 · 2024-10-22T16:12:11Z

Give me a couple of days, @mbladel, and I'll send a PR with my feedback!

franz1981 · 2024-10-22T16:16:29Z

Ideally we would like to verify how much better it is with a full-fat findAll, i.e. something which looks more like a query that uses the registered instantiator(s).

mbladel · 2024-10-22T16:22:50Z

Right - I thought we only wanted to measure instantiation performance difference so I replicated what Hibernate does internally anyway in the two cases. I'll experiment a bit more and see if I can switch to a query for the benchmark - though in that case I believe the difference in performance will be less noticeable among all the other operations taking place.

mbladel · 2024-10-23T08:17:16Z

@franz1981 I've updated the code to use a query instead and edited the original PR message with the new results and flamegraph. Kindly let me know if there's anything else I should change.

basic/src/test/java/org/hibernate/benchmark/enhancement/EntityInstantiators.java

mbladel · 2024-10-24T16:14:41Z

@franz1981 here are the results of a run on my machine and JDK17 with MONO/QUAD morphism and Standard/Optimized variations:

Benchmark                                      (count)  (instantiation)  (morphism)  (polluteAtWarmup)   Mode  Cnt      Score     Error   Units
EntityInstantiators.query                          100         Standard        MONO              false  thrpt   20  39893.930 ± 426.215   ops/s
EntityInstantiators.query:·async                   100         Standard        MONO              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100         Standard        MONO              false  thrpt   20   1125.068 ±  13.352  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100         Standard        MONO              false  thrpt   20  29627.521 ±  21.604    B/op
EntityInstantiators.query:·gc.count                100         Standard        MONO              false  thrpt   20    152.000            counts
EntityInstantiators.query:·gc.time                 100         Standard        MONO              false  thrpt   20    280.000                ms
EntityInstantiators.query                          100         Standard        QUAD              false  thrpt   20  38011.336 ± 241.473   ops/s
EntityInstantiators.query:·async                   100         Standard        QUAD              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100         Standard        QUAD              false  thrpt   20   1074.268 ±   7.557  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100         Standard        QUAD              false  thrpt   20  29691.272 ±   7.127    B/op
EntityInstantiators.query:·gc.count                100         Standard        QUAD              false  thrpt   20    193.000            counts
EntityInstantiators.query:·gc.time                 100         Standard        QUAD              false  thrpt   20    290.000                ms
EntityInstantiators.query                          100        Optimized        MONO              false  thrpt   20  40357.327 ± 517.328   ops/s
EntityInstantiators.query:·async                   100        Optimized        MONO              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100        Optimized        MONO              false  thrpt   20   1138.476 ±  14.442  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100        Optimized        MONO              false  thrpt   20  29635.270 ±  14.254    B/op
EntityInstantiators.query:·gc.count                100        Optimized        MONO              false  thrpt   20    184.000            counts
EntityInstantiators.query:·gc.time                 100        Optimized        MONO              false  thrpt   20    286.000                ms
EntityInstantiators.query                          100        Optimized        QUAD              false  thrpt   20  38418.609 ± 303.974   ops/s
EntityInstantiators.query:·async                   100        Optimized        QUAD              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100        Optimized        QUAD              false  thrpt   20   1085.104 ±  11.392  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100        Optimized        QUAD              false  thrpt   20  29675.271 ±   7.128    B/op
EntityInstantiators.query:·gc.count                100        Optimized        QUAD              false  thrpt   20    170.000            counts
EntityInstantiators.query:·gc.time                 100        Optimized        QUAD              false  thrpt   20    290.000                ms

The ops/s are almost identical in the QUAD case. We looked at the flamegraphs, and the impact of itable stubs on the optimized path is evident there.

The optimized strategy is still better in the MONO case, but that's pretty synthetic, as real-world applications typically query several different entity types.
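As a rough illustration of what MONO and QUAD mean at the call site (this is a sketch, not the benchmark's actual code; `Instantiator`, `NewA` and the other names are hypothetical), the difference is how many receiver types a single interface call observes:

```java
import java.util.ArrayList;
import java.util.List;

class MorphismSketch {
    // Hypothetical stand-in for a per-entity instantiator interface.
    interface Instantiator {
        Object instantiate();
    }

    // Four hypothetical entity types.
    static final class A {}
    static final class B {}
    static final class C {}
    static final class D {}

    // One implementation class per entity type.
    static final class NewA implements Instantiator { public Object instantiate() { return new A(); } }
    static final class NewB implements Instantiator { public Object instantiate() { return new B(); } }
    static final class NewC implements Instantiator { public Object instantiate() { return new C(); } }
    static final class NewD implements Instantiator { public Object instantiate() { return new D(); } }

    static final Instantiator[] MONO = { new NewA() };
    static final Instantiator[] QUAD = { new NewA(), new NewB(), new NewC(), new NewD() };

    static List<Object> instantiateAll(Instantiator[] instantiators, int count) {
        List<Object> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Single interface call site: with MONO it only ever sees NewA,
            // with QUAD it sees four receiver types in rotation.
            result.add(instantiators[i % instantiators.length].instantiate());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(instantiateAll(MONO, 100).size());
        System.out.println(instantiateAll(QUAD, 100).size());
    }
}
```

With one receiver type the JIT can typically inline the call; with four it generally cannot, and the interface dispatch (the itable stub seen in the flamegraphs) stays on the hot path. This only shows the shape of the call site: running both variants in one JVM, as `main` does here, pollutes the type profile, so it is not a substitute for the JMH runs above.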

mbladel · 2024-10-28T10:39:31Z

Further testing with a bytecode-enhanced entity (to alleviate the impact of https://hibernate.atlassian.net/browse/HHH-18763), and even with JDK 24, showed no significant gain in total ops/s from instantiation optimizers. This is due to the poor performance of the megamorphic itable stub call in the QUAD case, which should more closely resemble real-world applications.

I'm going to merge this as it's a valuable benchmark in terms of simple query performance, instantiation and handling of new objects in the Persistence Context.

franz1981 · 2024-10-28T14:34:41Z

I'm pasting here a cheap trick to use with care (but I still want you to be aware of it and play with it); if used right, it can deliver some very real improvement...

franz1981@58760b8, which applies if:

1. invokeinterface is polluted by so many types that it really pops up in profiling data (see itable_stub with creators);
2. the number of types involved is <= 10 (see https://stackoverflow.com/a/45023935 for why, and check -XX:MinJumpTableSize; with more than 10, it is better to try nested switches or a bigger switch and see what happens);
3. the work performed in the invoked method (i.e. a constructor) is very little.

Similar to virtual calls, switches are profiled by the runtime, which means that branches can end up changing based on the observed frequencies. But, as with type profile information (see https://github.com/openjdk/jdk/blob/120a9357b3cf63427a6c8539128b69b11b9beca3/src/hotspot/share/opto/doCall.cpp#L83-L396 for what happens while the JIT generates a call), once a full compilation happens (i.e. at the last and most refined level of compilation), their "final form" is shared across the different callers, which means I expect they won't behave much worse than dynamically dispatched calls.
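The general shape of such a switch-based dispatch could look like the sketch below. This is not the content of the linked commit: the entity classes, the `instantiate` method and the idea of assigning each entity type a dense integer id are all assumptions made for illustration.

```java
class SwitchInstantiatorSketch {
    // Hypothetical entity types.
    static final class Author {}
    static final class Book {}
    static final class Publisher {}
    static final class Review {}

    // Assumption: each mapped entity type was given a small, dense integer id up front,
    // so one shared method can replace one interface implementation per type.
    static Object instantiate(int entityTypeId) {
        switch (entityTypeId) {
            case 0: return new Author();
            case 1: return new Book();
            case 2: return new Publisher();
            case 3: return new Review();
            default: throw new IllegalArgumentException("Unknown entity type id: " + entityTypeId);
        }
    }

    public static void main(String[] args) {
        for (int id = 0; id < 4; id++) {
            System.out.println(instantiate(id).getClass().getSimpleName());
        }
    }
}
```

The point of the design is that the caller no longer performs an interface call per entity type; it calls one static method whose branches are direct constructor calls. Whether that actually wins depends on the three conditions listed above and needs to be measured.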

mbladel · 2024-10-28T14:53:20Z

Hey @franz1981, thanks for the tips. This is interesting, though point 2 (number of types <= 10) is very rare in real-world applications. Having a single instantiator class for all entity mappings is actually feasible in Quarkus, where we collect every mapped type at static-init time. That said, seeing how well reflection performs for constructors, especially on JDKs newer than 17 where it is backed by method handles, makes me wonder whether any instantiation optimization (access optimizers are a very separate topic) is even worth it.
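For readers unfamiliar with method handles, here is a minimal sketch of instantiating a class through one using the public `java.lang.invoke` API. It only illustrates the mechanism mentioned above; it is not how Hibernate or the JDK's reflection internals are implemented, and `Book`, `noArgConstructor` and `instantiate` are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleInstantiationSketch {
    // Hypothetical entity used only for illustration.
    static final class Book {
        Book() {}
    }

    // Resolve a handle to the no-arg constructor; access is checked once, at lookup time.
    static MethodHandle noArgConstructor(Class<?> type) {
        try {
            return MethodHandles.lookup().findConstructor(type, MethodType.methodType(void.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No accessible no-arg constructor on " + type.getName(), e);
        }
    }

    // Invoke the constructor handle and return the new instance.
    static Object instantiate(MethodHandle constructor) {
        try {
            return constructor.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("Instantiation failed", t);
        }
    }

    public static void main(String[] args) {
        MethodHandle ctor = noArgConstructor(Book.class);
        System.out.println(instantiate(ctor).getClass().getSimpleName());
    }
}
```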

Update standard ORM version to 6.6.1.Final

c090fd4

mbladel mentioned this pull request Oct 22, 2024

Generate Hibernate ORM InstantiationOptimizers to avoid reflection quarkusio/quarkus#43767

Closed

mbladel force-pushed the instantiation branch from 6f3147c to 2f78f7e Compare October 22, 2024 15:08

mbladel force-pushed the instantiation branch from 2f78f7e to b55951e Compare October 23, 2024 08:15

mbladel force-pushed the instantiation branch from b55951e to 8fcd146 Compare October 24, 2024 16:13

mbladel force-pushed the instantiation branch from 8fcd146 to 4c7744a Compare October 24, 2024 16:20

Entity instantiation standard / optimized benchmark

cf41ead

mbladel force-pushed the instantiation branch from 4c7744a to cf41ead Compare October 28, 2024 10:39

mbladel merged commit f1431eb into hibernate:main Oct 28, 2024

mbladel deleted the instantiation branch October 28, 2024 14:55

mbladel mentioned this pull request Dec 10, 2024

Hibernate ORM extension should save reflection to create/access fields of entities quarkusio/quarkus#43692

Open
