Entity instantiation standard / optimized benchmark #13

Merged: 2 commits merged on Oct 28, 2024

@mbladel (Contributor) commented Oct 22, 2024

Comparison between Hibernate's standard POJO instantiators, which use reflection (i.e. java.lang.reflect.Constructor#newInstance), and the bytecode-enhanced optimization, which enables direct use of the class's no-arg constructor. The benchmark tests a very simple "find all" query that instantiates many entities.
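To make the two strategies concrete, here is a minimal sketch. The `Instantiator` interface, the `Book` entity, and both implementations are hypothetical, not Hibernate's actual classes; in particular, the `Supplier` constructor reference only stands in for the direct no-arg constructor call that Hibernate's optimizer achieves through generated bytecode.

```java
import java.lang.reflect.Constructor;
import java.util.function.Supplier;

class InstantiatorDemo {
    // Hypothetical entity with the no-arg constructor JPA requires.
    static final class Book {
        Book() {}
    }

    // Hypothetical abstraction over "create an empty entity instance".
    interface Instantiator<T> {
        T instantiate();
    }

    // "Standard" strategy: a reflective call through java.lang.reflect.Constructor.
    static final class ReflectiveInstantiator<T> implements Instantiator<T> {
        private final Constructor<T> constructor;

        ReflectiveInstantiator(Class<T> type) {
            this.constructor = lookupNoArg(type);
        }

        private static <E> Constructor<E> lookupNoArg(Class<E> type) {
            try {
                Constructor<E> c = type.getDeclaredConstructor();
                c.setAccessible(true);
                return c;
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(type + " has no no-arg constructor", e);
            }
        }

        @Override
        public T instantiate() {
            try {
                return constructor.newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + constructor.getDeclaringClass(), e);
            }
        }
    }

    // "Optimized" stand-in: the no-arg constructor is invoked directly
    // (here through a constructor reference instead of generated bytecode).
    static final class DirectInstantiator<T> implements Instantiator<T> {
        private final Supplier<T> factory;

        DirectInstantiator(Supplier<T> factory) {
            this.factory = factory;
        }

        @Override
        public T instantiate() {
            return factory.get();
        }
    }

    public static void main(String[] args) {
        Instantiator<Book> standard = new ReflectiveInstantiator<>(Book.class);
        Instantiator<Book> optimized = new DirectInstantiator<>(Book::new);
        System.out.println(standard.instantiate() != null && optimized.instantiate() != null);
    }
}
```

Both paths produce a fresh `Book` per call; the difference the benchmark measures is the cost of the reflective `newInstance` call versus a plain constructor invocation.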

Here are the results of a JMH run as an example:

EntityInstantiators.optimized                       thrpt   10    43295.224 ±   354.341   ops/s
EntityInstantiators.optimized:instances             thrpt   10  4329522.444 ± 35434.072   ops/s
EntityInstantiators.optimized:·async                thrpt               NaN                 ---
EntityInstantiators.optimized:·gc.alloc.rate        thrpt   10     1223.515 ±     9.758  MB/sec
EntityInstantiators.optimized:·gc.alloc.rate.norm   thrpt   10    29655.284 ±    19.124    B/op
EntityInstantiators.optimized:·gc.count             thrpt   10      505.000              counts
EntityInstantiators.optimized:·gc.time              thrpt   10      655.000                  ms
EntityInstantiators.standard                        thrpt   10    42191.893 ±   204.744   ops/s
EntityInstantiators.standard:instances              thrpt   10  4219189.309 ± 20474.378   ops/s
EntityInstantiators.standard:·async                 thrpt               NaN                 ---
EntityInstantiators.standard:·gc.alloc.rate         thrpt   10     1190.467 ±     7.342  MB/sec
EntityInstantiators.standard:·gc.alloc.rate.norm    thrpt   10    29611.285 ±    38.247    B/op
EntityInstantiators.standard:·gc.count              thrpt   10      647.000              counts
EntityInstantiators.standard:·gc.time               thrpt   10      702.000                  ms

This shows a ~2.6% increase in ops/s for the optimized case.
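The ~2.6% figure follows from the two primary throughput scores in the table above. A small helper (illustrative only, not part of the benchmark; it ignores the ± error columns) reproduces it:

```java
import java.util.Locale;

class ThroughputGain {
    // Relative throughput change of a candidate run over a baseline run, in percent.
    static double gainPercent(double baselineOpsPerSec, double candidateOpsPerSec) {
        return (candidateOpsPerSec / baselineOpsPerSec - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double standard = 42191.893;  // EntityInstantiators.standard score, ops/s
        double optimized = 43295.224; // EntityInstantiators.optimized score, ops/s
        // prints "optimized vs standard: +2.6%"
        System.out.println(String.format(Locale.ROOT, "optimized vs standard: %+.1f%%",
                gainPercent(standard, optimized)));
    }
}
```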

The impact of the reflective Constructor.newInstance call is clearly visible in the CPU flamegraph for the standard case:
[CPU flamegraph: standard instantiation]

In contrast, it disappears when the instantiation optimizer is used:
[CPU flamegraph: optimized instantiation]

@franz1981 (Contributor)

Give me a couple of days, @mbladel, and I will send a PR with my feedback!

@franz1981 (Contributor)

Ideally, we would like to verify how much better it is with a full-fat findAll, i.e. something that looks more like a real query using the registered instantiator(s).

@mbladel (Contributor, Author) commented Oct 22, 2024

Right, I thought we only wanted to measure the instantiation performance difference, so I replicated what Hibernate does internally in the two cases. I'll experiment a bit more and see if I can switch the benchmark to a query, though in that case I believe the performance difference will be less noticeable among all the other operations taking place.

@mbladel (Contributor, Author) commented Oct 23, 2024

@franz1981 I've updated the code to use a query instead and edited the original PR message with the new results and flamegraph. Kindly let me know if there's anything else I should change.

@mbladel (Contributor, Author) commented Oct 24, 2024

@franz1981 here are the results of a run on my machine with JDK 17, covering the MONO/QUAD morphism and Standard/Optimized instantiation variants:

Benchmark                                      (count)  (instantiation)  (morphism)  (polluteAtWarmup)   Mode  Cnt      Score     Error   Units
EntityInstantiators.query                          100         Standard        MONO              false  thrpt   20  39893.930 ± 426.215   ops/s
EntityInstantiators.query:·async                   100         Standard        MONO              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100         Standard        MONO              false  thrpt   20   1125.068 ±  13.352  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100         Standard        MONO              false  thrpt   20  29627.521 ±  21.604    B/op
EntityInstantiators.query:·gc.count                100         Standard        MONO              false  thrpt   20    152.000            counts
EntityInstantiators.query:·gc.time                 100         Standard        MONO              false  thrpt   20    280.000                ms
EntityInstantiators.query                          100         Standard        QUAD              false  thrpt   20  38011.336 ± 241.473   ops/s
EntityInstantiators.query:·async                   100         Standard        QUAD              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100         Standard        QUAD              false  thrpt   20   1074.268 ±   7.557  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100         Standard        QUAD              false  thrpt   20  29691.272 ±   7.127    B/op
EntityInstantiators.query:·gc.count                100         Standard        QUAD              false  thrpt   20    193.000            counts
EntityInstantiators.query:·gc.time                 100         Standard        QUAD              false  thrpt   20    290.000                ms
EntityInstantiators.query                          100        Optimized        MONO              false  thrpt   20  40357.327 ± 517.328   ops/s
EntityInstantiators.query:·async                   100        Optimized        MONO              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100        Optimized        MONO              false  thrpt   20   1138.476 ±  14.442  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100        Optimized        MONO              false  thrpt   20  29635.270 ±  14.254    B/op
EntityInstantiators.query:·gc.count                100        Optimized        MONO              false  thrpt   20    184.000            counts
EntityInstantiators.query:·gc.time                 100        Optimized        MONO              false  thrpt   20    286.000                ms
EntityInstantiators.query                          100        Optimized        QUAD              false  thrpt   20  38418.609 ± 303.974   ops/s
EntityInstantiators.query:·async                   100        Optimized        QUAD              false  thrpt             NaN               ---
EntityInstantiators.query:·gc.alloc.rate           100        Optimized        QUAD              false  thrpt   20   1085.104 ±  11.392  MB/sec
EntityInstantiators.query:·gc.alloc.rate.norm      100        Optimized        QUAD              false  thrpt   20  29675.271 ±   7.128    B/op
EntityInstantiators.query:·gc.count                100        Optimized        QUAD              false  thrpt   20    170.000            counts
EntityInstantiators.query:·gc.time                 100        Optimized        QUAD              false  thrpt   20    290.000                ms

The ops/s are almost identical in the QUAD case. Looking at the flamegraphs, the impact of itable stubs on the optimized path is evident there.
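A minimal sketch of what the morphism parameter plausibly changes, assuming MONO routes a single entity type through the instantiation call site and QUAD routes four (the classes below are hypothetical, not the benchmark's code). With four receiver types evenly mixed at the same `instantiate()` interface call, HotSpot's C2 typically cannot inline it (it generally inlines profiled calls only when they see one or two receiver types), so each call is dispatched through an itable stub.

```java
class MorphismDemo {
    // Hypothetical per-entity instantiator abstraction.
    interface Instantiator {
        Object instantiate();
    }

    // Hypothetical entity types.
    static final class Author {}
    static final class Book {}
    static final class Publisher {}
    static final class Review {}

    // One implementing class per entity type, so each entity type shows up
    // as a distinct receiver type at the interface call site below.
    static final class AuthorInstantiator implements Instantiator {
        public Object instantiate() { return new Author(); }
    }
    static final class BookInstantiator implements Instantiator {
        public Object instantiate() { return new Book(); }
    }
    static final class PublisherInstantiator implements Instantiator {
        public Object instantiate() { return new Publisher(); }
    }
    static final class ReviewInstantiator implements Instantiator {
        public Object instantiate() { return new Review(); }
    }

    // The single invokeinterface call site the JIT profiles:
    // MONO sees one receiver type here, QUAD sees four.
    static int instantiateAll(Instantiator[] rows) {
        int created = 0;
        for (Instantiator instantiator : rows) {
            if (instantiator.instantiate() != null) {
                created++;
            }
        }
        return created;
    }

    // Builds "result rows" cycling over one (MONO) or four (QUAD) entity types.
    static Instantiator[] rows(int count, boolean quad) {
        Instantiator[] kinds = quad
                ? new Instantiator[] {new AuthorInstantiator(), new BookInstantiator(),
                        new PublisherInstantiator(), new ReviewInstantiator()}
                : new Instantiator[] {new BookInstantiator()};
        Instantiator[] result = new Instantiator[count];
        for (int i = 0; i < count; i++) {
            result[i] = kinds[i % kinds.length];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("MONO: " + instantiateAll(rows(100, false)));
        System.out.println("QUAD: " + instantiateAll(rows(100, true)));
    }
}
```

The work done per call is identical in both variants; only the number of receiver types observed at the call site changes, which is what turns a cheap inlined call into an itable dispatch.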

The optimized strategy is still better in the MONO case, but that scenario is fairly synthetic, since real-world applications typically query many different entity types.
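For reference, the relative differences implied by the query scores in the table above can be computed with a small helper (illustrative only, not part of the benchmark; it ignores the ± error columns):

```java
import java.util.Locale;

class MorphismGains {
    // Relative throughput change of a candidate run over a baseline run, in percent.
    static double gainPercent(double baselineOpsPerSec, double candidateOpsPerSec) {
        return (candidateOpsPerSec / baselineOpsPerSec - 1.0) * 100.0;
    }

    static String line(String label, double baseline, double candidate) {
        return String.format(Locale.ROOT, "%s: %+.1f%%", label, gainPercent(baseline, candidate));
    }

    public static void main(String[] args) {
        // EntityInstantiators.query scores (ops/s), count = 100, JDK 17 run.
        double standardMono = 39893.930, optimizedMono = 40357.327;
        double standardQuad = 38011.336, optimizedQuad = 38418.609;

        System.out.println(line("MONO, optimized vs standard", standardMono, optimizedMono)); // +1.2%
        System.out.println(line("QUAD, optimized vs standard", standardQuad, optimizedQuad)); // +1.1%
        System.out.println(line("standard, QUAD vs MONO", standardMono, standardQuad));       // -4.7%
        System.out.println(line("optimized, QUAD vs MONO", optimizedMono, optimizedQuad));    // -4.8%
    }
}
```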

@mbladel (Contributor, Author) commented Oct 28, 2024

Further testing with a bytecode-enhanced entity (to alleviate the impact of https://hibernate.atlassian.net/browse/HHH-18763), and even with JDK 24, showed no significant gain in total ops/s from instantiation optimizers. This is due to the poor performance of the megamorphic call through the itable stub in the QUAD case, which should more closely resemble real-world applications.

I'm going to merge this, as it's a valuable benchmark for simple query performance, entity instantiation, and the handling of new objects in the Persistence Context.

@mbladel merged commit f1431eb into hibernate:main on Oct 28, 2024

@franz1981 (Contributor) commented Oct 28, 2024

I'm pasting here a cheap trick to use with care (but I still want you to be aware of it and play with it); if used right, it can deliver some very real improvements.

See franz1981@58760b8, which is worth trying when:

  1. the invokeinterface call site is so polluted with types that it really pops up in profiling data (see the itable_stub frames for the creators)
  2. the number of types involved is <= 10 (see https://stackoverflow.com/a/45023935 for why, and check -XX:MinJumpTableSize; with more than 10 types, it's better to try nested switches or a bigger switch and see what happens)
  3. very little work is performed in the invoked method (i.e. a constructor)
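The contents of franz1981@58760b8 are not shown in this thread, so the following is only a hedged sketch of the general idea under those conditions: instead of one invokeinterface call site that sees every instantiator class, a single static method switches on a small integer type id and calls each no-arg constructor directly. The type ids and entity classes are hypothetical; whether C2 turns the switch into a jump table depends on the number of cases and flags such as -XX:MinJumpTableSize.

```java
class SwitchInstantiator {
    // Hypothetical entity classes; each has a trivial no-arg constructor.
    static final class Author {}
    static final class Book {}
    static final class Publisher {}
    static final class Review {}

    // Hypothetical dense type ids, e.g. assigned when entity mappings are collected.
    static final int AUTHOR = 0, BOOK = 1, PUBLISHER = 2, REVIEW = 3;

    // One static method replaces the per-entity instantiator classes:
    // dispatch becomes a profiled switch instead of an interface call.
    static Object instantiate(int typeId) {
        switch (typeId) {
            case AUTHOR: return new Author();
            case BOOK: return new Book();
            case PUBLISHER: return new Publisher();
            case REVIEW: return new Review();
            default: throw new IllegalArgumentException("Unknown entity type id: " + typeId);
        }
    }

    public static void main(String[] args) {
        for (int typeId = AUTHOR; typeId <= REVIEW; typeId++) {
            System.out.println(instantiate(typeId).getClass().getSimpleName());
        }
    }
}
```

Because each case body is just a constructor call (condition 3) and the number of cases is small (condition 2), the dispatch itself stays cheap compared with the work saved by avoiding the megamorphic interface call.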

Similar to virtual calls, switches are profiled by the runtime, which means the generated branches can change based on the observed frequencies. But, similarly to type profile information (see https://github.com/openjdk/jdk/blob/120a9357b3cf63427a6c8539128b69b11b9beca3/src/hotspot/share/opto/doCall.cpp#L83-L396 for what happens while the JIT generates a call), once a full compilation happens (i.e. at the last and most refined compilation level), their "final form" is shared across the different callers. This means I expect switches won't behave much worse than dynamic dispatch.

@mbladel (Contributor, Author) commented Oct 28, 2024

Hey @franz1981, thanks for the tips. This is interesting, though point 2 (number of types <= 10) is definitely very rare in real-world applications. Having a single instantiator class for all entity mappings is actually feasible in Quarkus, where we collect every mapped type at static-init time. However, seeing how well reflection performs for constructors, especially since JDK 18, where core reflection is implemented with method handles, makes me wonder whether any instantiation optimization (access optimizers are a very separate topic) is even worth it.
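For comparison, here is a hedged sketch of constructor instantiation through the public `java.lang.invoke` API, with handles looked up once up front (loosely mirroring the idea of collecting every mapped type at startup). Since JDK 18 (JEP 416), `Constructor#newInstance` itself is implemented on top of method handles, but this sketch calls `MethodHandles` directly and is neither Hibernate's nor Quarkus's code; the registry class and entity types are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MethodHandleInstantiators {
    // Hypothetical entity classes with no-arg constructors.
    static final class Author {}
    static final class Book {}

    // Built once at startup; only read afterwards.
    private final Map<Class<?>, MethodHandle> constructors = new HashMap<>();

    MethodHandleInstantiators(List<Class<?>> entityTypes) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        for (Class<?> type : entityTypes) {
            try {
                MethodHandle ctor = lookup.findConstructor(type, MethodType.methodType(void.class));
                // Adapt ()Author, ()Book, ... to a uniform ()Object so invokeExact can be used.
                constructors.put(type, ctor.asType(MethodType.methodType(Object.class)));
            } catch (NoSuchMethodException | IllegalAccessException e) {
                throw new IllegalArgumentException("No accessible no-arg constructor on " + type, e);
            }
        }
    }

    <T> T instantiate(Class<T> type) {
        MethodHandle ctor = constructors.get(type);
        if (ctor == null) {
            throw new IllegalArgumentException("Unknown entity type: " + type);
        }
        try {
            // The (Object) cast fixes the call-site type to ()Object, matching the adapted handle.
            return type.cast((Object) ctor.invokeExact());
        } catch (Throwable t) {
            throw new IllegalStateException("Cannot instantiate " + type, t);
        }
    }

    public static void main(String[] args) {
        MethodHandleInstantiators registry =
                new MethodHandleInstantiators(List.of(Author.class, Book.class));
        System.out.println(registry.instantiate(Book.class).getClass().getSimpleName());
    }
}
```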
