Double the performance of measuring collections #68
base: master
Conversation
Improves the performance of measuring collections like ArrayList, HashMap etc.
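For context, the hot path being optimized is MemoryMeter.measureDeep on a collection, which the JMH benchmarks below exercise in roughly the following shape. This is a hypothetical skeleton, not the project's actual BenchmarkMeasureCollections; the builder-style construction of MemoryMeter is assumed from recent jamm versions and may differ in older ones.

```java
import org.github.jamm.MemoryMeter;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.util.ArrayList;
import java.util.List;

@State(Scope.Benchmark)
public class MeasureArrayListSketch {

    private MemoryMeter meter;
    private List<Integer> list;

    @Setup
    public void setup() {
        // Builder-style construction assumed from recent jamm versions;
        // older versions create MemoryMeter differently.
        meter = MemoryMeter.builder().build();
        list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
    }

    @Benchmark
    public long measureDeepArrayList() {
        // measureDeep walks the entire object graph of the list,
        // which is the code path this PR optimizes.
        return meter.measureDeep(list);
    }
}
```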
JMH Benchmarks: notice how the BenchmarkMeasureCollections times improved dramatically, along with even tighter margins of error for these scenarios, after the optimizations.
Before optimizations:
After optimizations:
@daniel-rusu Thanks for the patch. I will be curious to see the JMH results when running the benchmark with Java 17, as the numbers in general differ a lot from Java 8. The benchmark is also targeting large collections with 1000 elements. For most applications, I would expect collections to be significantly smaller, somewhere between 0 and 100 elements. Using a fixed collection size in a benchmark also allows the JVM to do some optimisations that it would not be able to do with real collections. Regarding the approach, it is heavily targeted toward collection-only measurements. If you measure a deep tree of objects, I expect that creating the cache will at some point negatively impact the measurement performance, especially if you have multiple threads doing similar types of measurements at the same time.
@blerer thanks for the feedback. I added another commit so that each collection has a random size that follows your suggested distribution: 70% of collections have between 0 and 100 elements, 20% between 100 and 700 elements, and 10% between 700 and 1000 elements.
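For illustration, sizes following that distribution might be drawn in the benchmark setup roughly like this. This is a minimal sketch with hypothetical helper names, not the PR's actual benchmark code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; the real benchmark setup in the PR may differ.
final class RandomCollectionSizes {

    /** Draws a size following the suggested 70% / 20% / 10% distribution. */
    static int nextSize() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        int bucket = random.nextInt(100);
        if (bucket < 70) {
            return random.nextInt(0, 101);      // 70%: 0..100 elements
        } else if (bucket < 90) {
            return random.nextInt(100, 701);    // 20%: 100..700 elements
        } else {
            return random.nextInt(700, 1001);   // 10%: 700..1000 elements
        }
    }

    /** Builds one randomly sized list to be measured in the benchmark. */
    static List<Integer> nextList() {
        int size = nextSize();
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }
}
```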
I also added a linked list benchmark to see how it performs with deeply nested data, since measuring a linked list requires traversing all the links to reach the end of the collection. The updated performance improvements with randomized collection sizes are:
INSTRUMENTATION:
INSTRUMENTATION_AND_SPECIFICATION:
UNSAFE:
SPECIFICATION:
Interestingly, looking at the performance progression from ArrayList to HashMap to linked list, the performance improvement seems to increase as the object graph becomes more deeply nested.
I wasn't able to run the JMH benchmarks with Java 17 (or 11) because triggering the benchmark recompiles the project, and that failed with a number of compilation errors about unrecognized symbols.
Since the caching logic still performs the same actions the first time, with only slight changes to store the results in a map, the real overhead is in creating these container objects and inserting them into the map; everything else is the same work as before when encountering a new type (and faster when encountering another instance of the same type). Since creating objects on the JVM is extremely cheap (usually just incrementing a pointer into the free space based on the amount of memory requested), I suspect that the caching overhead is negligible. This reasoning seems to align with the initial benchmarks above for the pre-existing scenarios where caching isn't beneficial, which showed unaffected measurements (within the margin of error). So I expect that Java 17 will improve both versions (before & after optimizations) by the same percentage, since caching seems to be a negligible part of the runtime.
Lastly, the actual results:
Before optimizations:
After optimizations:
Note that I only re-ran the collection benchmarks as the others are unchanged.
@blerer I'm curious if I can do anything to help get this merged as it's been a couple of months. Since it only compiles on JDK 8, I'm not sure if it's possible to run the JMH benchmarks on JDK 17, but I'm open to suggestions. While JDK 17 will have better absolute performance, I expect these changes to improve the performance on JDK 17 by the same ratio. Do you have any suggestions to improve this PR further?
Sorry for the delay, @daniel-rusu. I have been busy elsewhere. I will try to get some time to look into the patch in 2 weeks as I need to finish some other stuff first.
This PR makes deep measurement of collections 2.3 times faster on average (across all strategies):
INSTRUMENTATION:
INSTRUMENTATION_AND_SPECIFICATION:
UNSAFE:
SPECIFICATION:
The performance of other workloads such as measuring small objects remains unchanged, within the margin of error, as the setup for the new optimization is negligible compared to the effort of measuring them. I'll add another standalone comment with JMH benchmark results.
Approach:
Create a temporary cache of class metadata (the shallow size and the fields to be measured) every time the measureDeep function is called. When we encounter another object of the same type within the object graph, instead of re-computing the shallow size, re-traversing the inheritance hierarchy to discover the declared fields, and then checking each field against the field filter, we can skip all of that and reuse the results from the cache.
Since this temporary cache only exists within the scope of a single measureDeep call, there are no threading concerns, and the cache cannot grow unbounded across invocations because it is discarded when the method returns.
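A minimal standalone sketch of the idea is shown below. Names such as ShallowSizer, ClassInfo, and the traversal loop are illustrative, not jamm's actual internals, and array handling is omitted for brevity (an array's shallow size depends on its length, so it must not be cached per class).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public final class DeepMeasureSketch {

    /** Pluggable shallow-size strategy; stands in for jamm's measurement strategies. */
    public interface ShallowSizer {
        long shallowSizeOf(Object obj);
    }

    /** Cached per-class metadata: the shallow size and the reference fields to follow. */
    private static final class ClassInfo {
        final long shallowSize;
        final List<Field> referenceFields;

        ClassInfo(long shallowSize, List<Field> referenceFields) {
            this.shallowSize = shallowSize;
            this.referenceFields = referenceFields;
        }
    }

    public static long measureDeep(Object root, ShallowSizer sizer) {
        if (root == null) {
            return 0;
        }
        // The cache lives only for the duration of this call, so there are no
        // thread-safety or unbounded-growth concerns.
        Map<Class<?>, ClassInfo> cache = new HashMap<>();
        Map<Object, Boolean> visited = new IdentityHashMap<>();
        Deque<Object> pending = new ArrayDeque<>();
        pending.push(root);

        long total = 0;
        while (!pending.isEmpty()) {
            Object current = pending.pop();
            if (visited.put(current, Boolean.TRUE) != null) {
                continue; // already measured this instance
            }
            // The first instance of a class pays for the inspection; later instances
            // of the same class only pay for a cheap map lookup.
            ClassInfo info = cache.computeIfAbsent(current.getClass(),
                                                   cls -> inspect(cls, current, sizer));
            total += info.shallowSize;
            for (Field field : info.referenceFields) {
                try {
                    Object child = field.get(current);
                    if (child != null) {
                        pending.push(child);
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return total;
    }

    /** Walks the inheritance hierarchy once per class and remembers the result. */
    private static ClassInfo inspect(Class<?> cls, Object sample, ShallowSizer sizer) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                // Field filter: only non-static reference fields need to be followed;
                // primitive fields are already covered by the shallow size.
                if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
                    continue;
                }
                field.setAccessible(true);
                fields.add(field);
            }
        }
        return new ClassInfo(sizer.shallowSizeOf(sample), Collections.unmodifiableList(fields));
    }
}
```

The key point of the sketch is that inspect runs once per class per measureDeep call, so repeated elements of a collection only pay for a map lookup instead of reflection over the whole inheritance hierarchy.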
This resolves #65