[PROF-10201] Reduce allocation profiling overhead by replacing tracepoint with lower-level API #3805
What does this PR do?
This PR reduces the allocation profiling overhead by replacing the Ruby tracepoint API with the lower-level `rb_add_event_hook2` API. The key insight here is that, while benchmarking allocation profiling and looking at what the VM was doing, I discovered that tracepoints are just a thin, more user-friendly wrapper around the lower-level API.
The lower-level API is publicly available (in "debug.h"), but it's listed under "undocumented advanced tracing APIs".
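To make the "thin wrapper" point concrete, here is a toy model in plain C of the two layers. It is not Ruby's code and does not use Ruby's headers: every name below (`add_event_hook`, `fire_event`, `tracepoint_enable`, `trace_arg_t`, and so on) is hypothetical and only mimics the layering described above. The low-level layer calls registered hooks directly with the raw event argument; the tracepoint-style layer registers its own trampoline through that same low-level layer, stashes the event argument, and then forwards to the user callback. Using the low-level layer directly removes that extra hop and bookkeeping on every event.

```c
#include <stddef.h>

/* Toy model, NOT Ruby's API: every name here is hypothetical. */

enum { EVENT_NEWOBJ = 1 };

typedef struct {
    unsigned event; /* which event fired */
    void *object;   /* the object the event refers to */
} trace_arg_t;

/* Low-level hook: gets its user data plus the raw event argument. */
typedef void (*hook_fn)(void *data, const trace_arg_t *arg);

#define MAX_HOOKS 8
static struct { hook_fn fn; void *data; } hooks[MAX_HOOKS];
static int hook_count = 0;

/* Low-level registration (plays the role of rb_add_event_hook2). */
void add_event_hook(hook_fn fn, void *data) {
    if (hook_count < MAX_HOOKS) {
        hooks[hook_count].fn = fn;
        hooks[hook_count].data = data;
        hook_count++;
    }
}

/* The "VM" firing an event calls every registered hook in order. */
void fire_event(unsigned event, void *object) {
    trace_arg_t arg = { event, object };
    for (int i = 0; i < hook_count; i++) {
        hooks[i].fn(hooks[i].data, &arg);
    }
}

/* Higher-level wrapper (plays the role of a tracepoint). */
typedef struct tracepoint {
    void (*func)(struct tracepoint *tp, void *data);
    void *data;
    const trace_arg_t *current_arg; /* valid only while func runs */
} tracepoint_t;

/* Trampoline registered on the tracepoint's behalf: an extra call
   plus bookkeeping on every single event. */
static void tracepoint_trampoline(void *data, const trace_arg_t *arg) {
    tracepoint_t *tp = data;
    tp->current_arg = arg;
    tp->func(tp, tp->data);
    tp->current_arg = NULL;
}

void tracepoint_enable(tracepoint_t *tp) {
    add_event_hook(tracepoint_trampoline, tp);
}

/* Allocation counters for the two styles of callback. */
long via_tracepoint = 0;
long via_hook = 0;

/* Tracepoint-style callback: reaches the event data via the wrapper. */
void on_alloc_tracepoint(tracepoint_t *tp, void *data) {
    (void)data;
    if (tp->current_arg->event == EVENT_NEWOBJ) via_tracepoint++;
}

/* Hook-style callback: receives the event data directly. */
void on_alloc_hook(void *data, const trace_arg_t *arg) {
    (void)data;
    if (arg->event == EVENT_NEWOBJ) via_hook++;
}
```

Registering `on_alloc_tracepoint` via `tracepoint_enable` and `on_alloc_hook` via `add_event_hook`, then firing three `EVENT_NEWOBJ` events, increments both counters to 3. Functionally the two paths are equivalent; the only difference is the trampoline hop on the tracepoint path, which is the kind of per-event cost this PR removes from the allocation hot path.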
Motivation:
As we're trying to squeeze every bit of performance out of the allocation profiling hot path, it makes sense to use the lower-level API.
Additional Notes:
I'm considering experimenting with moving the tracepoint we use for GC profiling to this lower-level API as well, since that's another performance-sensitive code path.
How to test the change?
Functionality-wise, nothing changes, so existing test coverage is enough (and shows this alternative is working correctly).
Here are some benchmarking numbers from `benchmarks/profiler_allocation.rb`:
Here, `event_hook` is with the optimization, whereas `tracepoint` is without it.
I am aware these numbers are close to the margin of error. I re-ran my benchmarks a number of times and consistently observed the `event_hook` version coming out ahead of the `tracepoint` version, even if only slightly.