[PROF-10978] Require Ruby 3.1+ for heap profiling #4178
On Ruby 3.0, the default of `gc_enabled = true` emits a warning stating that it can't be used. This warning is a bit annoying in our tests, so let's disable it and only rely on it being enabled in the specs that actually test this feature.
It's no longer true that we don't use the new Ruby profiler on Ruby 2.5; what's true is that we enable the "no signals workaround" to avoid the potentially-problematic code path.
**What does this PR do?**

This PR raises the minimum Ruby version required for heap profiling from >= 2.7 to >= 3.1, due to a newly discovered VM bug (see below for details).

It's mostly a revert of #3366, where we first tried to work around a Ruby 2.7/3.0 bug, but it turns out we missed a spot, and we could trigger VM crashes because of that.

**Motivation:**

Ruby versions prior to 3.1 had a special optimization called `rb_gc_force_recycle`, which allowed objects to be garbage collected directly (i.e. without needing to wait for the GC).

It turns out that `rb_gc_force_recycle` did not play well with the changes Ruby 2.7 made to how object ids work. We uncovered this earlier, during the development of the heap profiler, and put in a workaround for the bug that we thought was enough...

Unfortunately, the workaround is not enough. The following reproducer, when run on Ruby 2.7 or 3.0, shows how the Ruby VM can segfault inside `id2ref` due to the issue above:

```ruby
puts RUBY_DESCRIPTION

require "datadog"
require "objspace"
require "pry"

NUM_OBJECTS = 10_000_000

recycled_ids = Array.new(NUM_OBJECTS) { 123 }
many_objects = Array.new(NUM_OBJECTS) { Object.new }

(0...NUM_OBJECTS).each do |i|
  recycled_ids[i] = many_objects[i].object_id
end

puts "Seeded objects!"
gets

(0...NUM_OBJECTS).each do |i|
  Datadog::Profiling::StackRecorder::Testing._native_gc_force_recycle(many_objects[i])
  many_objects[i] = nil
end

puts GC.stat
puts "Recycled objects!"
gets

many_objects = nil
10.times { GC.start }
Array.new(10_000) { Object.new }
10.times { GC.start }

puts GC.stat
puts "GC'd objects! (Ruby should have released pages?)"
gets

recycled_ids.each { |i| begin (nil == ObjectSpace._id2ref(i)) rescue nil end }

puts "Done!"
```

Crash details:

```
Program received signal SIGSEGV, Segmentation fault.
is_swept_object (ptr=93825033355200, objspace=<optimised out>) at gc.c:3868
3868	    return page->flags.before_sweep ? FALSE : TRUE;
(gdb) bt
#0  is_swept_object (ptr=93825033355200, objspace=<optimised out>) at gc.c:3868
#1  is_garbage_object (objspace=0x55555555d220, objspace=0x55555555d220, ptr=93825033355200) at gc.c:3887
#2  is_live_object (ptr=93825033355200, objspace=0x55555555d220) at gc.c:3909
#3  is_live_object (ptr=93825033355200, objspace=0x55555555d220) at gc.c:3898
#4  id2ref (objid=8264881) at gc.c:3999
#5  os_id2ref (os=<optimised out>, objid=<optimised out>) at gc.c:4019
```

This crash happens because of two things:

1. Ruby does not clean the object id entry for a recycled object from its internal hash map.
2. If the memory page where the object lived is returned to the OS, trying to `id2ref` on that id will cause Ruby to read invalid memory and crash.

**Additional Notes:**

I've chosen to disable heap profiling on 2.7 and 3.0 because I can't think of a good workaround for the bug above, especially not one that doesn't increase the overhead of heap profiling.

**How to test the change?**

This PR updates the test coverage to expect Ruby 3.1+ as the minimum for the feature.

You can also quickly validate that it doesn't get enabled on older Rubies using:

```
$ DD_PROFILING_ENABLED=true DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED=true bundle exec ddprofrb exec ruby -e "puts RUBY_DESCRIPTION"
W, [2024-12-02T10:42:28.771611 #112585]  WARN -- datadog: [datadog] Current Ruby version (3.0.5) cannot support heap profiling due to VM bugs/limitations. Please upgrade to Ruby >= 3.1 in order to use this feature. Heap profiling has been disabled.
```
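The two causes listed under **Motivation** (a stale object-id entry left behind by `rb_gc_force_recycle`, and the page that held the object being returned to the OS) can be illustrated with a toy model. This is a hypothetical plain-Ruby sketch: `ToyHeap`, its pages, and its id table are invented for illustration and do not mirror how `gc.c` actually lays out objects or object ids. Where the real VM would read freed memory and segfault, the toy raises an exception instead.

```ruby
# Toy model only (hypothetical): NOT how Ruby's gc.c stores objects or ids.
class ToyHeap
  SLOTS_PER_PAGE = 4
  DanglingPageError = Class.new(StandardError)

  def initialize
    @pages = []    # each page is an Array of slots; a page becomes nil once released
    @id_table = {} # object id => [page_index, slot_index]
    @next_id = 1
  end

  # Stores a value in the first free slot (adding a page if needed) and
  # returns a freshly assigned object id for it.
  def allocate(value)
    page_index = @pages.index { |page| page&.include?(nil) }
    if page_index.nil?
      @pages << Array.new(SLOTS_PER_PAGE)
      page_index = @pages.size - 1
    end
    slot_index = @pages[page_index].index(nil)
    @pages[page_index][slot_index] = value

    id = @next_id
    @next_id += 1
    @id_table[id] = [page_index, slot_index]
    id
  end

  # Regular GC path: frees the slot AND deletes the id entry.
  def gc_free(id)
    page_index, slot_index = @id_table.fetch(id)
    @id_table.delete(id)
    @pages[page_index][slot_index] = nil
  end

  # Models rb_gc_force_recycle on Ruby < 3.1: frees the slot but leaves the
  # id entry behind (cause 1).
  def force_recycle(id)
    page_index, slot_index = @id_table.fetch(id)
    @pages[page_index][slot_index] = nil
  end

  # Releases every fully-empty page back to the "OS" (cause 2).
  def release_empty_pages
    @pages.map! { |page| page&.all?(&:nil?) ? nil : page }
  end

  # Models id2ref: returns nil for unknown ids; a stale entry pointing into a
  # released page is where the real VM reads invalid memory, so raise here.
  def id2ref(id)
    location = @id_table[id]
    return nil if location.nil?

    page_index, slot_index = location
    page = @pages[page_index]
    raise DanglingPageError, "id #{id} points into a released page" if page.nil?

    page[slot_index]
  end
end

# Usage: fill one page, force-recycle everything, release the page, look up.
heap = ToyHeap.new
ids = Array.new(ToyHeap::SLOTS_PER_PAGE) { |i| heap.allocate("obj#{i}") }
ids.each { |id| heap.force_recycle(id) } # slots freed, id entries kept (cause 1)
heap.release_empty_pages                 # the now-empty page is released (cause 2)
begin
  heap.id2ref(ids.first)
rescue ToyHeap::DanglingPageError => e
  puts "would crash: #{e.message}"
end
```

Running the same sequence with `gc_free` instead of `force_recycle` makes `id2ref` return `nil`, because the regular path deletes the id entry before the page goes away. That is the distinction the PR description draws between normally collected and force-recycled objects.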
Datadog Report
Branch report: ✅ 0 Failed, 22049 Passed, 1459 Skipped, 5m 41.04s Total Time
Due to a Ruby VM bug in older Ruby versions, we're going to require Ruby 3.1+ as a minimum version for heap profiling, as per DataDog/dd-trace-rb#4178 . This PR updates the docs to match this raised requirement.
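Enforcing a minimum Ruby version like this comes down to a version comparison when deciding whether to turn heap profiling on. Below is a minimal sketch, assuming a plain `Gem::Version` comparison; `MINIMUM_HEAP_PROFILING_RUBY`, `heap_profiling_supported?`, and `effective_heap_profiling` are hypothetical names, not dd-trace-rb's actual API, and the warning text is copied from the output shown in the PR description's test instructions.

```ruby
# Hypothetical sketch, NOT dd-trace-rb's actual implementation.
require "rubygems" # provides Gem::Version (already loaded in most Ruby setups)

# Assumed constant: the minimum Ruby version for heap profiling, per this PR.
MINIMUM_HEAP_PROFILING_RUBY = Gem::Version.new("3.1")

# True when heap profiling can be enabled on the given Ruby version.
def heap_profiling_supported?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) >= MINIMUM_HEAP_PROFILING_RUBY
end

# Returns whether heap profiling ends up enabled. If it was requested on an
# unsupported Ruby, prints a warning to stderr and returns false.
def effective_heap_profiling(requested:, ruby_version: RUBY_VERSION)
  return false unless requested
  return true if heap_profiling_supported?(ruby_version)

  warn "Current Ruby version (#{ruby_version}) cannot support heap profiling due to VM bugs/limitations. " \
       "Please upgrade to Ruby >= 3.1 in order to use this feature. Heap profiling has been disabled."
  false
end

# Usage
puts effective_heap_profiling(requested: true, ruby_version: "3.0.5") # warns on stderr, prints false
puts effective_heap_profiling(requested: true, ruby_version: "3.1.0") # prints true
```

Comparing with `Gem::Version` rather than raw strings keeps version components ordered numerically; a plain string comparison would sort `"10.0"` before `"3.1"`.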
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #4178   +/-   ##
=======================================
  Coverage   97.76%   97.76%
=======================================
  Files        1357     1357
  Lines       81950    81891     -59
  Branches     4168     4164      -4
=======================================
- Hits        80117    80060     -57
+ Misses       1833     1831      -2
Benchmarks
Benchmark execution time: 2024-12-02 14:15:36
Comparing candidate commit 39ffa38 in PR branch
Found 0 performance improvements and 2 performance regressions! Performance is the same for 29 metrics, 2 unstable metrics.
scenario:line instrumentation - targeted
scenario:method instrumentation
I think it's ok to raise the bar for Ruby version due to costs of patches.
Nice spot. Sad about our smart workaround not being enough 🥲
**Change log entry:** Require Ruby 3.1+ for heap profiling