Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[prof-heap] Misc improvements/fixes #3360

Merged
merged 5 commits into from
Jan 8, 2024
Merged

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Jan 3, 2024

What does this PR do?

  • When an object id conflict is not allowed, raise with a more informative error message.
  • Allow configuration of heap size collection (DD_PROFILING_EXPERIMENTAL_HEAP_SIZE_ENABLED) and heap sample rate (DD_PROFILING_EXPERIMENTAL_HEAP_SAMPLE_RATE).

Motivation:

  • Debugging profiler shutdown on Ruby 2.7 workloads due to obj id conflicts.
  • More knobs to enable different benchmarking flavours.

Additional Notes:

How to test the change?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@AlexJF AlexJF requested review from a team as code owners January 3, 2024 11:42
@github-actions github-actions bot added core Involves Datadog core libraries profiling Involves Datadog profiling labels Jan 3, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d809f33) 98.23% compared to head (b1d4a53) 98.24%.
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3360   +/-   ##
=======================================
  Coverage   98.23%   98.24%           
=======================================
  Files        1254     1254           
  Lines       73217    73287   +70     
  Branches     3430     3432    +2     
=======================================
+ Hits        71928    72000   +72     
+ Misses       1289     1287    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch 2 times, most recently from 9806fcd to bd49938 Compare January 3, 2024 17:44
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-misc-fixes branch 3 times, most recently from a272aa5 to 1da1640 Compare January 3, 2024 19:21
benchmarks/profiler_sample_loop_v2.rb Show resolved Hide resolved
# would be invisible to free tracepoints, finalizers and without cleaning
# obj_to_id_tbl mappings. This leads to potential invisible object re-use
# to our heap profiler.
$defs << '-DHAVE_WORKING_RB_GC_FORCE_RECYCLE' if RUBY_VERSION < '3.1'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So re: yesterday's discussion of how we may be able to workaround the rb_gc_force_recycle, I think our intuition was correct (and re ruby/ruby#4363 ).

Here's a testcase (thanks ChatGPT for pairing with me on this):

--- a/ext/ddtrace_profiling_native_extension/profiling.c
+++ b/ext/ddtrace_profiling_native_extension/profiling.c
@@ -32,6 +32,7 @@ static void holding_the_gvl_signal_handler(DDTRACE_UNUSED int _signal, DDTRACE_U
 static VALUE _native_trigger_holding_the_gvl_signal_handler_on(DDTRACE_UNUSED VALUE _self, VALUE background_thread);
 static VALUE _native_enforce_success(DDTRACE_UNUSED VALUE _self, VALUE syserr_errno, VALUE with_gvl);
 static void *trigger_enforce_success(void *trigger_args);
+static VALUE _native_force_recycle_test(DDTRACE_UNUSED VALUE _self);
 
 void DDTRACE_EXPORT Init_ddtrace_profiling_native_extension(void) {
   VALUE datadog_module = rb_define_module("Datadog");
@@ -65,6 +66,8 @@ void DDTRACE_EXPORT Init_ddtrace_profiling_native_extension(void) {
   rb_define_singleton_method(testing_module, "_native_install_holding_the_gvl_signal_handler", _native_install_holding_the_gvl_signal_handler, 0);
   rb_define_singleton_method(testing_module, "_native_trigger_holding_the_gvl_signal_handler_on", _native_trigger_holding_the_gvl_signal_handler_on, 1);
   rb_define_singleton_method(testing_module, "_native_enforce_success", _native_enforce_success, 2);
+
+  rb_define_singleton_method(testing_module, "_native_force_recycle_test", _native_force_recycle_test, 0);
 }
static VALUE _native_force_recycle_test(DDTRACE_UNUSED VALUE _self) {
  int number_of_objs = 6000;
  VALUE ruby_array = rb_ary_new_capa(number_of_objs);
  VALUE object_class = rb_const_get(rb_cObject, rb_intern("Object"));
  VALUE object_ids[number_of_objs];
  VALUE object_ids_after_gc[number_of_objs];

  for (int i = 0; i < number_of_objs; i++) {
    VALUE ruby_object = rb_class_new_instance(0, NULL, object_class);
    rb_ary_store(ruby_array, i, ruby_object);
    object_ids[i] = rb_obj_id(ruby_object);
  }

  for (int i = 0; i < RARRAY_LEN(ruby_array); i++) {
    VALUE ruby_object = rb_ary_entry(ruby_array, i);
    rb_ary_store(ruby_array, i, Qnil);
    rb_gc_force_recycle(ruby_object);
  }

  rb_gc_start();

  for (int i = 0; i < RARRAY_LEN(ruby_array); i++) {
    VALUE new_ruby_object = rb_class_new_instance(0, NULL, object_class);

    if (FL_TEST(new_ruby_object, FL_SEEN_OBJ_ID)) {
      printf("New object at index %d has FL_SEEN_OBJ_ID flag set after creation\n", i);
    }

    rb_ary_store(ruby_array, i, new_ruby_object);

    object_ids_after_gc[i] = rb_obj_id(new_ruby_object);

    if (!FL_TEST(new_ruby_object, FL_SEEN_OBJ_ID)) {
      printf("New object at index %d is missing FL_SEEN_OBJ_ID after rb_obj_id call\n", i);
    }
  }

  for (int i = 0; i < number_of_objs; i++) {
    for (int j = 0; j < number_of_objs; j++) {
      if (object_ids[i] == object_ids_after_gc[j]) {
        printf("Object ID %ld is repeated after garbage collection (Indexes: %d, %d)\n", object_ids[i], i, j);
      }
    }
  }

  return Qtrue;
}

and here's the result I get on Ruby 2.7:

New object at index 5352 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5353 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5354 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5355 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5356 is missing FL_SEEN_OBJ_ID after rb_obj_id call
# ...
New object at index 5995 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5996 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5997 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5998 is missing FL_SEEN_OBJ_ID after rb_obj_id call
New object at index 5999 is missing FL_SEEN_OBJ_ID after rb_obj_id call
Object ID 219601 is repeated after garbage collection (Indexes: 5380, 5999)
Object ID 219641 is repeated after garbage collection (Indexes: 5381, 5998)
Object ID 219681 is repeated after garbage collection (Indexes: 5382, 5997)
Object ID 219721 is repeated after garbage collection (Indexes: 5383, 5996)
Object ID 219761 is repeated after garbage collection (Indexes: 5384, 5995)
# ...
Object ID 244201 is repeated after garbage collection (Indexes: 5995, 5356)
Object ID 244241 is repeated after garbage collection (Indexes: 5996, 5355)
Object ID 244281 is repeated after garbage collection (Indexes: 5997, 5354)
Object ID 244321 is repeated after garbage collection (Indexes: 5998, 5353)
Object ID 244361 is repeated after garbage collection (Indexes: 5999, 5352)

...which seems to match with our expectations:

  1. Object ids get repeated for objects that get GC'd with rb_gc_force_recycle
  2. New objects born in their place do not have the FL_SEEN_OBJ_ID set
  3. Buggy objects never get the FL_SEEN_OBJ_ID set after a call to rb_obj_id

Base automatically changed from alexjf/prof-8667-heap-profiling-part5-size to master January 5, 2024 13:06
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-misc-fixes branch from 1da1640 to d62e88b Compare January 5, 2024 13:12
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-misc-fixes branch from d62e88b to a4c19cf Compare January 5, 2024 13:21
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I think we're still missing a few details stemming from the discussion around rb_gc_force_recycle:

  • Detecting when a slot got reused by recycle via st_object_record_update and stop tracking it
  • Maybe raising when we see a duplicate object id that doesn't seem related to the rb_gc_force_recycle bug
  • Maybe documenting the potential FL_SEEN_OBJ_ID workaround of setting it back when we find it missing?

...but my suggestion would be to close up this PR and do those changes in a follow-up PR.

Comment on lines 11 to 20
static ID _inspect_id = Qnil;
static ID _to_s_id = Qnil;

void ruby_helpers_init(void) {
rb_global_variable(&module_object_space);

module_object_space = rb_const_get(rb_cObject, rb_intern("ObjectSpace"));
_id2ref_id = rb_intern("_id2ref");
_inspect_id = rb_intern("inspect");
_to_s_id = rb_intern("to_s");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Wait, was the _ above a convention? I thought it was because the symbol actually started with _ 👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙈

ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Show resolved Hide resolved
lib/datadog/core/configuration/settings.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
@AlexJF AlexJF force-pushed the alexjf/heap-profiler-misc-fixes branch from 1a4ca3b to d6184e2 Compare January 5, 2024 19:16
@AlexJF
Copy link
Contributor Author

AlexJF commented Jan 5, 2024

Things related to force_recycle moved to #3366

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
AlexJF and others added 2 commits January 8, 2024 11:18
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
@AlexJF AlexJF merged commit c91bc1f into master Jan 8, 2024
218 checks passed
@AlexJF AlexJF deleted the alexjf/heap-profiler-misc-fixes branch January 8, 2024 11:41
@github-actions github-actions bot added this to the 1.19.0 milestone Jan 8, 2024
TonyCTHsu pushed a commit that referenced this pull request Jan 9, 2024
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
@ivoanjo ivoanjo restored the alexjf/heap-profiler-misc-fixes branch January 24, 2024 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants