Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Fix intermittent crashes with GC.compact #171

Closed
wants to merge 2 commits into from

Conversation

stanhu
Copy link

@stanhu stanhu commented Jan 20, 2021

This fixes the known issue with compaction as described in
https://bugs.ruby-lang.org/issues/15626#Known-Issue.

Since Hamlit holds points to references to VALUE pointers, these
VALUE pointers could move due to compaction. To avoid this, we
pin these objects with rb_gc_mark.

I was able to reproduce this issue intermittently via:

# Load a large gem
require 'google/apis'
require 'hamlit'

100.times do
  4.times { GC.start }
  GC.compact

  ::Hamlit::AttributeBuilder.build_class(true, "tab-pane js-toggle-container px-0 pb-0".freeze, 'active')
end

This fixes the known issue with compaction as described in
https://bugs.ruby-lang.org/issues/15626#Known-Issue.

Since Hamlit holds points to references to VALUE pointers, these
VALUE pointers could move due to compaction. To avoid this, we
pin these objects with `rb_gc_mark`.
@stanhu stanhu changed the title Fix intermittent crashes with GC.compact Draft: Fix intermittent crashes with GC.compact Jan 20, 2021
@stanhu
Copy link
Author

stanhu commented Jan 20, 2021

Hmm, this crashes Ruby 3.0 indeed.

@mttkay
Copy link

mttkay commented Jan 20, 2021

Hmm, this crashes Ruby 3.0 indeed.

I wonder if this is related to Ractors: https://github.com/ruby/ruby/blob/ruby_3_0/gc.c#L10166-L10171

This is crashing in libpthread

@mttkay
Copy link

mttkay commented Jan 20, 2021

I wonder as well: I believe marking is done recursively, starting with root objects. These objects here are modules. Perhaps we shouldn't explicitly mark the submodules. Could we try to only mark mHamlit and see if that helps?

@stanhu
Copy link
Author

stanhu commented Jan 20, 2021

Could we try to only mark mHamlit and see if that helps?

Good idea. Let me revise.

I think this still crashes in Ruby 3.0, though.

@ko1
Copy link

ko1 commented Jan 20, 2021

made objects by rb_define_module_under() are pinned. Don't worry.

`rb_define_module_under()` are already pinned.
@stanhu
Copy link
Author

stanhu commented Jan 20, 2021

I'm still seeing this failure in my reproduction script:

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0020 e:000019 CFUNC  :build_class
c:0004 p:0045 s:0013 e:000012 BLOCK  test3.rb:9 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :times
c:0002 p:0017 s:0006 e:000005 EVAL   test3.rb:5 [FINISH]
c:0001 p:0000 s:0003 E:001ae0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
test3.rb:5:in `<main>'
test3.rb:5:in `times'
test3.rb:9:in `block in <main>'
test3.rb:9:in `build_class'

-- Machine register context ------------------------------------------------
 rax: 0x0000000000000000 rbx: 0x00007f8442863d50 rcx: 0x000000010b012ff0
 rdx: 0x0000000000000000 rdi: 0x00007f84620b1fb0 rsi: 0x00000000000126db
 rbp: 0x00007ffee4f5a680 rsp: 0x00007ffee4f5a630  r8: 0x0000000000000000
  r9: 0x00007f8442863bd0 r10: 0x00007f8461cfea90 r11: 0x000000010af1f180
 r12: 0x0000000000000000 r13: 0x00000000000126db r14: 0x00007f84620b1fb0
 r15: 0x00007f84620b1fb0 rip: 0x000000010af217e7 rfl: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_vm_bugreport+0x6cf) [0x10af52b8f]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_bug_for_fatal_signal+0x1d6) [0x10ad6b956]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(sigsegv+0x5b) [0x10aeaa72b]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff6cd2b5fd]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_const_search_from+0x47) [0x10af217e7]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_const_get_0+0x4a) [0x10af1f07a]
/Users/stanhu/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/hamlit-2.14.1/lib/hamlit/hamlit.bundle(hamlit_build_class+0x315) [0x10c6ded85]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(vm_call_cfunc_with_frame+0x14f) [0x10af48ebf]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(vm_sendish+0x52c) [0x10af40d9c]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(vm_exec_core+0x3de0) [0x10af26d10]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_vm_exec+0xb39) [0x10af3b889]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(invoke_block_from_c_bh+0x73c) [0x10af4cdbc]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_yield_1+0x77) [0x10af33577]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(int_dotimes+0xc8) [0x10ae08098]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(vm_call_cfunc_with_frame+0x14f) [0x10af48ebf]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(vm_sendish+0x52c) [0x10af40d9c]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(vm_exec_core+0x3d7e) [0x10af26cae]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_vm_exec+0xb39) [0x10af3b889]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(rb_ec_exec_node+0x127) [0x10ad76bb7]
/Users/stanhu/.rbenv/versions/3.0.0/lib/libruby.3.0.dylib(ruby_run_node+0x57) [0x10ad76a37]
/Users/stanhu/.rbenv/versions/3.0.0/bin/ruby(main+0x5d) [0x10aca7f0d]

-- Other runtime information -----------------------------------------------

@stanhu
Copy link
Author

stanhu commented Jan 20, 2021

Hmm, are modules defined via rb_define_module pinned as well? I also see a call to rb_gc_register_mark_object as well. Is there some other issue here?

@stanhu
Copy link
Author

stanhu commented Jan 20, 2021

Running the script a few times on Linux yields this:

-- Machine register context ------------------------------------------------
 RIP: 0x00007f04320b7244 RBP: 0x00007ffdc5986f50 RSP: 0x00007ffdc5986ef8
 RAX: 0x0000000000000000 RBX: 0x0000556a57080fe0 RCX: 0x00007f04320d8660
 RDX: 0x0000000000000000 RDI: 0x0000556a57080fe0 RSI: 0x00000000000117cb
  R8: 0x0000556a56bc75d0  R9: 0x0000556a56bc7570 R10: 0x0000000000000025
 R11: 0x00007f04320b8ee0 R12: 0x00000000000117cb R13: 0x0000556a57080fe0
 R14: 0x0000000000000000 R15: 0x00007ffdc5986f18 EFL: 0x0000000000010246

-- C level backtrace information -------------------------------------------
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_vm_bugreport+0x79f) [0x7f04320e38bf] vm_dump.c:755
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_bug_for_fatal_signal+0xdd) [0x7f0431f03f2d] error.c:660
/opt/gitlab/embedded/lib/libruby.so.2.7(sigsegv+0x52) [0x7f04320477d2] signal.c:946
/lib/x86_64-linux-gnu/libc.so.6(0x7f0431a87fd0) [0x7f0431a87fd0]
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_const_lookup+0x4) [0x7f04320b7244] variable.c:3354
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_const_get+0x35) [0x7f04320b8f15] variable.c:2339
/opt/gitlab/embedded/lib/ruby/gems/2.7.0/gems/hamlit-2.11.0/lib/hamlit/hamlit.so(str_space+0x16) [0x7f042c426e39] hamlit.c:16
/opt/gitlab/embedded/lib/ruby/gems/2.7.0/gems/hamlit-2.11.0/lib/hamlit/hamlit.so(hamlit_build_class) hamlit.c:158
/opt/gitlab/embedded/lib/ruby/gems/2.7.0/gems/hamlit-2.11.0/lib/hamlit/hamlit.so(rb_hamlit_build_class+0x5b) [0x7f042c4272ab] hamlit.c:478
/opt/gitlab/embedded/lib/libruby.so.2.7(vm_call_cfunc+0x11f) [0x7f04320c02ff] vm_insnhelper.c:2514
/opt/gitlab/embedded/lib/libruby.so.2.7(vm_call_method+0xc5) [0x7f04320dc565] vm_insnhelper.c:3053
/opt/gitlab/embedded/lib/libruby.so.2.7(vm_exec_core+0x154) [0x7f04320cb7d4] vm_insnhelper.c:4023
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_vm_exec+0x98) [0x7f04320d1c68] vm.c:1920
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_yield_1+0x19a) [0x7f04320d4a6a] vm.c:1116
/opt/gitlab/embedded/lib/libruby.so.2.7(int_dotimes+0x48) [0x7f0431fa6b98] numeric.c:5201
/opt/gitlab/embedded/lib/libruby.so.2.7(vm_call_cfunc+0x11f) [0x7f04320c02ff] vm_insnhelper.c:2514
/opt/gitlab/embedded/lib/libruby.so.2.7(vm_call_method+0xc5) [0x7f04320dc565] vm_insnhelper.c:3053
/opt/gitlab/embedded/lib/libruby.so.2.7(vm_exec_core+0x209) [0x7f04320cb889] vm_insnhelper.c:4023
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_vm_exec+0x7c7) [0x7f04320d2397] vm.c:1929
/opt/gitlab/embedded/lib/libruby.so.2.7(rb_ec_exec_node+0xa0) [0x7f0431f0aff0] eval.c:278

@stanhu
Copy link
Author

stanhu commented Jan 20, 2021

Is the actual issue here the rb_str_new_cstr calls are not pinned?

@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

I successfully reproduced the issue with your script, but after I rebuilt the gem, it never happened again... It'd be really helpful if we can somehow increase the possibility of reproduction, because I cannot verify any fix on my end anymore.

But this output #171 (comment) is really helpful. It shows inlined frames and line numbers of Hamlit unlike #171 (comment). It's unfortunate that this is not Ruby 3.0's but Ruby 2.7's though.

According to the backtrace, the problem seems to happen in rb_const_get(mAttributeBuilder, id_space) inside str_space(). Assuming rb_intern()'s result doesn't move (does it?), reference to mAttributeBuilder is the most suspicious, even while ko1 said objects made by rb_define_module_under are pinned. It might be specific to Ruby 3.0 though (I'm not sure).

@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

This is very hard to fix on my end because your script doesn't reproduce the issue on my environment (Linux x86_64) anymore. I've reproduced only once so far.

ko1 advised me to check if mAttributeBuilder is moved or not to confirm my suspicion. Could you check if this code shows 1 before it SEGVs on your environment?

diff --git a/ext/hamlit/hamlit.c b/ext/hamlit/hamlit.c
index 11efb33..d883678 100644
--- a/ext/hamlit/hamlit.c
+++ b/ext/hamlit/hamlit.c
@@ -9,12 +9,12 @@ static ID id_flatten, id_keys, id_parse, id_prepend, id_tr, id_uniq_bang;
 static ID id_aria, id_data, id_equal, id_hyphen, id_space, id_underscore;
 static ID id_boolean_attributes, id_xhtml;

-static VALUE str_aria()        { return rb_const_get(mAttributeBuilder, id_aria); }
-static VALUE str_data()        { return rb_const_get(mAttributeBuilder, id_data); }
-static VALUE str_equal()       { return rb_const_get(mAttributeBuilder, id_equal); }
-static VALUE str_hyphen()      { return rb_const_get(mAttributeBuilder, id_hyphen); }
-static VALUE str_space()       { return rb_const_get(mAttributeBuilder, id_space); }
-static VALUE str_underscore()  { return rb_const_get(mAttributeBuilder, id_underscore); }
+static VALUE str_aria()        { fprintf(stderr, "%d", RB_TYPE_P(mAttributeBuilder, T_MOVED)); return rb_const_get(mAttributeBuilder, id_aria); }
+static VALUE str_data()        { fprintf(stderr, "%d", RB_TYPE_P(mAttributeBuilder, T_MOVED)); return rb_const_get(mAttributeBuilder, id_data); }
+static VALUE str_equal()       { fprintf(stderr, "%d", RB_TYPE_P(mAttributeBuilder, T_MOVED)); return rb_const_get(mAttributeBuilder, id_equal); }
+static VALUE str_hyphen()      { fprintf(stderr, "%d", RB_TYPE_P(mAttributeBuilder, T_MOVED)); return rb_const_get(mAttributeBuilder, id_hyphen); }
+static VALUE str_space()       { fprintf(stderr, "%d", RB_TYPE_P(mAttributeBuilder, T_MOVED)); return rb_const_get(mAttributeBuilder, id_space); }
+static VALUE str_underscore()  { fprintf(stderr, "%d", RB_TYPE_P(mAttributeBuilder, T_MOVED)); return rb_const_get(mAttributeBuilder, id_underscore); }

 static void
 delete_falsey_values(VALUE values)

@stanhu
Copy link
Author

stanhu commented Jan 21, 2021

It returns 0:

$ ruby fail.rb
0fail.rb:9: [BUG] Segmentation fault at 0x0000000000000018
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0020 e:000019 CFUNC  :build_class
c:0004 p:0045 s:0013 e:000012 BLOCK  fail.rb:9 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :times
c:0002 p:0017 s:0006 e:000005 EVAL   fail.rb:5 [FINISH]
c:0001 p:0000 s:0003 E:001f00 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
fail.rb:5:in `<main>'
fail.rb:5:in `times'
fail.rb:9:in `block in <main>'
fail.rb:9:in `build_class'

-- Machine register context ------------------------------------------------
 RIP: 0x00007f88a9640088 RBP: 0x000000000001237b RSP: 0x00007fff22729ca0
 RAX: 0x0000000000000000 RBX: 0x000055a6bdb47f08 RCX: 0x0000000000000000
 RDX: 0x00007f88a93bd8b0 RDI: 0x000055a6bdb47f08 RSI: 0x000000000001237b
  R8: 0x0000000000000001  R9: 0x0000000000000000 R10: 0x0000000000000026
 R11: 0x00007f88a9641de0 R12: 0x000055a6bdb47f08 R13: 0x00007fff22729cc0
 R14: 0x0000000000000000 R15: 0x000055a6bdad6aa0 EFL: 0x0000000000010246

-- C level backtrace information -------------------------------------------
/home/stanhu/.rbenv/versions/2.7.2/lib/libruby.so.2.7(rb_vm_bugreport+0x7ce) [0x7f88a966d28e] vm_dump.c:755
/home/stanhu/.rbenv/versions/2.7.2/lib/libruby.so.2.7(rb_bug_for_fatal_signal+0xe7) [0x7f88a948c6e7] error.c:660
/home/stanhu/.rbenv/versions/2.7.2/lib/libruby.so.2.7(sigsegv+0x4b) [0x7f88a95d06eb] signal.c:946
/lib/x86_64-linux-gnu/libc.so.6(0x7f88a900efd0) [0x7f88a900efd0]
/home/stanhu/.rbenv/versions/2.7.2/lib/libruby.so.2.7(rb_const_lookup+0x18) [0x7f88a9640088] variable.c:3354
/home/stanhu/.rbenv/versions/2.7.2/lib/libruby.so.2.7(rb_const_get+0x41) [0x7f88a9641e21] variable.c:2339
/home/stanhu/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/hamlit-2.14.1/lib/hamlit/hamlit.so(str_space+0x52) [0x7f889f19c6b6] hamlit.c:16
/home/stanhu/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/hamlit-2.14.1/lib/hamlit/hamlit.so(hamlit_build_class) hamlit.c:157
/home/stanhu/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/hamlit-2.14.1/lib/hamlit/hamlit.so(rb_hamlit_build_class+0x6e) [0x7f889f19cb7e] hamlit.c:4
77

@stanhu
Copy link
Author

stanhu commented Jan 21, 2021

I also checked mObjectRef, and it returns 0 as well.

@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

It seems that I can easily reproduce this with Ruby 2.7.2. I'll take a look at the details.

@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

I've failed to find anything moved. By the way, is it possible to reproduce this issue without require 'google/apis'? I thought potentially the class reference tree could be broken due to google-api-client and any rb_const_get could be failing due to the gem whether it's Hamlit or not.

@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

It's still not clear why it's happening, but I found one workaround. I'm implementing it.

@stanhu
Copy link
Author

stanhu commented Jan 21, 2021

Thanks! I'll close this pull request then.

I was able to reproduce this without the require 'google/apis'.

@stanhu stanhu closed this Jan 21, 2021
k0kubun added a commit that referenced this pull request Jan 21, 2021
to workaround SEGV after GC.compact.

Fix #171
k0kubun added a commit that referenced this pull request Jan 21, 2021
to workaround SEGV after GC.compact.

Fix #171
k0kubun added a commit that referenced this pull request Jan 21, 2021
to workaround SEGV after GC.compact.

Fix #171
@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

Could you check if #172 works? I'll release this tomorrow if it works out.

@stanhu
Copy link
Author

stanhu commented Jan 21, 2021

Let me give it a shot.

FYI, I haven't yet been able to reproduce the failure with this patch. Note I first tried marking the result of the rb_obj_freeze(str), but that didn't seem to work.

diff --git a/ext/hamlit/hamlit.c b/ext/hamlit/hamlit.c
index 11efb33..4ef812c 100644
--- a/ext/hamlit/hamlit.c
+++ b/ext/hamlit/hamlit.c
@@ -510,6 +510,16 @@ rb_hamlit_build(int argc, VALUE *argv, RB_UNUSED_VAR(VALUE self))
   return hamlit_build(argv[0], argv[1], argv[2], argv[3], array);
 }

+void
+set_attribute_builder_const(ID id, const char *name)
+{
+  VALUE str = rb_str_new_cstr(name);
+  VALUE obj = rb_obj_freeze(str);
+
+  rb_gc_register_mark_object(str);
+  rb_const_set(mAttributeBuilder, id, obj);
+}
+
 void
 Init_hamlit(void)
 {
@@ -544,11 +554,11 @@ Init_hamlit(void)
   id_boolean_attributes = rb_intern("BOOLEAN_ATTRIBUTES");
   id_xhtml = rb_intern("xhtml");

-  rb_const_set(mAttributeBuilder, id_aria,       rb_obj_freeze(rb_str_new_cstr("aria")));
-  rb_const_set(mAttributeBuilder, id_data,       rb_obj_freeze(rb_str_new_cstr("data")));
-  rb_const_set(mAttributeBuilder, id_equal,      rb_obj_freeze(rb_str_new_cstr("=")));
-  rb_const_set(mAttributeBuilder, id_hyphen,     rb_obj_freeze(rb_str_new_cstr("-")));
-  rb_const_set(mAttributeBuilder, id_space,      rb_obj_freeze(rb_str_new_cstr(" ")));
-  rb_const_set(mAttributeBuilder, id_underscore, rb_obj_freeze(rb_str_new_cstr("_")));
+  set_attribute_builder_const(id_aria, "aria");
+  set_attribute_builder_const(id_data, "data");
+  set_attribute_builder_const(id_equal, "=");
+  set_attribute_builder_const(id_hyphen, "-");
+  set_attribute_builder_const(id_space, " ");
+  set_attribute_builder_const(id_underscore, "_");
 }
 #endif

@stanhu
Copy link
Author

stanhu commented Jan 21, 2021

Eh, never mind, it just failed. Let me try your patch.

@stanhu
Copy link
Author

stanhu commented Jan 21, 2021

#172 appears to work well for me. Thank you!

@k0kubun
Copy link
Owner

k0kubun commented Jan 21, 2021

Thanks for checking it quickly. I released it as v2.14.2.

stanhu added a commit to stanhu/hamlit that referenced this pull request Mar 18, 2021
This would have detected k0kubun#177 and k0kubun#171 in Ruby 3.0.
@stanhu stanhu mentioned this pull request Mar 18, 2021
stanhu added a commit to stanhu/hamlit that referenced this pull request Mar 18, 2021
This would have detected k0kubun#177 and k0kubun#171 in Ruby 3.0.
stanhu added a commit to stanhu/hamlit that referenced this pull request Mar 18, 2021
This would have detected k0kubun#177 and k0kubun#171 in Ruby 3.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants