Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node_set.rb:24: [BUG] Segmentation fault #881

Closed
fenelon opened this issue Apr 5, 2013 · 32 comments
Closed

node_set.rb:24: [BUG] Segmentation fault #881

fenelon opened this issue Apr 5, 2013 · 32 comments

Comments

@fenelon
Copy link

fenelon commented Apr 5, 2013

This mostly happens in concurrent environments. I'm using nokogiri with sidekiq, and roadie (which also uses nokogiri).

nokogiri -v

# Nokogiri (1.5.6)
    ---
    warnings: []
    nokogiri: 1.5.6
    ruby:
      version: 1.9.3
      platform: x86_64-linux
      description: ruby 1.9.3p385 (2013-02-06) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      compiled: 2.9.0
      loaded: 2.9.0

Backtrace:

/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24: [BUG] Segmentation fault
ruby 1.9.3p385 (2013-02-06) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0055 p:---- s:0236 b:0236 l:000235 d:000235 CFUNC  :[]
c:0054 p:0022 s:0232 b:0232 l:000231 d:000231 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24
c:0053 p:0019 s:0227 b:0227 l:000226 d:000226 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node.rb:248
c:0052 p:0160 s:0223 b:0223 l:000222 d:000222 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:193
c:0051 p:0654 s:0217 b:0217 l:000216 d:000216 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:161
c:0050 p:0012 s:0203 b:0203 l:002208 d:000202 BLOCK  /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:93
c:0049 p:0015 s:0199 b:0199 l:000189 d:000198 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:239
c:0048 p:---- s:0196 b:0196 l:000195 d:000195 FINISH
c:0047 p:---- s:0194 b:0194 l:000193 d:000193 CFUNC  :upto
c:0046 p:0023 s:0190 b:0190 l:000189 d:000189 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238
c:0045 p:0133 s:0186 b:0186 l:002208 d:001360 BLOCK  /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:92
c:0044 p:---- s:0181 b:0181 l:000180 d:000180 FINISH
c:0043 p:---- s:0179 b:0179 l:000178 d:000178 CFUNC  :each
c:0042 p:0018 s:0176 b:0176 l:002208 d:002208 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:85
c:0041 p:0013 s:0171 b:0171 l:000170 d:000170 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/pegast.rb:39
c:0040 p:0049 s:0167 b:0167 l:000166 d:000166 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:43
c:0039 p:0252 s:0161 b:0161 l:000160 d:000160 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:68
c:0038 p:0187 s:0155 b:0155 l:000154 d:000154 METHOD /home/deployer/apps/leveltravel/current/app/actors/operator_search_request_base.rb:25
c:0037 p:0035 s:0147 b:0147 l:000650 d:000146 BLOCK  /home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:50
c:0036 p:---- s:0143 b:0143 l:000142 d:000142 FINISH
c:0035 p:---- s:0141 b:0141 l:000140 d:000140 CFUNC  :each
c:0034 p:0023 s:0138 b:0138 l:000137 d:000137 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/relation/delegation.rb:6
c:0033 p:0164 s:0133 b:0133 l:000650 d:000650 METHOD /home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:47
c:0032 p:0025 s:0121 b:0121 l:001480 d:000120 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:49
c:0031 p:---- s:0119 b:0119 l:000118 d:000118 FINISH
c:0030 p:---- s:0117 b:0117 l:000116 d:000116 CFUNC  :call
c:0029 p:0024 s:0114 b:0114 l:001ba8 d:000113 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:109
c:0028 p:0005 s:0112 b:0112 l:001e30 d:000111 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:11
c:0027 p:0111 s:0110 b:0110 l:001b80 d:001b80 METHOD /usr/local/lib/ruby/1.9.1/timeout.rb:68
c:0026 p:0061 s:0098 b:0098 l:001e30 d:001e30 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:10
c:0025 p:0049 s:0092 b:0092 l:001ba8 d:000091 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0024 p:0007 s:0090 b:0090 l:000089 d:000089 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/active_record.rb:6
c:0023 p:0049 s:0086 b:0086 l:001ba8 d:000085 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0022 p:0007 s:0084 b:0084 l:000083 d:000083 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/retry_jobs.rb:50
c:0021 p:0049 s:0072 b:0072 l:001ba8 d:000071 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0020 p:0041 s:0070 b:0070 l:0025c0 d:001ca0 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:11
c:0019 p:0037 s:0067 b:0067 l:000066 d:000066 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/logging.rb:22
c:0018 p:0045 s:0063 b:0063 l:0025c0 d:0025c0 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:7
c:0017 p:0049 s:0057 b:0057 l:001ba8 d:000056 LAMBDA /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0016 p:---- s:0055 b:0055 l:000054 d:000054 FINISH
c:0015 p:---- s:0053 b:0053 l:000052 d:000052 CFUNC  :call
c:0014 p:0040 s:0050 b:0050 l:001ba8 d:001ba8 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:114
c:0013 p:0030 s:0043 b:0043 l:001480 d:002668 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:48
c:0012 p:0021 s:0041 b:0041 l:000a20 d:000a20 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:87
c:0011 p:0110 s:0035 b:0035 l:001480 d:001528 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:47
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 FINISH
c:0009 p:---- s:0027 b:0027 l:000026 d:000026 CFUNC  :call
c:0008 p:---- s:0025 b:0025 l:000024 d:000024 CFUNC  :public_send
c:0007 p:0021 s:0021 b:0021 l:000020 d:000020 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23
c:0006 p:0016 s:0015 b:0015 l:001358 d:000014 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/future.rb:18
c:0005 p:---- s:0013 b:0013 l:000012 d:000012 FINISH
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC  :call
c:0003 p:0021 s:0008 b:0008 l:001198 d:000007 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP   

-- Ruby level backtrace information ----------------------------------------
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48:in `block in create'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/future.rb:18:in `block in initialize'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `dispatch'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `public_send'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:47:in `block in process'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:87:in `stats'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:48:in `block (2 levels) in process'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:114:in `invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:114:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:7:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/logging.rb:22:in `with_context'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/retry_jobs.rb:50:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:10:in `call'
/usr/local/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:11:in `block in call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:109:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:109:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:49:in `block (3 levels) in process'
/home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:47:in `perform'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/relation/delegation.rb:6:in `each'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/relation/delegation.rb:6:in `each'
/home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:50:in `block in perform'
/home/deployer/apps/leveltravel/current/app/actors/operator_search_request_base.rb:25:in `run'
/home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:68:in `perform'
/home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:43:in `process_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/pegast.rb:39:in `parse_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:85:in `samo_parse_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:85:in `each'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:92:in `block in samo_parse_operator_response'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `each'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `upto'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:239:in `block in each'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:93:in `block (2 levels) in samo_parse_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:161:in `parse_offer'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:193:in `parse_prices'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node.rb:248:in `at_css'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24:in `first'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24:in `[]'

-- C level backtrace information -------------------------------------------
sidekiq 2.9.0 current [1 of 10 busy]() [0x52c3e0]
sidekiq 2.9.0 current [1 of 10 busy]() [0x57cc43]
sidekiq 2.9.0 current [1 of 10 busy](rb_bug+0xb0) [0x57dfe0]
sidekiq 2.9.0 current [1 of 10 busy]() [0x4b9b82]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fd095a5acb0] xml_node.c:169
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/nokogiri.so(+0x112de) [0x7fd090b532de] xml_node_set.c:319
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy](rb_yield+0x47) [0x5299e7]
sidekiq 2.9.0 current [1 of 10 busy]() [0x44ab7e]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy](rb_yield+0x47) [0x5299e7]
sidekiq 2.9.0 current [1 of 10 busy](rb_ary_each+0x52) [0x5423c2]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy](rb_yield+0x47) [0x5299e7]
sidekiq 2.9.0 current [1 of 10 busy](rb_ary_each+0x52) [0x5423c2]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523e8b]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x532932]
sidekiq 2.9.0 current [1 of 10 busy]() [0x532bec]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fd095a52e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd094e3ecbd]
@fenelon
Copy link
Author

fenelon commented Apr 13, 2013

This is a most likely GC bug. Disabling GC makes it work.

I've tried different versions of ruby, libxml and nokogiri. No luck there.

@sthetz
Copy link

sthetz commented Apr 17, 2013

I'm expiriencing the same problem. The issue seems not to be related to Sidekiq and Celluloid: I tried migrating my whole app to Resque, and the problem remained. Tried different versions of ruby, libxml, nokogiri. Nothing helps.

@Wardrop
Copy link

Wardrop commented Apr 17, 2013

If it's random, it might be related to this: http://bugs.ruby-lang.org/issues/8100

Try Ruby 2.0 patched with changeset 39919

(Also reported to affect 1.9.3 I believe)

@sthetz
Copy link

sthetz commented Apr 19, 2013

Latest ruby versions do not solve this issue, I'm still having segfaults.

@fenelon
Copy link
Author

fenelon commented Apr 19, 2013

@Wardrop @sthetz The ruby2.1.0-dev solves the segfault issue on a development mac, but still fails on our Ubuntu server.

@sthetz
Copy link

sthetz commented Apr 19, 2013

Nokogiri::XML::XPath::SyntaxError: NULL context pointer

I just got this instead of a segfault.

@sthetz
Copy link

sthetz commented Apr 19, 2013

@tenderlove @flavorjones @sparklemotion

I finally figured out what the problem was. I have a reproduceable test for segfault: https://github.com/sthetz/nokogiri-segfault

@ender672
Copy link
Member

Thank you for isolating the issue!

I was able to pare it down to this snippet:

require 'nokogiri'
require 'libxml'

loop do
  threads = []
  20.times do
    threads << Thread.new do
      d = Nokogiri::XML '<foo><bar></bar></foo>'
      (d/'bar').each{}
    end
  end

  threads.each { |thread| thread.join }
end

The issue is that libxml-ruby's initializer hooks into libxml2 in a way that is incompatible with Nokogiri.

In libxml-ruby's ext/libxml/ruby_xml_node.c:
https://github.com/xml4r/libxml-ruby/blob/921527901ee68a50292998ad57a03e34e2e81b20/ext/libxml/ruby_xml_node.c#L43-L56

The problem is that this hook is global, and even libxml2 nodes that are managed by Nokogiri end up passing through the rxml_node_deregisterNode() function. This function is only meant to handle libxml-ruby nodes, and it results in memory corruption.

I'm not sure yet what the solution is, and I really have to move the pigs onto new pasture right now.

I'll check back on this tonight. Thanks again for taking the time to isolate the issue.

@ender672
Copy link
Member

CC @cfis

@cfis
Copy link

cfis commented Apr 20, 2013

Interesting. libxml-ruby depends on that callback for memory management, so it can't go away. Perhaps though the callback could do a type check on the ruby object in _private and if its not a libxmlruby object just ignore it?

@ender672
Copy link
Member

Nokogiri stores a VALUE in the _private field for nodes just like libxml-ruby does. My comment about memory corruption was because I mistakenly thought that Nokogiri was putting a custom struct in there.

So I don't see anything obviously wrong, even when rxml_node_deregisterNode() gets called on a node wrapped by Nokogiri.

A type check is likely to make this issue go away, but I have a feeling there is something else going on. I'm having trouble narrowing it down though -- when I make changes to the the test snippet that should be unrelated I can no longer reproduce it. Seems to be timing or GC related (but GC.stress doesn't produce it either).

Still fiddling with this.

@sthetz
Copy link

sthetz commented Apr 21, 2013

@ender672 We were able to get rid of the segfault by disabling GC completely, so yes, it is GC related.

@ender672
Copy link
Member

@sthetz Thanks for pointing that out. Usually GC.stress is good at triggering GC issues, but it doesn't help here.

I usually narrow the crashing/leaking snippet down until it only calls one Nokogiri method and debug it from there, but this one goes away when I try that.

@ender672
Copy link
Member

Here is what I have so far:

  1. The node deregister callback gets called on a node whose _private member points to a Nokogiri Ruby Node object.
  2. rxml_node_deregisterNode() sets the Node object's data pointer to NULL.
  3. The same Ruby Node object gets passed into a Nokogiri method as an argument.
  4. Nokogiri assumes that the data pointer can't be NULL and things break.

The method that I've seen this happen in so far is new() in xml_xpath.c:

/*
* call-seq:
* new(node)
*
* Create a new XPathContext with +node+ as the reference point.
*/
static VALUE new(VALUE klass, VALUE nodeobj)
{
xmlNodePtr node;
xmlXPathContextPtr ctx;
VALUE self;
xmlXPathInit();
Data_Get_Struct(nodeobj, xmlNode, node);
ctx = xmlXPathNewContext(node->doc);
ctx->node = node;
self = Data_Wrap_Struct(klass, 0, deallocate, ctx);
/*rb_iv_set(self, "@xpath_handler", Qnil); */
return self;
}

The parameter nodeobj is the already-freed node.

I gotta go pick up restaurant food scraps. Will be back later.

@ender672
Copy link
Member

I'm narrowing down on the issue. Quick update:

A very similar issue came up four years ago.

The solution at the time was to avoid the libxml-ruby callback by temporarily disabling the node-deregister-callback.

However, that solution was never really complete -- there are many other places in Nokogiri where libxml2 nodes are deregistered.

In 2011, libxml-ruby enabled the callback for native OS threads (for ruby 1.9.x compatibility). The Nokogiri workaround doesn't work in this case, and @sthetz's snippet uses multithreading to trigger the four year old bug again.

I have a hunch that the libxml-ruby callback is just triggering a deeper Nokogiri issue. Still digging.

@ender672
Copy link
Member

I thought the libxml-ruby callback would be benign, but turns out that VALUE pointers are unreliable when used in the free() function of a ruby-wrapped C struct. By the time the free() function is invoked the VALUE pointer may have been recycled.

Here is what triggers the error:

  1. Nokogiri creates node A.
  2. GC mark phase triggers.
    • Node A is unreachable.
  3. Nokogiri creates node B. Ruby reuses the same VALUE pointer as A.
  4. GC sweep phase triggers.
    • Nokogiri frees the libxml2 node associated with A.
    • The libxml-ruby callback is invoked. However the VALUE pointer in the _private field now points to B and the wrong ruby object is cleared.

In order for the libxml-ruby callback to be safe, Nokogiri will have to make sure that every libxml2 node has its _private field unset before we call xmlFree().

@flavorjones
Copy link
Member

Does this happen with libxml < 2.9.0? If not, then I think we should close this, as Nokogiri doesn't support 2.9.0 yet (see #829 for one example reason why).

I'm also not really comfortable hacking Nokogiri to work around libxml-ruby. It's an old and apparently-unsupported gem.

@fenelon
Copy link
Author

fenelon commented Apr 23, 2013

It does happen in 2.8 as well
On Apr 23, 2013 10:04 PM, "Mike Dalessio" notifications@github.com wrote:

Does this happen with libxml < 2.9.0? If not, then I think we should close
this, as Nokogiri doesn't support 2.9.0 yet (see #829https://github.com/sparklemotion/nokogiri/issues/829for one example reason why).

I'm also not really comfortable hacking Nokogiri to work around
libxml-ruby. It's an old and apparently-unsupported gem.


Reply to this email directly or view it on GitHubhttps://github.com//issues/881#issuecomment-16875547
.

@ender672
Copy link
Member

@flavorjones - It's a problem with GC timing. I won't be adding any hacks to work around this. All attention has been on identifying & understanding the problem.

libxml-ruby is still active and it's included in enough code that I absolutely want to fix this issue. This kind of thing spawns lots of segfault bug reports.

@cfis
Copy link

cfis commented Apr 23, 2013

libxml-ruby - not old and not unsupported (i'm the maintainer).

@ender672
Copy link
Member

@fenelon - are you loading libxml-ruby? A good place to check is your Gemfile.lock.

@ender672
Copy link
Member

So @flavorjones was right and the only way I can think of fixing this was to add yet another workaround in Nokogiri.

How bout a haiku?

It may be a hack
But it is a pretty one
Said the pragmatist

@ender672
Copy link
Member

There is an alternative -- we could raise an exception if libxml-ruby is detected, spit out a warning, allow an environment variable to override, etc.

I lean more towards the commit in the pull request. Nice thing about it is that the cleanup only happens when the libxml-ruby callback is detected.

@mlooney
Copy link

mlooney commented Jan 28, 2014

Any chance of this issue getting fixed any time soon?

@mlooney
Copy link

mlooney commented Jan 28, 2014

issue still present w/ ruby 2.1 & 2.0

@flavorjones
Copy link
Member

#895 was merged. Should fix this issue.

@flavorjones
Copy link
Member

Will be in 1.6.3.rc1 to be released later today.

@typeoneerror
Copy link

still an issue?

/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/lib/nokogiri/xml/sax/push_parser.rb:47: [BUG] Segmentation fault at 0x00000000000000
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin12.0]

Do we need to set up Nokogiri to use different versions of libxml2 by changing the paths when we install?

# Nokogiri (1.6.3.rc1)
    ---
    warnings: []
    nokogiri: 1.6.3.rc1
    ruby:
      version: 2.1.2
      platform: x86_64-darwin12.0
      description: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin12.0]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/ports/x86_64-apple-darwin12.5.0/libxml2/2.8.0"
      libxslt_path: "/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/ports/x86_64-apple-darwin12.5.0/libxslt/1.1.28"
      compiled: 2.8.0
      loaded: 2.8.0

@codekitchen
Copy link
Contributor

We are still seeing what looks very much like this same issue with nokogiri 1.6.6.2 and libxml-ruby 2.8.0, a minimal repro is here: https://gist.github.com/codekitchen/2715ddc89e782b3e6c6f

Removing libxml-ruby, even though the repro isn't using it, prevents the crash.

We've decided to just remove libxml-ruby from our app, since we are only using it in a couple small areas and removing it will be very low effort, but I still wanted to report it.

yujinakayama added a commit to increments/libxml-ruby that referenced this issue Sep 9, 2015
yujinakayama added a commit to increments/libxml-ruby that referenced this issue Sep 9, 2015
@Coren
Copy link

Coren commented Oct 9, 2015

Hi,

I'm having this kind of crash, too. It really seems to be an interaction between your 2 libs.

Here is the trace

/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:75:in `find'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `context'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `new'

-- Machine register context ------------------------------------------------
 RIP: 0x00000000105003fd RBP: 0x00000000193d39d0 RSP: 0x00000000193d39b0
 RAX: 0x636e6174736e692d RBX: 0x00000000063bdbb0 RCX: 0x00000000055282e0
 RDX: 0x000000001c565a30 RDI: 0x00000000144438b0 RSI: 0x0000000000000001
  R8: 0x0000000000000000  R9: 0x0000000010646c22 R10: 0x0000000000000000
 R11: 0x0000000011ba4fa0 R12: 0x00000000063bc000 R13: 0x00000000063bcf30
 R14: 0x000000000639bce0 R15: 0x00000000193d3be8 EFL: 0x0000000000000004

-- C level backtrace information -------------------------------------------
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_bugreport+0x4ea) [0x501d30a] vm_dump.c:693
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_bug_context+0xcb) [0x4eb30ab] error.c:425
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(sigsegv+0x3e) [0x4f9145e] signal.c:879
/lib/x86_64-linux-gnu/libpthread.so.0 [0x531f8d0]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x87) [0x105003fd]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeProp+0xb8) [0x104fe019]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreePropList+0x2f) [0x104fdf50]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x148) [0x105004be]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeDoc+0x161) [0x104fc75f]
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(finalize_list+0x51) [0x4ed1021] gc.c:2463
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(gc_finalize_deferred+0x50) [0x4ed2550] gc.c:2500
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_postponed_job_flush+0x133) [0x5024563] vm_trace.c:1572
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_threadptr_execute_interrupts.part.41+0x139) [0x502a7f9] thread.c:1971
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x52e) [0x501046e] vm_eval.c:252
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_yield+0x492) [0x50190f2] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_ary_each+0x52) [0x4e63e22] array.c:1803
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_iterate+0xea) [0x5005c0a] vm_eval.c:1129
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_block_call+0x2b) [0x5005dcb] vm_eval.c:1198
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(enum_to_a+0x38) [0x4ea7968] enum.c:503
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(proc_call+0x52) [0x4ec2452] proc.c:731
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(send_internal+0xd2) [0x5016fc2] vm_eval.c:928
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_fiber_start+0x110) [0x5031c70] cont.c:1263

@flavorjones
Copy link
Member

@Coren - Please open a new issue. This issue has been closed for almost a year and a half. Your problem is unlikely to be the same root cause, so let's track it as a new problem.

Thanks!

@Coren
Copy link

Coren commented Oct 9, 2015

sure, see #1364

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants