Skip to content

Sporadic crash and core dump in Mutex_trylock #7581

@bdgranger

Description

@bdgranger

We're seeing a very sporadic crash and core dump at customer site with production traffic load. In every instance gathered thus far, the location of the segfault is the same:

Core was generated by '/opt/trafficserver/bin/traffic_server --bind_stdout /opt/trafficserver/var/log/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000007098c2 in Mutex_trylock (t=0x2b521e086010, m=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
287       if (m->thread_holding != t) {

There are various stack trace paths, but all of them eventually end up going through manage_keep_alive_queue before hitting the Mutex_trylock. Most of them have also gone through Http2ConnectionState::release_stream() although one of them goes through InactivityCop::check_inactivity().

We are currently on trafficserver 8.0.7, which is where these traces come from. We do have the code referenced in #4504. We are in the process of merging to the tip of the 8.1.x branch, but thought we'd ask here to see if we can/should expect these issues to be resolved with the Http2 changes in 8.1.x?

Unfortunately, I am unable to fetch the history via vc->history as this appears to have been optimized out by the compiler and we can't run debug code in this environment. We have not yet been able to reproduce on demand in our lab.

I am including samples of various stack traces retrieved. Would appreciate any thoughts!

#0  0x00000000007098c2 in Mutex_trylock (t=0x2b521e086010, m=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
#1  MutexTryLock (t=0x2b521e086010, am=..., this=0x2b5224f59a10)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
#2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
    handle_event=<synthetic pointer>, now=1614298436718203540, vc=<optimized out>, this=0x2b521df7ccf0) at UnixNet.cc:644
#3  manage_keep_alive_queue (this=0x2b521df7ccf0) at UnixNet.cc:622
#4  NetHandler::add_to_keep_alive_queue (this=0x2b521df7ccf0, vc=<optimized out>) at UnixNet.cc:688
#5  0x000000000058d919 in Http2ConnectionState::release_stream (this=0x2b5598c7b550, stream=<optimized out>)
    at Http2ConnectionState.cc:1350
#6  0x0000000000577fe5 in Http2Stream::destroy (this=0x2b5737147a80) at Http2Stream.cc:749
#7  0x00000000005742f9 in Http2Stream::transaction_done (this=0x2b5737147a80) at Http2Stream.cc:391
#8  0x0000000000505a9c in HttpSM::kill_this (this=this@entry=0x2b570d0e18d0) at HttpSM.cc:6849
#9  0x0000000000506257 in HttpSM::main_handler (this=0x2b570d0e18d0, event=2301, data=0x2b570d0e27d0) at HttpSM.cc:2559
#10 0x00000000005552a1 in handleEvent (data=0x2b570d0e27d0, event=2301, this=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
#11 HttpTunnel::main_handler (this=0x2b570d0e27d0, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1692
#12 0x0000000000676962 in handleEvent (data=0x2b5299961590, event=103, this=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
#13 calluser (event=103, this=0x2b5299961340) at P_CacheInternal.h:628
#14 CacheVC::openWriteMain (this=0x2b5299961340) at CacheWrite.cc:1393
#15 0x0000000000752b52 in handleEvent (data=0x2b5228073720, event=1, this=<optimized out>) at I_Continuation.h:160
#16 EThread::process_event (this=this@entry=0x2b521e086010, e=e@entry=0x2b5228073720, calling_code=1) at UnixEThread.cc:131
#17 0x000000000075341e in EThread::process_queue (this=this@entry=0x2b521e086010,
    NegativeQueue=NegativeQueue@entry=0x2b5224f59e70, ev_count=ev_count@entry=0x2b5224f59e6c,
    nq_count=nq_count@entry=0x2b5224f59e68) at UnixEThread.cc:170
#18 0x0000000000753a08 in EThread::execute_regular (this=0x2b521e086010) at UnixEThread.cc:230
#19 0x000000000075254a in spawn_thread_internal (a=0x1c37920) at Thread.cc:85
#20 0x00002b52159d9e65 in start_thread () from /lib64/libpthread.so.0
#21 0x00002b521670f88d in clone () from /lib64/libc.so.6

=================

#0  0x000000000070a07a in Mutex_trylock (t=0x2b058e939010, m=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
#1  MutexTryLock (t=0x2b058e939010, am=..., this=0x2b05950bcc90)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
#2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
    handle_event=<synthetic pointer>, now=1614215343299903804, vc=<optimized out>, this=0x2b058e93ccf0) at UnixNet.cc:644
#3  NetHandler::manage_keep_alive_queue (this=0x2b058e93ccf0) at UnixNet.cc:622
#4  0x000000000070ac02 in InactivityCop::check_inactivity (this=0x2b05e80008c0, event=<optimized out>, e=<optimized out>)
    at UnixNet.cc:99
#5  0x0000000000752b52 in handleEvent (data=0x1323f80, event=2, this=<optimized out>) at I_Continuation.h:160
#6  EThread::process_event (this=this@entry=0x2b058e939010, e=e@entry=0x1323f80, calling_code=2) at UnixEThread.cc:131
#7  0x0000000000753d63 in EThread::execute_regular (this=0x2b058e939010) at UnixEThread.cc:244
#8  0x000000000075254a in spawn_thread_internal (a=0x131eda0) at Thread.cc:85
#9  0x00002b058af6ee65 in start_thread () from /lib64/libpthread.so.0
#10 0x00002b058bca488d in clone () from /lib64/libc.so.6

===============

#0  0x00000000007098c2 in Mutex_trylock (t=0x2adb1db45010, m=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
#1  MutexTryLock (t=0x2adb1db45010, am=..., this=0x2adb245e8b90)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
#2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
    handle_event=<synthetic pointer>, now=1613916233686290533, vc=<optimized out>, this=0x2adb1db48cf0) at UnixNet.cc:644
#3  manage_keep_alive_queue (this=0x2adb1db48cf0) at UnixNet.cc:622
#4  NetHandler::add_to_keep_alive_queue (this=0x2adb1db48cf0, vc=<optimized out>) at UnixNet.cc:688
#5  0x000000000058d919 in Http2ConnectionState::release_stream (this=0x2adff4d97610, stream=<optimized out>)
    at Http2ConnectionState.cc:1350
#6  0x0000000000577fe5 in Http2Stream::destroy (this=0x2ae0374c89c0) at Http2Stream.cc:749
#7  0x00000000005784bb in Http2Stream::terminate_if_possible (this=0x2ae0374c89c0) at Http2Stream.cc:391
#8  0x000000000057871c in Http2Stream::main_event_handler (this=0x2ae0374c89c0, event=<optimized out>, edata=0x2ae01953f5a0)
    at Http2Stream.cc:142
#9  0x0000000000752b52 in handleEvent (data=0x2ae01953f5a0, event=104, this=<optimized out>) at I_Continuation.h:160
#10 EThread::process_event (this=this@entry=0x2adb1db45010, e=e@entry=0x2ae01953f5a0, calling_code=104) at UnixEThread.cc:131
#11 0x000000000075341e in EThread::process_queue (this=this@entry=0x2adb1db45010,
    NegativeQueue=NegativeQueue@entry=0x2adb245e8e70, ev_count=ev_count@entry=0x2adb245e8e6c,
    nq_count=nq_count@entry=0x2adb245e8e68) at UnixEThread.cc:170
#12 0x0000000000753a08 in EThread::execute_regular (this=0x2adb1db45010) at UnixEThread.cc:230
#13 0x000000000075254a in spawn_thread_internal (a=0x10beb10) at Thread.cc:85
#14 0x00002adb1594de65 in start_thread () from /lib64/libpthread.so.0
#15 0x00002adb1668388d in clone () from /lib64/libc.so.6

================

#0  0x00000000007098c2 in Mutex_trylock (t=0x2af54b69b010, m=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
#1  MutexTryLock (t=0x2af54b69b010, am=..., this=0x2af55e7051b0)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
#2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
    handle_event=<synthetic pointer>, now=1613927876215693237, vc=<optimized out>, this=0x2af54b69ecf0) at UnixNet.cc:644
#3  manage_keep_alive_queue (this=0x2af54b69ecf0) at UnixNet.cc:622
#4  NetHandler::add_to_keep_alive_queue (this=0x2af54b69ecf0, vc=<optimized out>) at UnixNet.cc:688
#5  0x000000000058d919 in Http2ConnectionState::release_stream (this=0x2afa08250890, stream=<optimized out>)
    at Http2ConnectionState.cc:1350
#6  0x0000000000577fe5 in Http2Stream::destroy (this=0x2af8a731ffc0) at Http2Stream.cc:749
#7  0x00000000005742f9 in Http2Stream::transaction_done (this=0x2af8a731ffc0) at Http2Stream.cc:391
#8  0x0000000000505a9c in HttpSM::kill_this (this=this@entry=0x2afa245e6d20) at HttpSM.cc:6849
#9  0x0000000000506257 in HttpSM::main_handler (this=0x2afa245e6d20, event=100, data=0x2af8a7320230) at HttpSM.cc:2559
#10 0x000000000057501e in Http2Stream::update_read_request (this=this@entry=0x2af8a731ffc0,
    read_len=read_len@entry=9223372036854775807, call_update=call_update@entry=true, check_eos=check_eos@entry=false)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
#11 0x0000000000575401 in Http2Stream::send_request (this=this@entry=0x2af8a731ffc0, cstate=...) at Http2Stream.cc:183
#12 0x000000000058fbf3 in rcv_headers_frame (cstate=..., frame=...) at Http2ConnectionState.cc:344
#13 0x0000000000591e59 in Http2ConnectionState::main_event_handler (this=0x2afa08250890, event=<optimized out>,
    edata=<optimized out>) at Http2ConnectionState.cc:983
#14 0x0000000000585f6f in handleEvent (data=0x2af55e705720, event=2253, this=0x2afa08250890)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
#15 send_connection_event (cont=cont@entry=0x2afa08250890, event=event@entry=2253, edata=edata@entry=0x2af55e705720)
    at Http2ClientSession.cc:58
#16 0x0000000000586165 in Http2ClientSession::do_complete_frame_read (this=this@entry=0x2afa082505c0)
    at Http2ClientSession.cc:541
#17 0x000000000058650d in Http2ClientSession::state_process_frame_read (this=this@entry=0x2afa082505c0, event=event@entry=100,
    vio=vio@entry=0x2af6b5ade6f8, inside_frame=inside_frame@entry=false) at Http2ClientSession.cc:594
#18 0x0000000000586d01 in Http2ClientSession::state_start_frame_read (this=0x2afa082505c0, event=100, edata=0x2af6b5ade6f8)
    at Http2ClientSession.cc:462
#19 0x0000000000587540 in Http2ClientSession::main_event_handler (this=0x2afa082505c0, event=100, edata=0x2af6b5ade6f8)
    at Http2ClientSession.cc:331
#20 0x00000000007197a2 in handleEvent (data=0x2af6b5ade6f8, event=100, this=<optimized out>)
    at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
#21 read_signal_and_update (vc=0x2af6b5ade560, vc@entry=0x1, event=event@entry=100) at UnixNetVConnection.cc:83
#22 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2af6b5ade560, event=event@entry=100) at UnixNetVConnection.cc:1064
#23 0x00000000006e6350 in SSLNetVConnection::net_read_io (this=0x2af6b5ade560, nh=0x2af54b69ecf0, lthread=<optimized out>)
    at SSLNetVConnection.cc:654
#24 0x0000000000706e44 in NetHandler::process_ready_list (this=this@entry=0x2af54b69ecf0) at UnixNet.cc:395
#25 0x000000000070711d in NetHandler::waitForActivity (this=0x2af54b69ecf0, timeout=<optimized out>) at UnixNet.cc:528
#26 0x0000000000753f49 in EThread::execute_regular (this=0x2af54b69b010) at UnixEThread.cc:272
#27 0x000000000075254a in spawn_thread_internal (a=0x2a454d0) at Thread.cc:85
#28 0x00002af546c00e65 in start_thread () from /lib64/libpthread.so.0
#29 0x00002af54793688d in clone () from /lib64/libc.so.6

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions