
Conversation

@dmorilha

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  Http2ConnectionState::release_stream (this=this@entry=0x7f22381730e0, stream=stream@entry=0x7f22381a6b80) at /opt/src/trafficserver/proxy/http2/Http2ConnectionState.cc:1186
1186	      ua_session->destroy();
[Current thread is 1 (Thread 0x7f224975c700 (LWP 20146))]
(gdb) bt
#0  Http2ConnectionState::release_stream (this=this@entry=0x7f22381730e0, stream=stream@entry=0x7f22381a6b80) at /opt/src/trafficserver/proxy/http2/Http2ConnectionState.cc:1186
#1  0x00000000006197a5 in Http2Stream::destroy (this=0x7f22381a6b80) at /opt/src/trafficserver/proxy/http2/Http2Stream.cc:719
#2  0x0000000000612976 in Http2Stream::terminate_if_possible (this=0x7f22381a6b80) at /opt/src/trafficserver/proxy/http2/Http2Stream.cc:384
#3  Http2Stream::transaction_done (this=0x7f22381a6b80) at /opt/src/trafficserver/proxy/http2/Http2Stream.cc:373
#4  0x00000000005bb027 in HttpSM::kill_this (this=this@entry=0x7f2228a21620) at /opt/src/trafficserver/proxy/http/HttpSM.cc:6895
#5  0x00000000005bc1ad in HttpSM::main_handler (this=0x7f2228a21620, event=2301, data=0x7f2228a229c0) at /opt/src/trafficserver/proxy/http/HttpSM.cc:2640
#6  0x0000000000606751 in Continuation::handleEvent (data=0x7f2228a229c0, event=2301, this=<optimized out>) at /opt/src/trafficserver/iocore/eventsystem/I_Continuation.h:152
#7  HttpTunnel::main_handler (this=0x7f2228a229c0, event=<optimized out>, data=<optimized out>) at /opt/src/trafficserver/proxy/http/HttpTunnel.cc:1651
#8  0x0000000000755a6e in Continuation::handleEvent (data=<optimized out>, event=<optimized out>, this=<optimized out>) at /opt/src/trafficserver/iocore/eventsystem/I_Continuation.h:152
#9  CacheVC::calluser (event=<optimized out>, this=<optimized out>) at /opt/src/trafficserver/iocore/cache/P_CacheInternal.h:646
#10 CacheVC::openReadMain (this=0x7f223ece8980) at /opt/src/trafficserver/iocore/cache/CacheRead.cc:712
#11 0x00000000006da481 in Continuation::handleEvent (data=0x0, event=1, this=0x7f223ece8980) at /opt/src/trafficserver/iocore/eventsystem/I_Continuation.h:152
#12 CacheVC::callcont (this=0x7f223ece8980, event=<optimized out>) at /opt/src/trafficserver/iocore/cache/P_CacheInternal.h:665
#13 0x000000000075c1a1 in CacheVC::openReadStartHead (this=0x7f223ece8980, event=<optimized out>, e=0x0) at /opt/src/trafficserver/iocore/cache/CacheRead.cc:1159
#14 0x0000000000731185 in Continuation::handleEvent (data=0x0, event=3900, this=0x7f223ece8980) at /opt/src/trafficserver/iocore/eventsystem/I_Continuation.h:152
#15 CacheVC::handleReadDone (this=0x7f223ece8980, event=<optimized out>, e=<optimized out>) at /opt/src/trafficserver/iocore/cache/Cache.cc:2463
#16 0x00000000006d90f3 in Continuation::handleEvent (data=<optimized out>, event=3900, this=<optimized out>) at /opt/src/trafficserver/iocore/eventsystem/I_Continuation.h:152
#17 AIOCallbackInternal::io_complete (this=<optimized out>, event=<optimized out>, data=<optimized out>) at /opt/src/trafficserver/iocore/aio/P_AIO.h:116
#18 0x0000000000806a55 in Continuation::handleEvent (data=0x7f21252a8dc0, event=1, this=<optimized out>) at /opt/src/trafficserver/iocore/eventsystem/I_Continuation.h:152
#19 EThread::process_event (this=this@entry=0x7f2249c67010, e=e@entry=0x7f21252a8dc0, calling_code=1) at /opt/src/trafficserver/iocore/eventsystem/UnixEThread.cc:140
#20 0x00000000008072b6 in EThread::process_queue (this=this@entry=0x7f2249c67010, NegativeQueue=NegativeQueue@entry=0x7f224975bea0)
    at /opt/src/trafficserver/iocore/eventsystem/UnixEThread.cc:175
#21 0x00000000008076db in EThread::execute_regular (this=0x7f2249c67010) at /opt/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207
#22 0x00000000008064b5 in spawn_thread_internal (a=0x2c83d70) at /opt/src/trafficserver/iocore/eventsystem/Thread.cc:84
#23 0x00007f224f8ad6ba in start_thread (arg=0x7f224975c700) at pthread_create.c:333
#24 0x00007f224eb3241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) f 0
#0  Http2ConnectionState::release_stream (this=this@entry=0x7f22381730e0, stream=stream@entry=0x7f22381a6b80) at /opt/src/trafficserver/proxy/http2/Http2ConnectionState.cc:1186
1186	      ua_session->destroy();
(gdb) list
1181	
1182	    if (fini_received && total_client_streams_count == 0) {
1183	      // We were shutting down, go ahead and terminate the session
1184	      // this is a member of Http2ConnectionState and will be freed
1185	      // when ua_session is destroyed
1186	      ua_session->destroy();
1187	
1188	      // Can't do this because we just destroyed right here ^,
1189	      // or we can use a local variable to do it.
1190	      // ua_session = nullptr;
(gdb) print ua_session
$1 = (Http2ClientSession *) 0x0
(gdb) info locals
lock = {m = {m_ptr = 0x7f21a5f7c370}, locked_p = true}
(gdb) print *lock.m.m_ptr
$2 = {<RefCountObj> = {<ForceVFPTToTop> = {_vptr.ForceVFPTToTop = 0x80af70 <vtable for ProxyMutex+16>}, m_refcount = 3}, the_mutex = {__data = {__lock = 1, __count = 0, __owner = 20146, 
      __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000\262N\000\000\001", '\000' <repeats 26 times>, 
    __align = 1}, thread_holding = 0x7f2249c67010, nthread_holding = 3}
(gdb) print *this
$3 = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x842b40 <vtable for Http2ConnectionState+16>}, handler = (int (Continuation::*)(Continuation * const, int, 
    void *)) 0x62a000 <Http2ConnectionState::state_closed(int, void*)>, mutex = {m_ptr = 0x0}, link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}, control_flags = {raw_flags = 0}}, 
  ua_session = 0x0, local_hpack_handle = 0x7f20fcd0c930, remote_hpack_handle = 0x7f20fcb91260, dependency_tree = 0x7f20fcc73550, server_settings = {settings = {4096, 1, 150, 1048576, 16384, 
      4294967295}}, client_settings = {settings = {65536, 1, 1000, 6291456, 16384, 4294967295}}, client_rwnd = 15714466, server_rwnd = 1048576, 
  stream_list = {<DLL<Http2Stream, Continuation::Link_link>> = {head = 0x0}, tail = 0x0}, latest_streamid_in = 1, latest_streamid_out = 0, stream_requests = 1, client_streams_in_count = 0, 
  client_streams_out_count = 0, total_client_streams_count = 0, continued_stream_id = 0, continued_buffer = {iov_base = 0x0, iov_len = 0}, _scheduled = false, fini_received = true, 
  recursion = 0, fini_event = 0x0}
(gdb) 
```
@bryancall added this to the 7.1.5 milestone Aug 23, 2018
```cpp
ua_session->destroy();

if (ua_session) {
  ua_session->destroy();
```
@bryancall commented Aug 23, 2018
We are not seeing this issue in our 7.1.x or in 8.0.x. Makes sense to not destroy if ua_session is nullptr.
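For reference, the guard being discussed, combined with the local-variable idea from the code comment at line 1188, can be sketched roughly like this (stub types stand in for the real ATS classes; this is illustrative, not the actual patch):

```cpp
#include <cassert>
#include <utility>

// Stub types standing in for the real ATS classes; illustrative only.
struct Http2ClientSession {
  bool destroyed = false;
  void destroy() { destroyed = true; } // the real destroy() frees the session
};

struct Http2ConnectionState {
  Http2ClientSession *ua_session = nullptr;
  bool fini_received = false;
  int total_client_streams_count = 0;

  void release_stream_tail() {
    if (fini_received && total_client_streams_count == 0) {
      // Move the pointer into a local and clear the member *before* calling
      // destroy(), so ua_session never dangles even though in the real code
      // the state object itself is freed along with the session.
      Http2ClientSession *session = std::exchange(ua_session, nullptr);
      if (session != nullptr) { // the null guard proposed by this PR
        session->destroy();
      }
    }
  }
};
```

A second call into `release_stream_tail()` then finds a null pointer and does nothing instead of dereferencing freed memory.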

```cpp
// when ua_session is destroyed
ua_session->destroy();

if (ua_session) {
```
@masaori335 commented Aug 23, 2018

This looks weird, because ua_session is already null-checked above (line 1169).

@shinrich

I have seen some similar things later on in the shutdown logic, where the stack shows null and it seems that client_vc has been nulled (#4147). In that case the client_vc had been closed without updating the reference to it in the Http2ClientSession object. Not the same thing here, but there are some odd races going on.

I agree with @masaori335's observation that just doing another null check is unlikely to help, since we have already done the null check in the current line of execution.

I've spent this week chasing down a couple of race conditions. After they have baked over the weekend, I will create PRs or update existing WIP PRs.

@dmorilha

Sounds promising @shinrich

@zwoop commented Aug 29, 2018

Is this only for 7.1.x? Is there a corresponding commit on master?

@dmorilha

I think Susan ruled this as useless; I still face random crashes with 7.1.x related to closing sessions.

@zwoop commented Aug 29, 2018

Should we close this, then?

@dmorilha closed this Aug 29, 2018
@dmorilha deleted the patch-4 branch August 29, 2018 22:49
@dmorilha

done :)

@zwoop removed this from the 7.1.5 milestone Oct 11, 2018
@zizhong commented Oct 12, 2018

It seems we have multiple threads calling into Http2ConnectionState::release_stream().
The first one acquires the lock; the others wait on it. The first one sets ua_session to nullptr, frees the ua_session, and is done.
The second thread then acquires the lock, but by that point ua_session is null, so ATS crashes.

I think we need this merged unless we figure out how to prevent multiple threads from calling into this function.
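The double-release hazard described above can be sketched with stand-in types (not ATS code). The key point is that clearing the pointer under the same lock as the null check makes whichever thread arrives second a no-op:

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Stand-in types (not ATS code) showing why "check, then destroy" is only
// safe when the pointer is also cleared under the same lock.
struct Session {
  int destroy_calls = 0;
  void destroy() { ++destroy_calls; } // the real code frees memory here
};

struct ConnState {
  std::mutex m;
  Session *ua_session = nullptr;

  // The null check and the reset happen atomically with respect to the lock,
  // so whichever thread arrives second sees nullptr and does nothing.
  void release() {
    std::lock_guard<std::mutex> lock(m);
    if (Session *s = std::exchange(ua_session, nullptr)) {
      s->destroy();
    }
  }
};
```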

@masaori335

That scenario explains why ua_session becomes nullptr after the null check. But, in the first place, release_stream should not be called by multiple threads simultaneously. IMO, all callers of the function should acquire the mutex lock of Http2ConnectionState before calling it.

@masaori335

@zizhong It looks like release_stream() is always called after the mutex of Http2ConnectionState has been acquired (on the latest 7.1.x branch).

1. `Http2ClientSession.cc:283`

```cpp
SCOPED_MUTEX_LOCK(lock, this->connection_state.mutex, this_ethread());
this->connection_state.release_stream(nullptr);
```

2. `Http2ConnectionState.cc:887`

```cpp
SCOPED_MUTEX_LOCK(lock, this->mutex, this_ethread());
ink_assert(this->fini_received == false);
this->fini_received = true;
cleanup_streams();
SET_HANDLER(&Http2ConnectionState::state_closed);
this->release_stream(nullptr);
```

3. `Http2Stream.cc:714`

```cpp
SCOPED_MUTEX_LOCK(lock, h2_parent->connection_state.mutex, this_ethread());
// Make sure the stream is removed from the stream list and priority tree
// In many cases, this has been called earlier, so this call is a no-op
h2_parent->connection_state.delete_stream(this);
// Update session's stream counts, so it accurately goes into keep-alive state
h2_parent->connection_state.release_stream(this);
```
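All three call sites rely on the same scoped-lock idiom. A minimal RAII sketch of what a SCOPED_MUTEX_LOCK-style guard does (a hypothetical simplification; the real ATS macro also records which EThread holds the lock and uses a refcounted ProxyMutex):

```cpp
#include <cassert>
#include <mutex>

// Minimal RAII sketch of a SCOPED_MUTEX_LOCK-style guard; the lock is
// released automatically when the guard goes out of scope.
class ScopedMutexLock {
public:
  explicit ScopedMutexLock(std::mutex &m) : m_(m) { m_.lock(); }
  ~ScopedMutexLock() { m_.unlock(); }
  ScopedMutexLock(const ScopedMutexLock &) = delete;
  ScopedMutexLock &operator=(const ScopedMutexLock &) = delete;

private:
  std::mutex &m_;
};

std::mutex state_mutex; // plays the role of connection_state.mutex
int release_count = 0;  // state that must only change under the lock

void release_stream_like() {
  ScopedMutexLock lock(state_mutex); // held for the whole critical section
  ++release_count;                   // the "release_stream()" work
}
```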

@zizhong commented Oct 15, 2018

@masaori335 Thanks for pointing that out. So what really happened would be much more complicated than that.
Let's say thread A and thread B are calling from different streams on one connection_state, both arriving at Http2Stream.cc:708. Thread A gets the lock and frees the Http2ClientSession object. Then the object is reused and reinitialized with a new mutex. Then thread B gets the lock and calls into Http2ConnectionState::release_stream().
At that moment, any other thread can call into Http2ConnectionState::release_stream(), too.

I know we have more locking to prevent the above situation. But this use-after-free beast might happen and bypass that locking, too.
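The mutex-reuse hazard above is why locking through a member of an object that another thread may free is dangerous unless the mutex's lifetime is pinned separately, which is what ATS's refcounted Ptr<ProxyMutex> provides. A rough sketch with std::shared_ptr standing in for the refcount (assumed types, illustrative only):

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Assumed, illustrative types: std::shared_ptr plays the role of ATS's
// refcounted Ptr<ProxyMutex>, which keeps a mutex alive independently of
// the object that owns it.
struct Owner {
  std::shared_ptr<std::mutex> mutex = std::make_shared<std::mutex>();
};

int guarded_work(std::shared_ptr<Owner> &slot) {
  // Pin the mutex with a local strong reference *before* locking, so the
  // mutex survives even if another thread frees the owner meanwhile.
  std::shared_ptr<std::mutex> m = slot ? slot->mutex : nullptr;
  if (!m) {
    return 0;
  }
  std::lock_guard<std::mutex> lock(*m);
  // Under the lock, re-check the owner: the object (unlike its mutex) may
  // already be gone.
  return slot ? 1 : 0;
}
```

The local reference keeps the mutex valid across the critical section, but any state inside the owner must still be re-validated after the lock is taken.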

@masaori335

@zizhong Thanks for the details. Do you have an ASan report of the use-after-free, or something similar? The situation is still a bit unclear. I'd like to know whether the two threads have the same stream or not.

@masaori335

BTW, if this change can save the situation, I'm fine with reopening and landing this as a workaround fix.

@zizhong commented Oct 17, 2018

@masaori335 No, I don't have an ASan report on this. The above hypothesis needs to be verified.
My team has been testing this patch for several days; we haven't seen new issues so far.

