
Conversation

@shinrich
Member

We've seen this crash every few days for a while. Finally think I figured out the issue.

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000006c86db in Http2ClientSession::release_netvc (this=0x2ab223d4c380) at ../../../../trafficserver/proxy/http2/Http2ClientSession.h:193
#2  0x00000000005bd53b in ProxyClientSession::handle_api_return (this=0x2ab223d4c380, event=0) at ../../../trafficserver/proxy/ProxyClientSession.cc:211
#3  0x00000000005bd32e in ProxyClientSession::state_api_callout (this=0x2ab223d4c380, event=0, data=0x0) at ../../../trafficserver/proxy/ProxyClientSession.cc:156
#4  0x00000000005bd418 in ProxyClientSession::do_api_callout (this=0x2ab223d4c380, id=TS_HTTP_SSN_CLOSE_HOOK) at ../../../trafficserver/proxy/ProxyClientSession.cc:182
#5  0x00000000006c572a in Http2ClientSession::destroy (this=0x2ab223d4c380) at ../../../../trafficserver/proxy/http2/Http2ClientSession.cc:82
#6  0x00000000006ce480 in Http2ConnectionState::release_stream (this=0x2ab223d4c620, stream=0x2abe0aae68c0) at ../../../../trafficserver/proxy/http2/Http2ConnectionState.cc:1246
#7  0x00000000006beefb in Http2Stream::destroy (this=0x2abe0aae68c0) at ../../../../trafficserver/proxy/http2/Http2Stream.cc:716
#8  0x00000000006bd25f in Http2Stream::terminate_if_possible (this=0x2abe0aae68c0) at ../../../../trafficserver/proxy/http2/Http2Stream.cc:377
#9  0x00000000006bd19d in Http2Stream::transaction_done (this=0x2abe0aae68c0) at ../../../../trafficserver/proxy/http2/Http2Stream.cc:366
#10 0x0000000000669d12 in HttpSM::kill_this (this=0x2abedc914c00) at ../../../../trafficserver/proxy/http/HttpSM.cc:7060
#11 0x000000000065aeeb in HttpSM::main_handler (this=0x2abedc914c00, event=2301, data=0x2abedc915f80) at ../../../../trafficserver/proxy/http/HttpSM.cc:2815
#12 0x000000000083104e in Continuation::dispatchEvent (this=0x2abedc914c00, event=2301, data=0x2abedc915f80) at ../../../../trafficserver/iocore/eventsystem/Continuation.cc:46
#13 0x00000000006b5a94 in HttpTunnel::main_handler (this=0x2abedc915f80, event=104, data=0x2ab4ec0866b0) at ../../../../trafficserver/proxy/http/HttpTunnel.cc:1650
#14 0x0000000000830f67 in Continuation::handleEvent (this=0x2abedc915f80, event=104, data=0x2ab4ec0866b0) at ../../../../trafficserver/iocore/eventsystem/Continuation.cc:33
#15 0x000000000080df14 in read_signal_and_update (event=104, vc=0x2ab4ec086560) at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:144
#16 0x000000000080e28f in read_signal_done (event=104, nh=0x2ab19b314cf0, vc=0x2ab4ec086560) at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:205
#17 0x000000000080e989 in read_from_net (nh=0x2ab19b314cf0, vc=0x2ab4ec086560, thread=0x2ab19b311010) at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:355
#18 0x00000000008109b5 in UnixNetVConnection::net_read_io (this=0x2ab4ec086560, nh=0x2ab19b314cf0, lthread=0x2ab19b311010) at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:971
#19 0x0000000000805ffd in NetHandler::waitForActivity (this=0x2ab19b314cf0, timeout=60000000) at ../../../../trafficserver/iocore/net/UnixNet.cc:497
#20 0x00000000008336f4 in EThread::execute_regular (this=0x2ab19b311010) at ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:273
#21 0x00000000008338bb in EThread::execute (this=0x2ab19b311010) at ../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:326
#22 0x00000000008324f0 in spawn_thread_internal (a=0x1f54f30) at ../../../../trafficserver/iocore/eventsystem/Thread.cc:85
#23 0x00002ab198c12dc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00002ab19994176d in clone () from /lib64/libc.so.6

The stack is confusing when you look at the code, because it appears that we are calling through a null client_vc even though the line just above checks that client_vc is not null. I think the real problem is that client_vc has already been closed and freed inside the ProxyClientSession::handle_api_return() call down the stack.

If the original event had come in on client_vc, this would have been fine: the vc's recursion count would be > 0, so the actual free would not happen until the stack unwound. But in this case the EOS event came in on the server VC, so nothing protects the client VC, which is then accessed from the HttpSM shutdown path.
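
As a rough illustration of the recursion-count guard being described here (a toy sketch, not Traffic Server code; ToyVC, dispatch(), and the flags are invented names standing in for the real NetVConnection machinery):

#include <cstdio>

// Toy model of a VC whose free is deferred while an event is being
// dispatched on it. Illustrative only; not the real ATS classes.
struct ToyVC {
  int  recursion   = 0;     // > 0 while an event is being handled on this VC
  bool close_asked = false; // do_io_close() has been requested
  bool freed       = false; // stands in for the actual free of the VC

  void do_io_close() {
    close_asked = true;
    if (recursion == 0) {
      freed = true;         // nothing on the stack is using this VC: free now
    }                       // otherwise the free waits until dispatch unwinds
  }

  template <typename Handler>
  void dispatch(Handler &&handler) {
    ++recursion;            // protects *this* VC for the duration of the call
    handler();
    --recursion;
    if (close_asked && recursion == 0) {
      freed = true;         // deferred free, after the stack has unwound
    }
  }
};

int main() {
  // Event dispatched on the same VC that gets closed: the free is deferred.
  ToyVC client1;
  client1.dispatch([&] {
    client1.do_io_close();
    std::printf("same-VC event, freed inside handler? %d\n", client1.freed); // 0
  });

  // The crash scenario: the EOS event was dispatched on the server VC, so the
  // client VC has no recursion protection and is freed immediately on close.
  ToyVC server, client2;
  server.dispatch([&] {
    client2.do_io_close();
    std::printf("cross-VC event, client freed inside handler? %d\n", client2.freed); // 1
    // any further use of client2 here is the use-after-free seen in the backtrace
  });
  return 0;
}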

Just reordering the calls to vc->do_io_close and this->release_netvc in ProxyClientSession::handle_api_return should fix the issue. But looking at the bigger picture, I don't think release_netvc is needed at all at this point. All it does is make sure there is no lingering client session data on the netvc, and since it is always called right next to a call that closes the netvc, it adds no value here.
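
For concreteness, here is a sketch of the ordering change (the real ProxyClientSession::handle_api_return is paraphrased, not quoted, and FakeNetVC / FakeSession are stand-in types invented for this example):

#include <cstdio>
#include <memory>

// Stand-in types; the real NetVConnection / ProxyClientSession are much richer.
struct FakeNetVC {
  void *session_data = nullptr; // back-pointer the session leaves on the netvc
};

struct FakeSession {
  std::unique_ptr<FakeNetVC> vc = std::make_unique<FakeNetVC>();

  void release_netvc() {
    vc->session_data = nullptr; // only safe while the netvc is still alive
  }

  // Order as in the crash: close/free the netvc first, then touch it.
  void handle_api_return_buggy() {
    vc.reset();      // stands in for do_io_close() freeing the netvc
    release_netvc(); // use-after-free (here a null deref; garbage in production)
  }

  // Proposed order: detach the session data first, then close the netvc.
  void handle_api_return_fixed() {
    release_netvc(); // vc still valid, safe to clear the back-pointer
    vc.reset();      // stands in for do_io_close()
  }
};

int main() {
  FakeSession s;
  s.handle_api_return_fixed();
  std::printf("fixed ordering completed without touching a freed netvc\n");
  // FakeSession bad; bad.handle_api_return_buggy(); // would dereference a destroyed netvc
  return 0;
}

And since the netvc is destroyed immediately afterwards anyway, simply dropping the release_netvc() call at this point (the bigger-picture option above) removes the ordering hazard entirely.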

I'm putting up this PR, but will be running this change internally for a bit to verify that it does address the issue.

shinrich added this to the 9.0.0 milestone Aug 22, 2018
shinrich self-assigned this Aug 22, 2018
@shinrich
Member Author

This fix has been running in one of our colos for 4 days and this crash has not appeared again.

shinrich merged commit 507368d into apache:master Aug 27, 2018
@zwoop
Contributor

zwoop commented Aug 29, 2018

I'm going to remove the 7.1.5 Project on this, since going forward we will need a second PR against the 7.1.x branch for proposed backports.

@zwoop
Contributor

zwoop commented Aug 29, 2018

@masaori335 @maskit Is this a fix that we want / need for 7.1.x ?

@masaori335
Contributor

IMO, we need this for 7.1.x.

@zwoop
Contributor

zwoop commented Aug 29, 2018

Cool. This doesn't cherry-pick cleanly to 7.1.x; can either of you make a PR against the 7.1.x branch for this, please?

@masaori335
Contributor

We are seeing the same crash on 8.0.3 (rc). We need to backport this to 8.0.x.

@zwoop
Contributor

zwoop commented Feb 13, 2019

Cherry-picked to 8.0.x

zwoop modified the milestones: 9.0.0, 8.0.3 Feb 13, 2019