Fix crash in Http2ClientSession::release_netvc #4147
Merged
We've seen this crash every few days for a while. I finally think I've figured out the issue.
The stack is confusing when you look at the code, because it appears that we are calling through a null client_vc, yet the line above checks that client_vc is not null. I think the real problem is that client_vc has already been closed and freed in the ProxyClientSession::handle_api_return() call further down the stack.
If the original event had come in on client_vc, this would have been fine: the vc's recursion count would be > 0, so the actual free would not occur until the stack unwound. But in this case the EOS event came in on the server VC, and the client VC is accessed from the HttpSM shutdown.
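For illustration, here is a hedged sketch of the failing sequence. The names mirror the ATS code, but the bodies are simplified approximations, not the literal source:

```cpp
// Sketch (approximate, not the exact ATS source) of the use-after-free.
// ProxyClientSession::handle_api_return(), on the SSN_CLOSE path, closes the
// netvc and then asks the session to clear its handle to it:
NetVConnection *vc = this->get_netvc();
if (vc) {
  vc->do_io_close();      // EOS arrived on the *server* vc, so the client
                          // vc's recursion count is 0 and it is freed here
  this->release_netvc();  // virtual call lands in Http2ClientSession
}

// Http2ClientSession::release_netvc() then dereferences the stale pointer.
// The null check passes because the pointer was never cleared; it just
// points at freed memory now:
void
Http2ClientSession::release_netvc()
{
  if (client_vc) {                               // non-null, but dangling
    client_vc->do_io_read(nullptr, 0, nullptr);  // crash: use-after-free
    client_vc->do_io_write(nullptr, 0, nullptr);
    client_vc->set_action(nullptr);
  }
}
```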
Just reordering the calls to vc->do_io_close() and this->release_netvc() in ProxyClientSession::handle_api_return() should fix the issue. Looking at the bigger picture, though, I don't think release_netvc is needed at all at this point: all it does is make sure there is no lingering client session data on the netvc, but it is always called right next to a call that closes the netvc, so it adds no value there.
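A hedged sketch of the reordered SSN_CLOSE path (again simplified, not the literal diff):

```cpp
// Sketch of the fix: clear the session's vios/handles while the vc is still
// alive, then let do_io_close() free it.
NetVConnection *vc = this->get_netvc();
if (vc) {
  this->release_netvc();  // touch client_vc before it can be freed
  vc->do_io_close();      // safe to free now; nothing touches it afterwards
}
// Since do_io_close() tears the vc down anyway, the release_netvc() call
// here is arguably redundant and could simply be dropped, per the note above.
```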
I'm putting up this PR now, but I'll be running this change internally for a while to verify that it does address the issue.