Skip to content

Conversation

@zizhong
Copy link
Member

@zizhong zizhong commented Jun 8, 2017

Fix a crash caused by H/2 graceful shutdown.
The issue is when ATS is shutting down and enters Http2ConnectionState::cleanup_streams. It will check is_state_closed(), which needs to return true. The logic needs to be consistent between is_state_closed() and Http2ConnectionState::release_stream.

@maskit @masaori335 Can you please take a look?

@maskit
Copy link
Member

maskit commented Jun 8, 2017

[approve ci]

masaori335
masaori335 previously approved these changes Jun 8, 2017
Copy link
Contributor

@masaori335 masaori335 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

@masaori335 masaori335 added this to the 8.0.0 milestone Jun 8, 2017
@zwoop
Copy link
Contributor

zwoop commented Jun 8, 2017

This only happens when ATS itself is shutting down?

@zizhong
Copy link
Member Author

zizhong commented Jun 8, 2017 via email

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had enough time to review so just leave a comment.
If you looked at other places where we send GOAWAY frame (send_goaway_frame()), we also sends HTTP2_SESSION_EVENT_FINI event , which sets fini_received true. I guess it's the right closing sequence, but I'm not 100% sure.

@masaori335
Copy link
Contributor

It sounds good, but I'm afraid that it start racing of scheduled HTTP2_SESSION_EVENT_FINI vs release_stream().

@zizhong Who is calling release_stream() which make the crash?

@zizhong
Copy link
Member Author

zizhong commented Jun 9, 2017

Http2ConnectionState::cleanup_streams (this=0x2adc74c8ab50) at Http2ConnectionState.cc:1009
Http2ConnectionState::cleanup_streams (this=...) at Http2ConnectionState.cc:1009
destroy (this=...) at Http2ConnectionState.h:160
Http2ClientSession::free (this=...) at Http2ClientSession.cc:134
ProxyClientSession::state_api_callout (this=..., event=...) at ProxyClientSession.cc:142
Http2ConnectionState::release_stream (this=..., stream=...) at Http2ConnectionState.cc:1074
Http2Stream::main_event_handler (this=..., event=..., edata=...) at Http2Stream.cc:127
handleEvent (this=..., e=..., calling_code=...) at I_Continuation.h:153
EThread::process_event (this=..., e=..., calling_code=...) at UnixEThread.cc:148
EThread::execute (this=...) at UnixEThread.cc:202
spawn_thread_internal (a=...) at Thread.cc:86
start_thread () from /lib64/libpthread.so.0
clone () from /lib64/libc.so.6

From the backtrace, we can tell it happens when a stream gets VC_EVENT_EOS.

@masaori335
Copy link
Contributor

Thanks. So it could be race of scheduled HTTP2_SESSION_EVENT_FINI to Http2ConnectionState vs VC_EVENT_EOS to Http2Stream.

@maskit
Copy link
Member

maskit commented Jun 9, 2017

I'd rather schedule HTTP2_SESSION_EVENT_FINI and use fini_received as the only one flag than introduce new state and conditions. The change below works for me. @zizhong What do you think?

diff --git a/proxy/http2/Http2ConnectionState.cc b/proxy/http2/Http2ConnectionState.cc
index 16ed21db7..9326565f5 100644
--- a/proxy/http2/Http2ConnectionState.cc
+++ b/proxy/http2/Http2ConnectionState.cc
@@ -1159,10 +1159,14 @@ Http2ConnectionState::release_stream(Http2Stream *stream)
       }
     }
 
-    if ((fini_received || shutdown_state == IN_PROGRESS) && total_client_streams_count == 0) {
-      // We were shutting down, go ahead and terminate the session
-      ua_session->destroy();
-      ua_session = nullptr;
+    if (total_client_streams_count == 0) {
+      if (fini_received) {
+        // We were shutting down, go ahead and terminate the session
+        ua_session->destroy();
+        ua_session = nullptr;
+      } else if (shutdown_state == IN_PROGRESS) {
+        this_ethread()->schedule_imm_local((Continuation *)this, HTTP2_SESSION_EVENT_FINI);
+      }
     }
   }
 }

Also, it's a nitpick, but I'm not a big fun of setting shutdown_state DONE during the closing process, because it's not done yet. :)

@zwoop
Copy link
Contributor

zwoop commented Jun 9, 2017

@maskit Since this is only on shutdown, we don't need this for 7.1.x do we ?

@maskit
Copy link
Member

maskit commented Jun 9, 2017

@zwoop We need this too if we backport the series of the commits for graceful shutdown.

@maskit maskit added the Crash label Jun 9, 2017
@zizhong
Copy link
Member Author

zizhong commented Jun 9, 2017

@maskit Thanks for the suggestion! I'll update the code.
Looks like we prefer to lie about fini_received == true than shutdown_state == DONE. xD

@zizhong
Copy link
Member Author

zizhong commented Jun 14, 2017

@maskit @masaori335 updated. please review it again. Thanks!

@maskit
Copy link
Member

maskit commented Jun 15, 2017

[approve ci]

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, but my approval is not effective on this because I suggested the change. Needs approval from someone else.

@masaori335 masaori335 merged commit 6ecc998 into apache:master Jun 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants