Skip to content

Conversation

@masaori335
Copy link
Contributor

This could be fix of #1673

Approach

  1. Add half_close flag in Http2ClientSession like Http1ClientSession.
  2. Set half_close true at end of send_goaway_frame()
  3. Wait VC_EVENT_WRITE_COMPLETE event (probably GOAWAY frame is written)
  4. Close, destroy and free client_vc and ua_session

Concerns

My concern is that the VC_EVENT_WRITE_COMPLETE event is exactly the event of sending GOAWAY frame or not. When some frame is already scheduled to be written (but not written) before send_goaway_frame() is called, same issue will be happen?
Is there any way to make sure the event is the event of written of GOAWAY frame?

Tests

h2spec (v2.1.0) http2/6.9.1/2 is passed.

$ ./h2spec http2/6.9.1/2 -p 4443 -k -t -v
Hypertext Transfer Protocol Version 2 (HTTP/2)
  6. Frame Definitions
    6.9. WINDOW_UPDATE
      6.9.1. The Flow-Control Window
             [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
             [recv] SETTINGS Frame (length:12, flags:0x00, stream_id:0)
             [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
             [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
             [recv] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
             [send] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
             [send] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
             [recv] GOAWAY Frame (length:8, flags:0x00, stream_id:0)
        ✔ 2: Sends multiple WINDOW_UPDATE frames increasing the flow control window to above 2^31-1

Finished in 0.0209 seconds
1 tests, 1 passed, 0 skipped, 0 failed

@masaori335 masaori335 added this to the 7.2.0 milestone Apr 19, 2017
@masaori335 masaori335 requested review from maskit and shinrich April 19, 2017 00:13
@masaori335
Copy link
Contributor Author

@zizhong Would you review this?

@masaori335 masaori335 self-assigned this Apr 19, 2017
@zizhong
Copy link
Member

zizhong commented Apr 19, 2017

@masaori335 yes, the race condition you mentioned is definitely an issue. We might have a better way to do this. 🤔

@zwoop
Copy link
Contributor

zwoop commented Apr 19, 2017

Is this a 7.1.x candidate?

@maskit
Copy link
Member

maskit commented Apr 19, 2017

How does a session get closed state?
We don't need to call SET_HANDLER anymore?
What would be the difference between closed and half-closed?

It seems like just changing the timing of closing process a bit later to me.

How about setting session inactive timeout short after scheduling of sending a GOAWAY frame? If you sent a GOAWAY frame, the session would be inactive eventually. We could assume the timeout as a sign of write completion of the GOAWAY frame. If you want to close a session asap, you can send RST_STREAM frames to all streams on the session to make them inactive immediately. It would be better to use active timeout too just in case.

break;

case VC_EVENT_WRITE_COMPLETE:
if (this->get_half_close_flag()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the GOAWAY frame is sent, the sender should complete the processing of outstanding streams which are less than the stream identifier sent in the goaway frame. It seems we are not doing that here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the specification, there's no guarantee of the completion. The stream identifier in the GOAWAY frame says that streams which have higher stream id than the one in GOAWAY frame are not processed at all.

https://tools.ietf.org/html/rfc7540#section-6.8

The last stream identifier in the GOAWAY frame contains the highest-
numbered stream identifier for which the sender of the GOAWAY frame
might have taken some action on or might yet take action on.  All
streams up to and including the identified stream might have been
processed in some way.  The last stream identifier can be set to 0 if
no streams were processed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. As far as I understand, there're 2 scenarios for sending GOAWAY frame, Graceful Shutdown and Connection Error Handling. I'm sure that it is better to proceed outstanding streams as long as possible for Graceful Shutdown (maybe until active or inactive timeout). Should we do same thing for Connection Error Handling?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is no guarantee, but we should try to process the outstanding streams as long as possible. Two questions.

  1. Once a GOAWAY frame is sent, shouldn't we stop accepting new streams?
  2. should we close the session only when there are no outstanding streams?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should try that. The point here is how much the "as long as possible" is.

> 1.
We should stop accepting new steams. The reason is we could not increase the last stream id.

Endpoints MUST NOT increase the value they send in the last stream identifier, since the peers might already have retried unprocessed requests on another connection

So we should ignore the frame. Probably it's better to send RST_STREAM frame.

> 2.
We should also close the session by active timeout or inactive timeout. Even if there're outstanding streams, we should not leave the connection for long time after sending GOAWAY frame. Especially when TS is handling Connection Error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidhuagarwal

  1. Yes, we should stop accepting new streams. I think we can respond with RST_STREAM frame for HEADERS frames newly arrived.
  2. Yes and no. Not only. Basically we should keep processing streams already open as long as possible, as you commented. However, if one of streams were transferring gigabytes file, it may take long time. In this case we want to close the stream even if it's not completed. This is the reason I mentioned active timeout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, masaori's comment wasn't showed in my browser. :)

Copy link
Member

@zizhong zizhong Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be careful when we ignore the new incoming frames.

However, any frames that alter connection
state cannot be completely ignored. For instance, HEADERS,
PUSH_PROMISE, and CONTINUATION frames MUST be minimally processed to
ensure the state maintained for header compression is consistent (see
Section 4.3); similarly, DATA frames MUST be counted toward the
connection flow-control window. Failure to process these frames can
cause flow control or header compression state to become
unsynchronized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny change needs to be made related to the graceful shutdown. When sending the shutdown notice (stream_id == INT_MAX, error==NO_ERROR), we can't schedule session close. We can only schedule it if stream_id < INT_MAX.

@zizhong zizhong mentioned this pull request Apr 19, 2017
@zwoop zwoop modified the milestones: 7.2.0, 8.0.0 Apr 25, 2017
@zizhong
Copy link
Member

zizhong commented Apr 25, 2017

@masaori335 Where are we about this PR? Ping you because HTTP2 drain feature needs this as a prerequisite.

@masaori335
Copy link
Contributor Author

masaori335 commented Apr 25, 2017

@zizhong Still work in progress. I tried to use inactivity_timeout and found another bug (#1716). Now, it's fixed, so I'm going to change this PR to use inactivity_timeout.

@zizhong
Copy link
Member

zizhong commented Apr 25, 2017

@masaori335 Nice work! Thanks for the update.

@masaori335 masaori335 force-pushed the h2_half_close branch 2 times, most recently from 88c98a6 to d0531ad Compare April 26, 2017 03:09
@masaori335 masaori335 removed the WIP label Apr 26, 2017
@atsci
Copy link

atsci commented Apr 26, 2017

@atsci
Copy link

atsci commented Apr 26, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/327/

@atsci
Copy link

atsci commented Apr 26, 2017

@atsci
Copy link

atsci commented Apr 26, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2008/

@atsci
Copy link

atsci commented Apr 26, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/439/

@atsci
Copy link

atsci commented Apr 26, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1901/

@atsci
Copy link

atsci commented May 1, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/346/

@atsci
Copy link

atsci commented May 1, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2028/

@atsci
Copy link

atsci commented May 1, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/459/

@atsci
Copy link

atsci commented May 1, 2017

@atsci
Copy link

atsci commented May 1, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1921/

@atsci
Copy link

atsci commented May 1, 2017

clang-analyzer build failed! https://ci.trafficserver.apache.org/job/clang-analyzer-github/592/

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.


// Finalize HTTP/2 Connection
case HTTP2_SESSION_EVENT_FINI: {
SCOPED_MUTEX_LOCK(lock, this->mutex, this_ethread());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be ua_session's mutex?

Copy link
Contributor Author

@masaori335 masaori335 May 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this because this protects data of Http2ConnectionState.
OTOH, it's worth to consider that adding ua_session's mutex lock. Because cleanup_streams() and release_stream() change data of ua_session.

Copy link

@sidhuagarwal sidhuagarwal May 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We saw some crashes if the ua stream mutex is not held.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidhuagarwal Could you share how to reproduce? Run this patch with #1710?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidhuagarwal Added SCOPED_MUTEX_LOCK(lock, this->ua_session->mutex, this_ethread()); in cleanup_streams() and release_stream(). Could you verify the crash doesn't happen?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masaori335 we saw some crashes under high load when we were sending HTTP2_SESSION_EVENT_FINI without holding the ua_session's mutex. I don't have a reproduction case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidhuagarwal Got it. Thanks.

Http2ClientSession is set half_close state after GOAWAY frame is sent to client.
In half_close state, TS doesn't create new HTTP/2 stream.
@atsci
Copy link

atsci commented May 2, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/352/

@atsci
Copy link

atsci commented May 2, 2017

@atsci
Copy link

atsci commented May 2, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/465/

@atsci
Copy link

atsci commented May 2, 2017

@atsci
Copy link

atsci commented May 2, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2034/

@atsci
Copy link

atsci commented May 2, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1927/

@atsci
Copy link

atsci commented May 2, 2017

clang-analyzer build failed! https://ci.trafficserver.apache.org/job/clang-analyzer-github/598/

error.msg);
}
this->send_goaway_frame(this->latest_streamid_in, error.code);
this->ua_session->set_half_close_flag(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_half_close_flag is changing the state of ua_session too. Should we hold a mutex for it too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ua_session's mutex is already held. Because here is always on call stack from ua_session.

@zwoop
Copy link
Contributor

zwoop commented May 8, 2017

Nice work. Two questions:

  1. Did we want this for 7.1.0?

  2. I think there's a new coverity issue reported after this PR. Please fix, and if we want this PR for 7.1.0, make sure to mark the fix for 7.1.0 as well.

@masaori335
Copy link
Contributor Author

@zwoop It's nice to have. But I don't have strong reasons to push this 7.1.0.

@masaori335 masaori335 added the Backport Marked for backport for an LTS patch release label May 11, 2017
@masaori335 masaori335 modified the milestones: 7.1.0, 8.0.0 May 11, 2017
@zwoop
Copy link
Contributor

zwoop commented May 11, 2017

Cherry picked to 7.1.0 per @maskit's request. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backport Marked for backport for an LTS patch release HTTP/2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants