Add half_close state in Http2ClientSession #1704
Conversation
|
@zizhong Would you review this? |
377f7bd to
37208e6
Compare
|
@masaori335 yes, the race condition you mentioned is definitely an issue. We might have a better way to do this. 🤔 |
37208e6 to
792bbb5
Compare
|
Is this a 7.1.x candidate? |
|
How does a session get into the closed state? To me, this seems to just move the closing process a bit later. How about setting a short session inactive timeout after scheduling the GOAWAY frame to be sent? Once a GOAWAY frame has been sent, the session will eventually become inactive, so we could treat the timeout as a sign that the GOAWAY frame has been written. If you want to close a session ASAP, you can send RST_STREAM frames to all streams on the session to make them inactive immediately. It would be better to use the active timeout too, just in case. |
proxy/http2/Http2ClientSession.cc
Outdated
    break;

  case VC_EVENT_WRITE_COMPLETE:
    if (this->get_half_close_flag()) {
When the GOAWAY frame is sent, the sender should complete the processing of outstanding streams whose identifiers are lower than the stream identifier sent in the GOAWAY frame. It seems we are not doing that here.
According to the specification, there is no guarantee of completion. The stream identifier in the GOAWAY frame only says that streams with a higher stream ID than the one in the GOAWAY frame are not processed at all.
https://tools.ietf.org/html/rfc7540#section-6.8
The last stream identifier in the GOAWAY frame contains the highest-
numbered stream identifier for which the sender of the GOAWAY frame
might have taken some action on or might yet take action on. All
streams up to and including the identified stream might have been
processed in some way. The last stream identifier can be set to 0 if
no streams were processed.
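A minimal sketch of what the quoted paragraph means for the receiver of a GOAWAY frame. The names `StreamFate` and `classify_stream` are hypothetical and exist only for illustration; they are not part of Traffic Server.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, not Traffic Server code.
enum class StreamFate { MaybeProcessed, NotProcessed };

// Classify a stream relative to the last stream identifier carried in a
// GOAWAY frame (RFC 7540, Section 6.8). Streams up to and including the
// identifier might have been processed; higher-numbered streams were not.
constexpr StreamFate classify_stream(uint32_t stream_id, uint32_t last_stream_id)
{
  return stream_id <= last_stream_id ? StreamFate::MaybeProcessed : StreamFate::NotProcessed;
}
```

Only streams classified as `NotProcessed` are known to be untouched by the sender; nothing is guaranteed about the others.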
Good point. As far as I understand, there are two scenarios for sending a GOAWAY frame: Graceful Shutdown and Connection Error Handling. I'm sure it is better to keep processing outstanding streams as long as possible for Graceful Shutdown (maybe until the active or inactive timeout). Should we do the same thing for Connection Error Handling?
I agree that there is no guarantee, but we should try to process the outstanding streams as long as possible. Two questions.
- Once a GOAWAY frame is sent, shouldn't we stop accepting new streams?
- should we close the session only when there are no outstanding streams?
Yes, we should try that. The question here is how long "as long as possible" is.
> 1.
We should stop accepting new streams, because we cannot increase the last stream ID:
Endpoints MUST NOT increase the value they send in the last stream identifier, since the peers might already have retried unprocessed requests on another connection
So we should ignore the frame. It's probably better to send an RST_STREAM frame.
> 2.
We should also close the session on the active timeout or inactive timeout. Even if there are outstanding streams, we should not leave the connection open for a long time after sending a GOAWAY frame, especially when TS is handling a connection error.
- Yes, we should stop accepting new streams. I think we can respond with an RST_STREAM frame to newly arrived HEADERS frames.
- Yes and no. Not only then. Basically we should keep processing streams that are already open as long as possible, as you commented. However, if one of the streams is transferring a multi-gigabyte file, it may take a long time. In that case we want to close the stream even if it has not completed. This is the reason I mentioned the active timeout.
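The policy discussed in these comments could be sketched roughly as follows. This is an illustration only, not the code in this PR: `SessionSketch`, its fields, and the 30-second timeout are all assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical model of a session after GOAWAY; not Http2ClientSession.
enum class NewStreamAction { Accept, RefuseWithRstStream };

struct SessionSketch {
  bool half_closed      = false; // set once a GOAWAY frame has been sent
  uint32_t open_streams = 0;     // streams that are still being processed
  std::chrono::seconds since_goaway{0};
  std::chrono::seconds active_timeout{30}; // assumed value for illustration

  // After GOAWAY, new streams are refused instead of being created.
  NewStreamAction on_new_stream() const
  {
    return half_closed ? NewStreamAction::RefuseWithRstStream : NewStreamAction::Accept;
  }

  // Close when all outstanding streams are done, or when the timeout
  // expires even though a long transfer is still running.
  bool should_close() const
  {
    if (!half_closed) {
      return false;
    }
    return open_streams == 0 || since_goaway >= active_timeout;
  }
};
```

The timeout branch is what keeps a single multi-gigabyte transfer from holding the connection open indefinitely.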
Oops, masaori's comment wasn't shown in my browser. :)
Please be careful when we ignore the new incoming frames.
However, any frames that alter connection
state cannot be completely ignored. For instance, HEADERS,
PUSH_PROMISE, and CONTINUATION frames MUST be minimally processed to
ensure the state maintained for header compression is consistent (see
Section 4.3); similarly, DATA frames MUST be counted toward the
connection flow-control window. Failure to process these frames can
cause flow control or header compression state to become
unsynchronized.
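A rough sketch of the "minimal processing" the quoted text requires for frames on streams that are otherwise ignored after GOAWAY. The enum and struct names are hypothetical; 65535 is the initial connection flow-control window defined by RFC 7540.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration; not Traffic Server code.
enum class FrameType { Data, Headers, PushPromise, Continuation, Other };
enum class MinimalWork { CountTowardFlowControl, DecodeHeaderBlock, Nothing };

// Decide what still has to happen for a frame whose stream is being ignored.
inline MinimalWork minimal_work_for(FrameType type)
{
  switch (type) {
  case FrameType::Headers:
  case FrameType::PushPromise:
  case FrameType::Continuation:
    // The header block must still be decoded to keep HPACK state in sync.
    return MinimalWork::DecodeHeaderBlock;
  case FrameType::Data:
    // The payload must still be counted against the connection window.
    return MinimalWork::CountTowardFlowControl;
  case FrameType::Other:
    break;
  }
  return MinimalWork::Nothing;
}

struct ConnectionWindowSketch {
  int64_t remaining = 65535; // initial connection window per RFC 7540

  void on_ignored_data(uint32_t payload_len)
  {
    assert(remaining >= static_cast<int64_t>(payload_len));
    remaining -= payload_len;
  }
};
```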
One tiny change needs to be made related to the graceful shutdown. When sending the shutdown notice (stream_id == INT_MAX, error==NO_ERROR), we can't schedule session close. We can only schedule it if stream_id < INT_MAX.
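The condition in this comment could be expressed like the sketch below. `kShutdownNoticeStreamId` and `may_schedule_close` are hypothetical names; the value 2^31-1 is the last stream identifier RFC 7540 suggests for the initial GOAWAY of a graceful shutdown.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Last stream identifier used by the graceful-shutdown notice (2^31-1).
constexpr uint32_t kShutdownNoticeStreamId = std::numeric_limits<int32_t>::max();

// Hypothetical check: a GOAWAY carrying the shutdown-notice identifier only
// announces the shutdown, so session close is scheduled only for a GOAWAY
// with a real (lower) last stream identifier.
constexpr bool may_schedule_close(uint32_t goaway_last_stream_id)
{
  return goaway_last_stream_id < kShutdownNoticeStreamId;
}
```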
|
@masaori335 Where are we on this PR? Pinging you because the HTTP/2 drain feature needs this as a prerequisite. |
|
@masaori335 Nice work! Thanks for the update. |
88c98a6 to
d0531ad
Compare
|
AU check successful! https://ci.trafficserver.apache.org/job/autest-github/310/ |
|
RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/327/ |
|
clang format successful! https://ci.trafficserver.apache.org/job/clang-format-github/314/ |
|
FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2008/ |
|
Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/439/ |
|
Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1901/ |
|
RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/346/ |
|
FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2028/ |
|
Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/459/ |
|
AU check successful! https://ci.trafficserver.apache.org/job/autest-github/330/ |
|
Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1921/ |
|
clang-analyzer build failed! https://ci.trafficserver.apache.org/job/clang-analyzer-github/592/ |
maskit
left a comment
Looks good to me.
|

  // Finalize HTTP/2 Connection
  case HTTP2_SESSION_EVENT_FINI: {
    SCOPED_MUTEX_LOCK(lock, this->mutex, this_ethread());
should this be ua_session's mutex?
We need this lock because it protects the data of Http2ConnectionState.
OTOH, it's worth considering also taking ua_session's mutex, because cleanup_streams() and release_stream() change data of ua_session.
We saw some crashes if the ua stream mutex is not held.
@sidhuagarwal Could you share how to reproduce? Run this patch with #1710?
@sidhuagarwal Added SCOPED_MUTEX_LOCK(lock, this->ua_session->mutex, this_ethread()); in cleanup_streams() and release_stream(). Could you verify the crash doesn't happen?
@masaori335 we saw some crashes under high load when we were sending HTTP2_SESSION_EVENT_FINI without holding the ua_session's mutex. I don't have a reproduction case.
@sidhuagarwal Got it. Thanks.
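For readers following the locking discussion, here is a standalone sketch of holding both the connection-state lock and the session lock while a stream is released. It uses standard C++ mutexes rather than Traffic Server's SCOPED_MUTEX_LOCK, and every type and field name is a hypothetical stand-in. `std::recursive_mutex` is an assumption chosen so that a caller on the same thread that already holds the session lock does not deadlock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the session and connection-state objects.
struct UaSessionSketch {
  std::recursive_mutex mutex;
  int stream_count = 0;
};

struct ConnectionStateSketch {
  std::recursive_mutex mutex;
  UaSessionSketch *ua_session = nullptr;
  int active_streams          = 0;

  void release_stream()
  {
    // Acquire both locks together; std::lock avoids lock-order deadlocks.
    std::lock(mutex, ua_session->mutex);
    std::lock_guard<std::recursive_mutex> state_lock(mutex, std::adopt_lock);
    std::lock_guard<std::recursive_mutex> session_lock(ua_session->mutex, std::adopt_lock);

    // Both objects are modified, so both locks are held here.
    assert(active_streams > 0);
    --active_streams;
    --ua_session->stream_count;
  }
};
```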
Http2ClientSession is set to the half_close state after a GOAWAY frame is sent to the client. In the half_close state, TS doesn't create new HTTP/2 streams.
|
RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/352/ |
|
clang format successful! https://ci.trafficserver.apache.org/job/clang-format-github/339/ |
|
Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/465/ |
|
AU check successful! https://ci.trafficserver.apache.org/job/autest-github/336/ |
|
FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2034/ |
|
Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1927/ |
|
clang-analyzer build failed! https://ci.trafficserver.apache.org/job/clang-analyzer-github/598/ |
               error.msg);
    }
    this->send_goaway_frame(this->latest_streamid_in, error.code);
    this->ua_session->set_half_close_flag(true);
set_half_close_flag is changing the state of ua_session too. Should we hold a mutex for it too?
I think ua_session's mutex is already held, because this code is always reached through a call stack that starts from ua_session.
|
Nice work. Two questions:
|
|
@zwoop It's nice to have, but I don't have a strong reason to push this into 7.1.0. |
|
Cherry picked to 7.1.0 per @maskit's request. :) |
This could be a fix for #1673
Approach
- send_goaway_frame()
- VC_EVENT_WRITE_COMPLETE event (probably GOAWAY frame is written)
Concerns
My concern is whether the VC_EVENT_WRITE_COMPLETE event is exactly the event for the GOAWAY frame being sent. If some frame is already scheduled to be written (but not yet written) before send_goaway_frame() is called, will the same issue happen? Is there any way to make sure the event corresponds to the GOAWAY frame having been written?
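One possible way to address this concern, sketched under assumptions and not taken from this PR, is to track byte offsets: remember the offset at which the GOAWAY frame ends in the outgoing byte stream and treat it as written only once the cumulative written count reaches that offset. `WriteTracker` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical byte-offset tracker; not part of Traffic Server.
class WriteTracker
{
public:
  // Record that `len` bytes were appended to the write buffer.
  uint64_t queue(uint64_t len)
  {
    queued_ += len;
    return queued_;
  }

  // Record a GOAWAY frame of `len` bytes and remember where it ends.
  void queue_goaway(uint64_t len)
  {
    goaway_end_    = queue(len);
    goaway_queued_ = true;
  }

  // Record that `len` more bytes actually reached the socket.
  void on_written(uint64_t len)
  {
    assert(written_ + len <= queued_);
    written_ += len;
  }

  // True only when every byte up to the end of the GOAWAY frame is written.
  bool goaway_flushed() const { return goaway_queued_ && written_ >= goaway_end_; }

private:
  uint64_t queued_     = 0;
  uint64_t written_    = 0;
  uint64_t goaway_end_ = 0;
  bool goaway_queued_  = false;
};
```

With this approach a write-completion event for an earlier frame does not close the session, because `goaway_flushed()` stays false until the GOAWAY bytes themselves have been written.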
Tests
h2spec (v2.1.0) http2/6.9.1/2 is passed.