@zizhong
Member

@zizhong zizhong commented Apr 19, 2017

This PR attempts to fix issue #1693.
The idea is to add a timeout when stopping ATS. During the timeout, we send a GOAWAY frame once the last stream on the connection has been released. This ensures that every stream with a stream_id no greater than last_stream_id has been processed. Ideally, we would send GOAWAY while other streams are still in progress. However, ATS currently lacks a mechanism to keep processing other streams after GOAWAY has been sent.
This PR requires PR #1704 so that the client can receive GOAWAY correctly.
I understand this commit can be improved in many ways. Any ideas would be appreciated.

@masaori335 masaori335 added this to the 7.2.0 milestone Apr 19, 2017
ua_session->destroy();
ua_session = nullptr;
} else if (total_client_streams_count == 0 && http2_drain && ua_session && stream) {
send_goaway_frame(stream->get_id(), Http2ErrorCode::HTTP2_ERROR_NO_ERROR);
Contributor

@masaori335 masaori335 Apr 20, 2017

Is this the GOAWAY frame that notifies the client of a graceful shutdown? If so, the stream id should be 2^31 - 1.
RFC 7540, Section 6.8 says:

A server that is attempting to gracefully shut down a connection SHOULD send an initial GOAWAY frame with the last stream identifier set to 2^31-1 and a NO_ERROR code.

Two GOAWAY frames should be sent during a graceful shutdown. @zizhong it seems you are only sending the second one.

  1. A GOAWAY with stream id 2^31 - 1. This signals to the client that we are shutting down, so it should stop opening new streams.
  2. After allowing time for any in-flight stream creation (at least one round-trip time), the server can send another GOAWAY frame with an updated last stream identifier. This ensures that a connection can be cleanly shut down without losing requests.
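
To make the two frames concrete, here is a minimal sketch of how a GOAWAY frame without debug data could be serialized, following the frame layout in RFC 7540 (9-byte header, then a 31-bit last stream id and a 32-bit error code). The names `build_goaway`, `kMaxStreamId`, `kNoError`, and `kGoawayType` are made up for illustration and are not ATS functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Largest valid HTTP/2 stream identifier (31 bits), i.e. 2^31 - 1.
constexpr uint32_t kMaxStreamId = 0x7fffffffu;
constexpr uint32_t kNoError     = 0x0; // NO_ERROR error code
constexpr uint8_t kGoawayType   = 0x7; // GOAWAY frame type

// Hypothetical helper: serialize a GOAWAY frame with no debug data.
// Layout: 9-byte frame header followed by an 8-byte payload.
std::array<uint8_t, 17>
build_goaway(uint32_t last_stream_id, uint32_t error_code)
{
  assert(last_stream_id <= kMaxStreamId); // the top bit is reserved

  std::array<uint8_t, 17> f{};
  // Header: 24-bit payload length (8), frame type, flags (none).
  f[2] = 8;
  f[3] = kGoawayType;
  f[4] = 0;
  // f[5..8] stay zero: GOAWAY is always sent on stream 0 (the connection).

  // Payload: last stream id, then error code, both big-endian.
  for (int i = 0; i < 4; ++i) {
    f[9 + i]  = static_cast<uint8_t>(last_stream_id >> (24 - 8 * i));
    f[13 + i] = static_cast<uint8_t>(error_code >> (24 - 8 * i));
  }
  return f;
}
```

Under these assumptions, the first frame would be `build_goaway(kMaxStreamId, kNoError)` and the second, sent after at least one round-trip time, would be `build_goaway(last_processed_id, kNoError)` with whatever id the server actually processed last.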

@zizhong
Member Author

zizhong commented Apr 20, 2017

Thanks for the comments. @masaori335 @sidhuagarwal I have a more detailed plan as follows:

enum ShutDownState {
NOT_INITIATED,
INITIATED,
IN_PROGRESS
};

  1. http2_drain, a global flag, is set when ATS is stopping.
  2. Each HTTP/2 client session has a ShutDownState. When http2_drain is set, main_handler checks whether shut_down_initiated is still NOT_INITIATED. If it is, we initiate the shutdown process: send the first GOAWAY and schedule a shut_down_handler that sends the second GOAWAY after at least one RTT.
  3. While ShutDownState is INITIATED or IN_PROGRESS, we need to ignore incoming data frames.
  4. We need a last_stream_id_processed field in the connectionState. When shut_down_handler runs, it sends the second GOAWAY with that value, and the shutdown becomes IN_PROGRESS. After that, last_stream_id_processed cannot be updated. We then schedule a do_io_close with some timeout.
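
The plan above can be sketched as a small state machine. This is only an illustration under stated assumptions: `SessionSketch` and its methods are hypothetical stand-ins rather than the real ATS session classes, the GOAWAYs are recorded in a vector instead of being written to the wire, and step 3 (ignoring incoming data frames) and the final do_io_close are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kMaxStreamId = 0x7fffffffu; // 2^31 - 1

enum class ShutDownState { NOT_INITIATED, INITIATED, IN_PROGRESS };

// Hypothetical per-session state, for illustration only.
struct SessionSketch {
  ShutDownState state               = ShutDownState::NOT_INITIATED;
  uint32_t last_stream_id_processed = 0;
  std::vector<uint32_t> goaways_sent; // last stream id carried by each GOAWAY

  // Step 2: called from the main handler. Sends the first GOAWAY once the
  // drain flag is seen; returns true if the caller should schedule the timer.
  bool
  on_main_event(bool drain_flag)
  {
    if (!drain_flag || state != ShutDownState::NOT_INITIATED) {
      return false;
    }
    goaways_sent.push_back(kMaxStreamId);
    state = ShutDownState::INITIATED;
    return true;
  }

  // Step 4: the id is frozen once the second GOAWAY has gone out.
  void
  on_stream_processed(uint32_t id)
  {
    if (state != ShutDownState::IN_PROGRESS && id > last_stream_id_processed) {
      last_stream_id_processed = id;
    }
  }

  // Fired by the timer after at least one RTT: sends the second GOAWAY.
  void
  on_shutdown_timer()
  {
    if (state != ShutDownState::INITIATED) {
      return;
    }
    assert(goaways_sent.size() == 1); // exactly one GOAWAY so far
    goaways_sent.push_back(last_stream_id_processed);
    state = ShutDownState::IN_PROGRESS;
  }
};
```

Keeping the transition logic separate from the I/O makes it easy to check that a session never sends more than two GOAWAY frames, regardless of how often the main handler runs while the drain flag is set.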

@zizhong zizhong force-pushed the http2_drain_feature branch from 484f2af to 3630d47 Compare April 20, 2017 23:46
@zizhong
Member Author

zizhong commented Apr 20, 2017

@masaori335 @sidhuagarwal Updated the rb according to the design above.

Test result with chrome:

t=209062 [st=10625]    HTTP2_SESSION_GOAWAY
                       --> active_streams = 1
                       --> debug_data = "[0 bytes were stripped]"
                       --> last_accepted_stream_id = 2147483647
                       --> status = 0
                       --> unclaimed_streams = 0
t=209063 [st=10626]    HTTP2_SESSION_RECV_HEADERS
                       --> fin = false
                       --> :status: 200
                           content-type: text/html; charset=utf-8
                           content-length: 70
                           server: ATS/7.2.0
                           date: Thu, 20 Apr 2017 23:39:23 GMT
                           age: 5
                       --> stream_id = 3
t=209064 [st=10627]    HTTP2_SESSION_RECV_DATA
                       --> fin = false
                       --> size = 70
                       --> stream_id = 3
t=209064 [st=10627]    HTTP2_SESSION_UPDATE_RECV_WINDOW
                       --> delta = -70
                       --> window_size = 15728570
t=209064 [st=10627]    HTTP2_SESSION_RECV_DATA
                       --> fin = true
                       --> size = 0
                       --> stream_id = 3
t=209064 [st=10627]    HTTP2_SESSION_GOAWAY
                       --> active_streams = 0
                       --> debug_data = "[0 bytes were stripped]"
                       --> last_accepted_stream_id = 3
                       --> status = 0
                       --> unclaimed_streams = 0
t=209064 [st=10627]    HTTP2_SESSION_CLOSE
                       --> description = "Finished going away"
                       --> net_error = 0 (?)
t=209064 [st=10627]    HTTP2_SESSION_POOL_REMOVE_SESSION
                       --> source_dependency = 197453 (HTTP2_SESSION)
t=209064 [st=10627] -HTTP2_SESSION

Issues not addressed in this commit include:

  1. After sending GOAWAY, we need to schedule a do_io_close() after some timeout. This should apply only to a GOAWAY with stream_id < INT32_MAX.
  2. updated_last_stream_id is not necessarily equal to latest_streamid_in.
  3. Stop accepting new frames after GOAWAY has been sent.
  4. "After allowing time for any in-flight stream creation (at least one round-trip time), the server can send another GOAWAY frame with an updated last stream identifier." How much time should we wait? I'll add a setting in records.config for it later.
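
As a rough sketch of how items 1 and 3 might be decided, the following helper tracks the last stream id announced in a GOAWAY. `GoawayPolicy` and its method names are hypothetical and not part of ATS. It assumes, as HTTP/2 permits, that frames for client-initiated streams above the announced id may be ignored, and that only a GOAWAY below the 2^31-1 sentinel should lead to a scheduled close.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint32_t kMaxStreamId = 0x7fffffffu; // 2^31 - 1, same as INT32_MAX

// Hypothetical bookkeeping for decisions made after a GOAWAY is sent.
struct GoawayPolicy {
  std::optional<uint32_t> last_goaway_id; // empty until a GOAWAY is sent

  void
  record_goaway(uint32_t last_stream_id)
  {
    assert(last_stream_id <= kMaxStreamId);
    last_goaway_id = last_stream_id;
  }

  // Item 3: frames for streams above the announced id are not handled.
  bool
  accepts_stream(uint32_t stream_id) const
  {
    return !last_goaway_id || stream_id <= *last_goaway_id;
  }

  // Item 1: only a GOAWAY below the sentinel should trigger a delayed close.
  bool
  should_schedule_close() const
  {
    return last_goaway_id && *last_goaway_id < kMaxStreamId;
  }
};
```

With this shape, the first GOAWAY (2^31-1) changes nothing about which streams are accepted, while the second one both narrows the accepted range and arms the close.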

@zizhong zizhong force-pushed the http2_drain_feature branch 2 times, most recently from f7d56a3 to 40c97ff Compare April 23, 2017 18:51
@zizhong
Member Author

zizhong commented Apr 24, 2017

@masaori335 could you review this? Do you agree that the first GOAWAY and the second GOAWAY should behave differently?

@masaori335
Contributor

@zizhong Yes, I agree with sending 2 GOAWAY frames to client. I'll take a look.

@masaori335
Contributor

[approve ci]

@atsci

atsci commented Apr 24, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/302/

@atsci

atsci commented Apr 24, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/1983/

@atsci

atsci commented Apr 24, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/414/

@atsci

atsci commented Apr 24, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1875/

@atsci

atsci commented Apr 24, 2017

clang-analyzer build failed! https://ci.trafficserver.apache.org/job/clang-analyzer-github/547/

Member

@maskit maskit left a comment

I think the configuration should at least be renamed, because it's a sort of interface and I don't want to change it frequently.

As for implementation, I'm OK with it. It can be refactored later.

// ###########
{RECT_CONFIG, "proxy.config.http.connect_ports", RECD_STRING, "443", RECU_DYNAMIC, RR_NULL, RECC_STR, "^(\\*|[[:digit:][:space:]]+)$", RECA_NULL}
,
{RECT_CONFIG, "proxy.config.http.http2_drain_timeout", RECD_INT, "0", RECU_RESTART_TS, RR_NULL, RECC_STR, "^[0-9]+$", RECA_NULL}
Member

Should this be RECU_DYNAMIC?
Also, could you give it a more general name or move it under "proxy.config.http2"?

proxy/Main.cc Outdated
signal_crash_handler(signo, info, ctx);
}

if (http2_drain_timeout) {
Member

I'm not a big fan of adding http2_something into Main.cc. I think graceful shutdown is not HTTP/2 specific, and there should be something we can do for other protocols. Could you make it more general?

Member Author

Making it more general is a better idea. Thanks!

@zwoop zwoop modified the milestones: 7.2.0, 8.0.0 Apr 25, 2017
@zizhong zizhong force-pushed the http2_drain_feature branch from 40c97ff to f2b25e6 Compare April 25, 2017 19:33
@zizhong
Member Author

zizhong commented Apr 25, 2017

@maskit I updated the PR and docs. Can you take a look again?
Good idea of making the timeout more general!

:reloadable:

This setting specifies the number of active client connections
for use by :option:`traffic_ctl server restart --drain`.
Contributor

It'd be great if TS can drain traffic without restart or stop.
In our use case, we want to break it down to some phases like 1) drain traffic, 2) check stats, and 3) restart or stop server.

Member Author

Hmm, you mean exposing an API to initiate the shutdown? @mlibbey suggested the same idea.

Contributor

Yes. That is helpful for us:)

@mlibbey
Contributor

mlibbey commented Apr 27, 2017

(frequently because we want to do something between drain and restart -- like install a new version of ATS).

@zizhong zizhong force-pushed the http2_drain_feature branch from f2b25e6 to 952d9df Compare April 27, 2017 17:54
@maskit
Member

maskit commented Apr 28, 2017

@zizhong If I understand correctly, once the timeout is set, graceful shutdown starts when ATS receives some signal, right?
Can you also mention in the documentation how we can trigger graceful shutdown?

I'd prefer small commits. I think adding support for --drain can be a separate PR. We should be able to reuse the 2 GOAWAY logic for it.

Member

@maskit maskit left a comment

I found an issue: new sessions opened while waiting for the shutdown timeout get unexpected data. They expect the HTTP/2 preface but receive a GOAWAY with stream id 2^31-1 instead.

@zizhong zizhong force-pushed the http2_drain_feature branch from b9dc101 to 8bf7015 Compare May 8, 2017 17:58
@zizhong zizhong force-pushed the http2_drain_feature branch from 8bf7015 to 0394a1f Compare May 8, 2017 19:52
@zizhong
Copy link
Member Author

zizhong commented May 8, 2017

@maskit when shutting down ATS, I think we need to stop accepting new connections. What is your idea?

@maskit
Member

maskit commented May 8, 2017

@zizhong I agree. ATS should close the listening sockets.

I'm not sure it can be done easily. If it's difficult or the change would be big, then I'm fine with just sending GOAWAY frames after the H2 preface (and maybe SETTINGS frames?). We can make it better later.

@zizhong
Member Author

zizhong commented May 8, 2017

Currently, I added a check in Http2SessionAccept::accept. It can reject any new incoming connection. Is it good enough?
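
A minimal sketch of the kind of check described here, assuming a process-wide drain flag. `g_draining`, `AcceptStats`, and `accept_connection` are hypothetical names and do not reflect the actual Http2SessionAccept::accept code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical process-wide flag, set once shutdown begins.
std::atomic<bool> g_draining{false};

// Hypothetical counters, to show where stats could be affected.
struct AcceptStats {
  int accepted = 0;
  int rejected = 0;
};

// Decide whether a new connection may be handed to the HTTP/2 session layer.
bool
accept_connection(AcceptStats &stats)
{
  if (g_draining.load(std::memory_order_acquire)) {
    ++stats.rejected; // refused because the server is draining
    return false;
  }
  ++stats.accepted;
  return true;
}
```

Rejecting at this layer still lets the TCP handshake complete before the connection is dropped, which is one reason closing the listening sockets would be the cleaner long-term fix.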

@maskit
Member

maskit commented May 9, 2017

Ah, I didn't have that idea. Sounds good, it has a little overhead though. Also, since we used that only for ACL, it may affect some stats values.

@zizhong
Member Author

zizhong commented May 9, 2017

@maskit I already updated it in the PR. Can you review it again?

@maskit
Member

maskit commented May 9, 2017

Oh, I just found it. Sure.

[approve ci]

@atsci

atsci commented May 9, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/416/

@atsci

atsci commented May 9, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/2099/

@atsci

atsci commented May 9, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/528/

@atsci

atsci commented May 9, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1993/

@atsci

atsci commented May 9, 2017

clang-analyzer build successful! https://ci.trafficserver.apache.org/job/clang-analyzer-github/661/

Member

@maskit maskit left a comment

Good job!

@maskit maskit merged commit ca664a7 into apache:master May 9, 2017
@maskit maskit modified the milestones: 7.1.0, 8.0.0 Jun 9, 2017
@maskit maskit added the Backport Marked for backport for an LTS patch release label Jun 9, 2017
@masaori335 masaori335 modified the milestones: 7.1.0, 8.0.0, 7.2.0 Feb 14, 2018
@masaori335
Contributor

Changed milestone to 8.0.0, because 7.1.0 doesn't have this.

Labels

Backport Marked for backport for an LTS patch release HTTP/2

7 participants