Skip to content

Conversation

@zwoop
Copy link
Contributor

@zwoop zwoop commented Mar 17, 2020

Randall found this in our production. I'm thinking 1/threshold is a good limit here, such that you can at least calculate a reasonable percentage based on a large enough sample.

@zwoop zwoop added the HTTP/2 label Mar 17, 2020
@zwoop zwoop added this to the 10.0.0 milestone Mar 17, 2020
@zwoop zwoop requested review from masaori335 and maskit March 17, 2020 17:38
@zwoop zwoop self-assigned this Mar 17, 2020
@zwoop
Copy link
Contributor Author

zwoop commented Mar 17, 2020

@masaori335 Now that we're looking at this code, why is this x2 on the threshold?

    if (this->connection_state.get_stream_error_rate() > std::min(1.0, Http2::stream_error_rate_threshold * 2.0)) {

Warning("HTTP/2 session error client_ip=%s session_id=%" PRId64
" closing a connection, because its stream error rate (%f) is too high",
client_ip, connection_id(), this->connection_state.get_stream_error_rate());
" closing a connection, because its stream error rate (%f) exceeded the threshold (%f)",
Copy link
Contributor Author

@zwoop zwoop Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was only done to make the Warning() consistent with the other two places we do this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding configured threshold value is fine, but removing "too high" would make it hard to see whether the close was graceful or immediate.

int total = get_stream_requests();
if (total > 0) {

if (total >= (1 / Http2::stream_error_rate_threshold)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this is to require a minimum number of samples before we can calculate a reasonably trustworthy rate of failures. The smaller the threshold, the more samples needed. Probably not statistically safe, but this at least avoids the issues where if (in the default configs) any of the first 10 streams has an error, it's enough to close the connection.

@masaori335
Copy link
Contributor

@masaori335 Now that we're looking at this code, why is this x2 on the threshold?

@maskit could you recall? It comes from 9c09dbc.

@masaori335
Copy link
Contributor

This fixes #5195

Copy link
Contributor

@masaori335 masaori335 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

@maskit
Copy link
Member

maskit commented Mar 18, 2020

why is this x2 on the threshold?

It could be 1.5, 3 or whatever, but 5 is probably too big. If stream error rate exceeds the threshold, the connection will be closed gracefully. If stream error rate exceeds 2x of the threshold, the connection will be closed immediately. This is why the error message is slightly different, and the one for 2x says "too high".

I'm ok with this change, but the reason I didn't check a number of requests was that I thought we want to close stupid clients that causes stream error on the first request.

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm fine with this change. I hope others are fine as well. When the original code was merged nobody had the questions (or nobody read the code).

@zwoop zwoop requested a review from bryancall March 18, 2020 04:32
@zwoop zwoop merged commit a963331 into apache:master Apr 7, 2020
@zwoop zwoop deleted the DontBeTooAggressiveOnH2StreamFailure branch April 7, 2020 01:50
@zwoop zwoop modified the milestones: 10.0.0, 8.1.0 Apr 7, 2020
@zwoop
Copy link
Contributor Author

zwoop commented Apr 7, 2020

Cherry-picked to 8.1.x
Cherry-picked to v9.0.x branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants