Don't be overly aggressive on stream failures and closing #6525
@masaori335 Now that we're looking at this code, why is this x2 on the threshold?
Warning("HTTP/2 session error client_ip=%s session_id=%" PRId64
        " closing a connection, because its stream error rate (%f) is too high",
        client_ip, connection_id(), this->connection_state.get_stream_error_rate());
        " closing a connection, because its stream error rate (%f) exceeded the threshold (%f)",
This was only done to make the Warning() consistent with the other two places we do this check.
Adding the configured threshold value is fine, but removing "too high" would make it hard to see whether the close was graceful or immediate.
int total = get_stream_requests();
if (total > 0) {
if (total >= (1 / Http2::stream_error_rate_threshold)) {
The point of this is to require a minimum number of samples before we can calculate a reasonably trustworthy rate of failures. The smaller the threshold, the more samples needed. Probably not statistically safe, but this at least avoids the issues where if (in the default configs) any of the first 10 streams has an error, it's enough to close the connection.
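To make the minimum-sample idea concrete, here is a minimal sketch, not the actual patch. The free function `stream_error_rate` and the constant `stream_error_rate_threshold` are hypothetical stand-ins for the member function and `Http2::stream_error_rate_threshold`, and the value 0.1 is an assumption inferred from the "first 10 streams" remark above.

```cpp
// Hypothetical stand-in for Http2::stream_error_rate_threshold.
// 0.1 is an assumed value, chosen to match the "first 10 streams" remark.
constexpr double stream_error_rate_threshold = 0.1;

// Report an error rate only once at least 1/threshold streams have been seen.
// Below that sample size the rate is reported as 0.0, so a single early
// failure cannot push the connection over the threshold.
double stream_error_rate(int total_requests, int total_errors)
{
  if (total_requests >= (1 / stream_error_rate_threshold)) {
    return static_cast<double>(total_errors) / total_requests;
  }
  return 0.0;
}
```

With these assumed values, one error in the first 9 streams yields a rate of 0.0, while one error in 10 streams yields 0.1.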
This fixes #5195
masaori335
left a comment
Looks reasonable.
It could be 1.5, 3 or whatever, but 5 is probably too big. If the stream error rate exceeds the threshold, the connection is closed gracefully. If it exceeds 2x the threshold, the connection is closed immediately. This is why the error messages are slightly different, and the one for 2x says "too high". I'm OK with this change, but the reason I didn't check the number of requests was that I thought we wanted to close stupid clients that cause a stream error on the first request.
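The two-tier policy described in this comment can be sketched roughly as follows. The enum `CloseAction` and the function `close_action_for` are hypothetical names used only for illustration, and whether the real code compares with `>` or `>=` is an assumption here.

```cpp
// Hypothetical names for illustration; not taken from the actual code.
enum class CloseAction { None, Graceful, Immediate };

// Map a measured stream error rate to an action:
//   above 2x the threshold -> close immediately (the "too high" message)
//   above the threshold    -> close gracefully
//   otherwise              -> keep the connection open
CloseAction close_action_for(double error_rate, double threshold)
{
  if (error_rate > threshold * 2.0) {
    return CloseAction::Immediate;
  }
  if (error_rate > threshold) {
    return CloseAction::Graceful;
  }
  return CloseAction::None;
}
```

For a threshold of 0.1, a rate of 0.15 would map to a graceful close and a rate of 0.25 to an immediate close.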
maskit
left a comment
Personally I'm fine with this change. I hope others are fine as well. When the original code was merged, nobody had these questions (or nobody read the code).
Cherry-picked to 8.1.x
Randall found this in our production. I'm thinking 1/threshold is a good limit here, such that you can at least calculate a reasonable percentage based on a large enough sample.