Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Stability of Mock APIs (#49518) #49524

Merged
merged 1 commit into from
Nov 25, 2019

Conversation

original-brownbear
Copy link
Member

This commit ensures that even for requests that are known to be empty body
we at least attempt to read one bytes from the request body input stream.
This is done to work around the behavior in sun.net.httpserver.ServerImpl.Dispatcher#handleEvent
that will close a TCP/HTTP connection that does not have the eof flag (see sun.net.httpserver.LeftOverInputStream#isEOF)
set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered.
This fixes the numerous connection closing issues because the ServerImpl stops closing connections that it thinks
weren't fully drained.

Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing related issue after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing
of allegedly non-eof connections by ServerImpl and produces the exact kinds of failures we're seeing currently.

Relates #49401, #49429

backport of #49518

This commit ensures that even for requests that are known to be empty body
we at least attempt to read one bytes from the request body input stream.
This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent`
that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`)
set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered.
This fixes the numerous connection closing issues because the `ServerImpl` stops closing connections that it thinks
weren't fully drained.

Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing related issue after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing
of allegedly non-eof connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently.

Relates elastic#49401, elastic#49429
@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs backport labels Nov 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Copy link
Member Author

Jenkins run elasticsearch-ci/2 (random Docker issue)

@original-brownbear original-brownbear merged commit a5fa86e into elastic:7.x Nov 25, 2019
@original-brownbear original-brownbear deleted the 49518-7.x branch November 25, 2019 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants