Improve Stability of Mock APIs (#49518) #49524

original-brownbear · 2019-11-25T08:35:49Z

This commit ensures that even for requests that are known to have an empty body
we at least attempt to read one byte from the request body input stream.
This is done to work around the behavior in sun.net.httpserver.ServerImpl.Dispatcher#handleEvent
that will close a TCP/HTTP connection that does not have the eof flag (see sun.net.httpserver.LeftOverInputStream#isEOF)
set on its input stream. As far as I can tell the only way to set this flag is to do a read when there are no more bytes buffered.
This fixes the numerous connection closing issues because the ServerImpl stops closing connections that it thinks
weren't fully drained.
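
For illustration, here is a minimal sketch of the idea using the JDK's built-in `com.sun.net.httpserver.HttpServer`. It is not the code from this change: the class name `DrainingMockServer`, the `drain` and `start` helpers, and the `/` context are hypothetical. The handler reads the request body until `read` returns -1 before responding, so even an empty body gets the one read that reports end-of-stream.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

public class DrainingMockServer {

    /**
     * Reads the stream until read() returns -1 and returns the number of bytes consumed.
     * For an empty body this is a single read call that immediately reports end-of-stream.
     */
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    /** Starts a mock server on an ephemeral loopback port (hypothetical helper). */
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            try {
                // Always read the request body to the end, even if it is expected to be empty.
                drain(exchange.getRequestBody());
                // A response length of -1 means that no response body is sent.
                exchange.sendResponseHeaders(200, -1);
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpURLConnection connection = (HttpURLConnection) uri.toURL().openConnection();
            System.out.println("status: " + connection.getResponseCode());
        } finally {
            server.stop(0);
        }
    }
}
```

Draining in one shared helper rather than per handler keeps the behavior uniform across mock endpoints, which is the same motivation as removing the separate drain loop mentioned below.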

Also, I removed a now-redundant drain loop in the Azure handler and the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing the related issues after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the Azure tests
and move them to single-digit values. This makes the retries happen quickly enough that they run into the async connection closing
of allegedly non-eof connections by ServerImpl and produces the exact kinds of failures we're seeing currently.

Relates #49401, #49429

backport of #49518


elasticmachine · 2019-11-25T08:35:52Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear · 2019-11-25T08:41:43Z

Jenkins run elasticsearch-ci/2 (random Docker issue)

original-brownbear added the :Distributed Coordination/Snapshot/Restore and backport labels Nov 25, 2019

original-brownbear merged commit a5fa86e into elastic:7.x Nov 25, 2019

original-brownbear deleted the 49518-7.x branch November 25, 2019 09:28
