Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-v1.12: back port #9509 http: Fix ASSERT failure and infinite loop when attempting to unset readDisable state on a closed connection #10802

Closed
wants to merge 1 commit into from

Conversation

lambdai
Copy link
Contributor

@lambdai lambdai commented Apr 16, 2020

…eadDisable state on a closed connection. (#9509)

Description: Handle calls to ConnectionImpl::readDisable on Closed connections more gracefully. There are several places under source/common, including ClientConnectionImpl::onMessageComplete, where Connection::readDisable(false) is called on a loop to clear the readDisable state as part of the process of preparing upstream connections for reuse. It is possible to end up in an infinite loop if proxying an HTTP/1 response is framed by connection close, the call to onData triggers readDisable(true), and there is a pending PostIoAction::Close action. The closeSocket call on the upstreamConnection triggers the call to ClientConnectionImpl::onMessageComplete
Risk Level: medium
Testing: unit and integration tests
Docs Changes: n/a
Release Notes: n/a
Fixes #9508

Signed-off-by: Antonio Vicente avd@google.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]

@lambdai
Copy link
Contributor Author

lambdai commented Apr 16, 2020

/assign @PiotrSikora

@lambdai
Copy link
Contributor Author

lambdai commented Apr 16, 2020

backport #9509

…eadDisable state on a closed connection. (envoyproxy#9509)

Description: Handle calls to ConnectionImpl::readDisable on Closed connections more gracefully. There are several places under source/common, including ClientConnectionImpl::onMessageComplete, where Connection::readDisable(false) is called on a loop to clear the readDisable state as part of the process of preparing upstream connections for reuse. It is possible to end up in an infinite loop if proxying an HTTP/1 response is framed by connection close, the call to onData triggers readDisable(true), and there is a pending PostIoAction::Close action. The closeSocket call on the upstreamConnection triggers the call to ClientConnectionImpl::onMessageComplete
Risk Level: medium
Testing: unit and integration tests
Docs Changes: n/a
Release Notes: n/a
Fixes envoyproxy#9508

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Contributor Author

lambdai commented Apr 17, 2020

/azp run envoy-presubmit

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 10802 in repo envoyproxy/envoy

@lambdai
Copy link
Contributor Author

lambdai commented Apr 17, 2020

/retest ci/circleci: coverage

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #10802 (comment) was created by @lambdai.

see: more, trace.

@lambdai
Copy link
Contributor Author

lambdai commented Apr 20, 2020

The failing test is a KP: #10287

@lambdai
Copy link
Contributor Author

lambdai commented Apr 20, 2020

Not sure what is missing. The request is not complete because of "[2020-04-20 19:26:30.184][14][debug][client] [source/common/http/codec_client.cc:124] [C0] protocol error: http/1.1 protocol error: HPE_INVALID_EOF_STATE"
The good part is that it is not a inf loop.

Will investigate.

[2020-04-20 19:26:29.182][22][debug][router] [source/common/router/router.cc:724] [C1][S554159968252536995] upstream timeout
[2020-04-20 19:26:29.182][22][debug][router] [source/common/router/router.cc:1564] [C1][S554159968252536995] resetting pool request
[2020-04-20 19:26:29.182][22][debug][client] [source/common/http/codec_client.cc:111] [C2] request reset
[2020-04-20 19:26:29.182][22][debug][connection] [source/common/network/connection_impl.cc:104] [C2] closing data_to_write=0 type=1
[2020-04-20 19:26:29.182][22][debug][connection] [source/common/network/connection_impl.cc:193] [C2] closing socket: 1
[2020-04-20 19:26:29.182][22][debug][client] [source/common/http/codec_client.cc:88] [C2] disconnect. resetting 0 pending requests
[2020-04-20 19:26:29.182][22][debug][pool] [source/common/http/http1/conn_pool.cc:136] [C2] client disconnected, failure reason: 
[2020-04-20 19:26:29.182][22][debug][http] [source/common/http/conn_manager_impl.cc:200] [C1][S554159968252536995] doEndStream() resetting stream
[2020-04-20 19:26:29.182][22][debug][http] [source/common/http/conn_manager_impl.cc:1791] [C1][S554159968252536995] stream reset
[2020-04-20 19:26:29.182][22][debug][connection] [source/common/network/connection_impl.cc:104] [C1] closing data_to_write=0 type=2
[2020-04-20 19:26:29.182][22][debug][connection] [source/common/network/connection_impl.cc:682] [C1] setting delayed close timer with timeout 1000 ms
[2020-04-20 19:26:30.183][22][debug][connection] [source/common/network/connection_impl.cc:671] [C1] triggered delayed close
[2020-04-20 19:26:30.184][22][debug][connection] [source/common/network/connection_impl.cc:193] [C1] closing socket: 1
[2020-04-20 19:26:30.184][22][debug][conn_handler] [source/server/connection_handler_impl.cc:88] [C1] adding to cleanup list
[2020-04-20 19:26:30.184][14][debug][connection] [source/common/network/connection_impl.cc:549] [C0] remote close
[2020-04-20 19:26:30.184][14][debug][connection] [source/common/network/connection_impl.cc:193] [C0] closing socket: 0
[2020-04-20 19:26:30.184][14][debug][client] [source/common/http/codec_client.cc:124] [C0] protocol error: http/1.1 protocol error: HPE_INVALID_EOF_STATE
[2020-04-20 19:26:30.184][14][debug][client] [source/common/http/codec_client.cc:88] [C0] disconnect. resetting 1 pending requests
[2020-04-20 19:26:30.184][14][debug][client] [source/common/http/codec_client.cc:111] [C0] request reset
test/integration/integration_test.cc:244: Failure
Value of: response->complete()
  Actual: false
Expected: true
Stack trace:
  0x1366070: Envoy::IntegrationTest_ResponseFramedByConnectionCloseWithReadLimits_Test::TestBody()
  0x312f6b4: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x312107e: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x310b933: testing::Test::Run()
  0x310c51d: testing::TestInfo::Run()
... Google Test internal frames ...

[2020-04-20 19:26:30.274][14][info][testing] [test/integration/server.cc:221] stopping integration test server
[2020-04-20 19:26:30.279][14][debug][upstream] [source/common/upstream/upstream_impl.cc:250] transport socket match, socket test selected for host with address 127.0.0.1:80
[2020-04-20 19:26:30.279][14][debug][client] [source/common/http/codec_client.cc:31] [C4] connecting
[2020-04-20 19:26:30.279][14][debug][connection] [source/common/network/connection_impl.cc:733] [C4] connecting to 127.0.0.1:32931
[2020-04-20 19:26:30.279][14][debug][connection] [source/common/network/connection_impl.cc:742] [C4] connection in progress
[2020-04-20 19:26:30.280][14][debug][connection] [source/common/network/connection_impl.cc:581] [C4] connected
[2020-04-20 19:26:30.280][14][debug][client] [source/common/http/codec_client.cc:69] [C4] connected
[2020-04-20 19:26:30.280][16][debug][conn_handler] [source/server/connection_handler_impl.cc:335] [C5] new connection
[2020-04-20 19:26:30.280][16][debug][http] [source/common/http/conn_manager_impl.cc:259] [C5] new stream
[2020-04-20 19:26:30.280][16][debug][http] [source/common/http/conn_manager_impl.cc:708] [C5][S7148877827187577247] request headers complete (end_stream=true):
':authority', 'host'
':path', '/quitquitquit'
':method', 'POST'
'content-length', '0'

@stale
Copy link

stale bot commented Apr 28, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 28, 2020
@PiotrSikora
Copy link
Contributor

Not, stale. @lambdai any luck with the failing test?

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 28, 2020
@lambdai
Copy link
Contributor Author

lambdai commented Apr 28, 2020

@PiotrSikora Sorry I have been working on 1.6. I didn't find an obvious error when I looked at this with my fragment time.

@lambdai
Copy link
Contributor Author

lambdai commented Apr 28, 2020

@antoniovicente I need your help. Can you share any light why the test is failing? Any suspicious PR should go with this PR?

@antoniovicente
Copy link
Contributor

/assign @antoniovicente

@antoniovicente
Copy link
Contributor

HPE_INVALID_EOF_STATE is set by the http_parser library. I'm going to guess that the issue may be related to the version of http_parser used by 1.12 or envoy changes to correctly handle HTTP/1.0 responses without content-length.

@lambdai
Copy link
Contributor Author

lambdai commented Apr 29, 2020

HPE_INVALID_EOF_STATE is set by the http_parser library. I'm going to guess that the issue may be related to the version of http_parser used by 1.12 or envoy changes to correctly handle HTTP/1.0 responses without content-length.

Thanks! Since you mention http1.0 while the protocol in log is http1.1, maybe envoy 1.12 need extra steps to set up http1.0. I will run more experiment

@antoniovicente
Copy link
Contributor

antoniovicente commented Apr 30, 2020

The issue is with backend responses without content-length or transfer-encoding:chunked. HTTP/1.1 prefers having transfer-encoding:chunked on responses like this one, but should accept ones without it. In HTTP/1.0 transfer-encoding:chunked is not supported so framing by connection close is the only option if content-length is not available.

My guess is about http_parser version, but there may be some envoy specific changes also. Have you tried updating the version of http_parser dependency? You may need help from people that have been involved with Envoy longer than I have.

@antoniovicente
Copy link
Contributor

The issue is with backend responses without content-length or transfer-encoding:chunked. HTTP/1.1 prefers having transfer-encoding:chunked on responses like this one, but should accept ones without it. In HTTP/1.0 transfer-encoding:chunked is not supported so framing by connection close is the only option if content-length is not available.

My guess is about http_parser version, but there may be some envoy specific changes also. Have you tried updating the version of http_parser dependency? You may need help from people that have been involved with Envoy longer than I have.

Did you try this? Just making sure you're not blocked on me.

@stale
Copy link

stale bot commented May 13, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label May 13, 2020
@stale
Copy link

stale bot commented May 20, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot closed this May 20, 2020
@PiotrSikora
Copy link
Contributor

@lambdai any update?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants