
Envoy issuing 503's intermittently #15285

Closed
chrismckean opened this issue Jul 2, 2019 · 7 comments

Comments

@chrismckean
Bug description

Hi All,

@Stono, @martinhighton and I have been looking into an issue where some of our Kubernetes applications receive intermittent 503 errors. We're seeing roughly 5-10 an hour between the two applications we've been targeting in this investigation. I have obtained the Istio debug logs and some packet captures. From the captures, you can see that the source application is the one sending the reset (below). The applications involved aren't particularly busy, so I'm guessing there's some sort of connection timeout issue.

[packet capture screenshot showing the source application sending the reset]

I've looked at the debug logs for a while and I can't figure out how to read them; I haven't found much documentation on them online. I'm assuming that the 'Cxx' value is a connection ID, so all of the entries under, say, C22 relate to one connection between the source and destination Envoy? If that is the case, should the connection number in the destination logs match the one in the source logs?

There's not a lot to see in the packet capture beyond that line above. Here are what I believe to be the relevant debug logs from the istio-proxy container.

Source

[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:332] [C5][S4000182525086368835] cluster 'outbound|80||app.location-service.svc.cluster.local' match for URL '/api/location'
[2019-06-28 13:02:36.789][36][debug][router]
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:332] [C5][S4000182525086368835] cluster 'outbound|80||app.location-service.svc.cluster.local' match for URL '/api/location'
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:393] [C5][S4000182525086368835] router decoding headers:
':authority', 'app.dest.app'
':path', '/api/location'
':method', 'OPTIONS'
':scheme', 'https'
'accept', 'application/json, application/json, application/*+json, application/*+json'
'authorization', 'Basic xxxxxxxxxxxx'
'user-agent', 'source-app/xxxxxxxxx '
'accept-encoding', 'gzip,deflate'
'x-forwarded-proto', 'http'
'x-request-id', 'xxx-xxx-xxxx'
'x-envoy-decorator-operation', 'app.dest.app.svc.cluster.local:80/*'
'x-istio-attributes', 'CiEKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIFEgNhcHAKMwodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USEhIQbG9jYXRpb24tc2VydmljZQpNCgpzb3VyY2UudWlkEj8SPWt1YmVybmV0ZXM6Ly9tb3Zlcy1sb2dpc3RpY3MtNmY1Yzk0NGM1OS1qZzRtcC5tb3Zlcy1sb2dpc3RpY3MKRAoYZGVzdGluYXRpb24uc2VydmljZS5ob3N0EigSJmFwcC5sb2NhdGlvbi1zZXJ2aWNlLnN2Yy5jbHVzdGVyLmxvY2FsCkIKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEicSJWlzdGlvOi8vbG9jYXRpb24tc2VydmljZS9zZXJ2aWNlcy9hcHA='
'x-envoy-expected-rq-timeout-ms', '10000'
'x-b3-traceid', 'xxxxxxxxxxxxx'
'x-b3-spanid', 'xxxxxxxxxxxxxx'
'x-b3-sampled', '0'

[2019-06-28 13:02:36.789][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C22] using existing connection
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:1210] [C5][S4000182525086368835] pool ready
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:1210] [C5][S4000182525086368835] pool ready
[2019-06-28 13:02:36.791][36][debug][router] [external/envoy/source/common/router/router.cc:754] [C5][S4000182525086368835] upstream headers complete: end_stream=false
[2019-06-28 13:02:36.792][36][debug][router] [external/envoy/source/common/router/router.cc:932] [C5][S4000182525086368835] performing retry
[2019-06-28 13:02:36.792][36][debug][router] [external/envoy/source/common/router/router.cc:1156] [C5][S4000182525086368835] resetting pool request
[2019-06-28 13:02:36.792][36][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C22] request reset
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/common/network/connection_impl.cc:101] [C22] closing data_to_write=0 type=1
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C22] closing socket: 1
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:270] [C22] SSL shutdown: rc=0
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C22] 
[2019-06-28 13:02:36.792][36][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C22] disconnect. resetting 0 pending requests
[2019-06-28 13:02:36.792][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C22] client disconnected, failure reason: 
[2019-06-28 13:02:36.803][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:88] creating a new connection

Destination

2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:393] [C21][S3699665653477458718] router decoding headers:
':authority', 'app.dest-app'
':path', '/api/location'
':method', 'OPTIONS'
':scheme', 'http'
'accept', 'application/json, application/json, application/*+json, application/*+json'
'authorization', 'Basic xxxxxxxxxxxxx'
'user-agent', 'source-app/xxxxxxxxxxxxxxxxxx '
'accept-encoding', 'gzip,deflate'
'x-forwarded-proto', 'http'
'x-request-id', 'xxxxx-xxxxxxx-xxxxxxx'
'content-length', '0'
'x-forwarded-client-cert', 'By=spiffe://cluster.local/ns/dest-app/sa/default;Hash=76bb5f18ded8f10606e5f67fe910ddf377a8ffa51f50a7ae180dad792ac7dde2;Subject="";URI=spiffe://cluster.local/ns/source-app/sa/default'
'x-b3-traceid', 'xxxxxxxxxxxxxxx'
'x-b3-spanid', 'xxxxxxxxxxxxxx'
'x-b3-parentspanid', 'xxxxxxxxxxxxx'
'x-b3-sampled', '0'

[2019-06-28 13:02:36.790][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C26] using existing connection
[2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:1210] [C21][S3699665653477458718] pool ready
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C26] remote close
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C26] closing socket: 0
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C26] disconnect. resetting 1 pending requests
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C26] request reset
[2019-06-28 13:02:36.791][37][debug][router] [external/envoy/source/common/router/router.cc:671] [C21][S3699665653477458718] upstream reset: reset reason connection termination
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C21][S3699665653477458718] Sending local reply with details upstream_reset_before_response_started{connection termination}
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1329] [C21][S3699665653477458718] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Fri, 28 Jun 2019 13:02:36 GMT'
'server', 'istio-envoy'

[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/authn/http_filter.cc:48] Called AuthenticationFilter : onDestroy
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:214] Called Mixer::Filter : onDestroy state: 2
[2019-06-28 13:02:36.791][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C26] client disconnected, failure reason: 
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:228] Called Mixer::Filter : log
[2019-06-28 13:02:36.791][37][debug][filter] [./src/envoy/http/mixer/report_data.h:144] No dynamic_metadata found for filter envoy.filters.http.rbac
[2019-06-28 13:02:36.791][37][debug][filter] [src/istio/control/client_context_base.cc:140] Report attributes: attributes {
  key: "connection.mtls"
  value {
    bool_value: true
  }
}
attributes {
  key: "connection.requested_server_name"
  value {
    string_value: "outbound_.80_._.app.dest-a
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C21] remote close
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C21] closing socket: 0
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:270] [C21] SSL shutdown: rc=1
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C21] 

We're having trouble figuring out exactly what is going on here. Could someone help us read these debug logs? The source Envoy just seems to reset the connection. If the connection numbers should match, then the source Envoy is trying to use a connection that perhaps no longer exists on the destination Envoy? If that isn't the case, I'm not sure why the source Envoy is resetting the connection.

Please let me know if somebody needs any more info on this problem.

--

Cheers

Chris

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[x] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior
Envoy should not issue a 503 unless there is a problem with the source or destination app.

Steps to reproduce the bug
This happens randomly

Version (include the output of istioctl version --remote and kubectl version)
Kubelet v1.13
Istio v1.2.2

Environment where bug was observed (cloud vendor, OS, etc)
GKE
master 1.13.6-gke.13 (Nodes v1.12.7-gke.24)
Container-Optimized OS from Google Kernel v.4.14.127+ docker://17.3.2

@Stono

Stono commented Jul 3, 2019

Just been re-reading this @chrismckean. The reason you're seeing a connection reset by the source Envoy is that it received a 503 from the destination Envoy, and the HTTP/1.1 spec dictates that the connection must then be closed.

So that part of the picture can be disregarded, and we need to focus on the connection issue between the destination Envoy and the destination app.
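
To illustrate the destination-side race the logs suggest (`using existing connection` on C26 immediately followed by `remote close`), here is a minimal sketch using plain sockets, not Envoy: an upstream app closes an idle HTTP/1.1 keep-alive connection at the same time the client reuses it from its pool, so the second request dies and a proxy in that position has no choice but to surface a 503 UC. All names and timings here are illustrative.

```python
import socket
import threading
import time

def upstream(server):
    """Plays the destination app: serves one request on a keep-alive
    connection, sits idle briefly, then closes (its 'idle timeout')."""
    conn, _ = server.accept()
    conn.recv(1024)                                        # first request
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    time.sleep(0.1)                                        # connection sits idle...
    conn.close()                                           # ...then the app closes it

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=upstream, args=(server,), daemon=True).start()

# Plays the Envoy connection pool: one connection, reused for a second request.
client = socket.create_connection(server.getsockname())
client.sendall(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")
first = client.recv(1024).split(b"\r\n")[0]                # status line of request 1

time.sleep(0.3)                                            # pool still believes the conn is live
try:
    client.sendall(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")   # reuse the pooled connection
    outcome = "eof" if client.recv(1024) == b"" else "data"
except (ConnectionResetError, BrokenPipeError):
    outcome = "reset"
print(first.decode(), "/ second request:", outcome)
```

The second request either sees EOF or a hard reset depending on TCP timing, which matches the `remote close` / `upstream reset: reset reason connection termination` pair in the destination logs: the request was lost on a connection the app had already torn down.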

@stale
stale bot commented Oct 1, 2019

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 1, 2019
@vadimeisenbergibm

@chrismckean, @Stono does the issue still appear in 1.3?

@stale stale bot removed the stale label Oct 16, 2019
@Stono

Stono commented Oct 16, 2019

Yes, we still see occasional 503s, but not at the level we used to, and the investigation here is likely stale.

I'll close this ticket and re-open another with more recent info if we feel we need to.

Ta

@chrismckean

Karl is correct. The errors are still present, but at a manageable rate. I'll close this issue.

@vadimeisenbergibm

@Stono @chrismckean great, thanks!

@Stono

Stono commented Oct 18, 2019

@vadimeisenbergibm we are still seeing 503 UC errors #18043
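
For anyone else landing on this thread: a commonly used mitigation for 503 UC caused by idle-connection races is to retry the affected requests at the mesh level. A hedged sketch of a VirtualService for the service in the logs above (the metadata name is a placeholder, and support for individual `retryOn` values varies by Istio/Envoy version):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: location-service-retries   # placeholder name
spec:
  hosts:
  - app.location-service.svc.cluster.local
  http:
  - route:
    - destination:
        host: app.location-service.svc.cluster.local
    retries:
      attempts: 3
      perTryTimeout: 2s
      # "reset" retries requests that died on a stale pooled connection;
      # retrying is reasonable here because the failed request never
      # reached the application.
      retryOn: connect-failure,refused-stream,reset
```

This does not remove the underlying race between the connection pool and the app's keep-alive timeout, but it keeps the resulting 503s from reaching callers.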
