Bug: 503 UC sometimes, upstream_reset_before_response_started #8639

Closed
oulinbao opened this issue Oct 17, 2019 · 9 comments
Labels
question (Questions that are neither investigations, bugs, nor enhancements), stale (stalebot believes this issue/PR has not been touched recently)

Comments


oulinbao commented Oct 17, 2019

Bug Template

Title: 503 UC sometimes

Description:
I'm running Istio 1.2.2, and I sometimes see 503 UC errors in the access log and error log.
This looks similar to issues #6578 and #2871.

Repro steps:
The bug is not easy to reproduce, but I see it several times on a single node.

Note: The Envoy_collect tool
gathers a tarball with debug logs, config and the following admin
endpoints: /stats, /clusters and /server_info. Please note if there are
privacy concerns, sanitize the data prior to sharing the tarball/pasting.

Admin and Stats Output:

Include the admin output for the following endpoints: /stats,
/clusters, /routes, /server_info. For more information, refer to the
admin endpoint documentation.
cluster.xds-grpc.assignment_stale: 0
cluster.xds-grpc.assignment_timeout_received: 0
cluster.xds-grpc.bind_errors: 0
cluster.xds-grpc.circuit_breakers.default.cx_open: 0
cluster.xds-grpc.circuit_breakers.default.cx_pool_open: 0
cluster.xds-grpc.circuit_breakers.default.rq_open: 0
cluster.xds-grpc.circuit_breakers.default.rq_pending_open: 0
cluster.xds-grpc.circuit_breakers.default.rq_retry_open: 0
cluster.xds-grpc.circuit_breakers.high.cx_open: 0
cluster.xds-grpc.circuit_breakers.high.cx_pool_open: 0
cluster.xds-grpc.circuit_breakers.high.rq_open: 0
cluster.xds-grpc.circuit_breakers.high.rq_pending_open: 0
cluster.xds-grpc.circuit_breakers.high.rq_retry_open: 0
cluster.xds-grpc.http2.header_overflow: 0
cluster.xds-grpc.http2.headers_cb_no_stream: 0
cluster.xds-grpc.http2.rx_messaging_error: 0
cluster.xds-grpc.http2.rx_reset: 0
cluster.xds-grpc.http2.too_many_header_frames: 0
cluster.xds-grpc.http2.trailers: 0
cluster.xds-grpc.http2.tx_reset: 0
cluster.xds-grpc.internal.upstream_rq_200: 1
cluster.xds-grpc.internal.upstream_rq_2xx: 1
cluster.xds-grpc.internal.upstream_rq_503: 1
cluster.xds-grpc.internal.upstream_rq_5xx: 1
cluster.xds-grpc.internal.upstream_rq_completed: 2
cluster.xds-grpc.lb_healthy_panic: 1
cluster.xds-grpc.lb_local_cluster_not_ok: 0
cluster.xds-grpc.lb_recalculate_zone_structures: 0
cluster.xds-grpc.lb_subsets_active: 0
cluster.xds-grpc.lb_subsets_created: 0
cluster.xds-grpc.lb_subsets_fallback: 0
cluster.xds-grpc.lb_subsets_fallback_panic: 0
cluster.xds-grpc.lb_subsets_removed: 0
cluster.xds-grpc.lb_subsets_selected: 0
cluster.xds-grpc.lb_zone_cluster_too_small: 0
cluster.xds-grpc.lb_zone_no_capacity_left: 0
cluster.xds-grpc.lb_zone_number_differs: 0
cluster.xds-grpc.lb_zone_routing_all_directly: 0
cluster.xds-grpc.lb_zone_routing_cross_zone: 0
cluster.xds-grpc.lb_zone_routing_sampled: 0
cluster.xds-grpc.max_host_weight: 1
cluster.xds-grpc.membership_change: 1
cluster.xds-grpc.membership_degraded: 0
cluster.xds-grpc.membership_excluded: 0
cluster.xds-grpc.membership_healthy: 1
cluster.xds-grpc.membership_total: 1
cluster.xds-grpc.original_dst_host_invalid: 0
cluster.xds-grpc.retry_or_shadow_abandoned: 0
cluster.xds-grpc.update_attempt: 297
cluster.xds-grpc.update_empty: 0
cluster.xds-grpc.update_failure: 0
cluster.xds-grpc.update_no_rebuild: 296
cluster.xds-grpc.update_success: 297
cluster.xds-grpc.upstream_cx_active: 1
cluster.xds-grpc.upstream_cx_close_notify: 0
cluster.xds-grpc.upstream_cx_connect_attempts_exceeded: 0
cluster.xds-grpc.upstream_cx_connect_fail: 0
cluster.xds-grpc.upstream_cx_connect_timeout: 0
cluster.xds-grpc.upstream_cx_destroy: 0
cluster.xds-grpc.upstream_cx_destroy_local: 0
cluster.xds-grpc.upstream_cx_destroy_local_with_active_rq: 0
cluster.xds-grpc.upstream_cx_destroy_remote: 0
cluster.xds-grpc.upstream_cx_destroy_remote_with_active_rq: 0
cluster.xds-grpc.upstream_cx_destroy_with_active_rq: 0
cluster.xds-grpc.upstream_cx_http1_total: 0
cluster.xds-grpc.upstream_cx_http2_total: 1
cluster.xds-grpc.upstream_cx_idle_timeout: 0
cluster.xds-grpc.upstream_cx_max_requests: 0
cluster.xds-grpc.upstream_cx_none_healthy: 1
cluster.xds-grpc.upstream_cx_overflow: 0
cluster.xds-grpc.upstream_cx_pool_overflow: 0
cluster.xds-grpc.upstream_cx_protocol_error: 0
cluster.xds-grpc.upstream_cx_rx_bytes_buffered: 30
cluster.xds-grpc.upstream_cx_rx_bytes_total: 602003489
cluster.xds-grpc.upstream_cx_total: 1
cluster.xds-grpc.upstream_cx_tx_bytes_buffered: 0
cluster.xds-grpc.upstream_cx_tx_bytes_total: 48184613
cluster.xds-grpc.upstream_flow_control_backed_up_total: 0
cluster.xds-grpc.upstream_flow_control_drained_total: 0
cluster.xds-grpc.upstream_flow_control_paused_reading_total: 0
cluster.xds-grpc.upstream_flow_control_resumed_reading_total: 0
cluster.xds-grpc.upstream_internal_redirect_failed_total: 0
cluster.xds-grpc.upstream_internal_redirect_succeeded_total: 0
cluster.xds-grpc.upstream_rq_200: 1
cluster.xds-grpc.upstream_rq_2xx: 1
cluster.xds-grpc.upstream_rq_503: 1
cluster.xds-grpc.upstream_rq_5xx: 1
cluster.xds-grpc.upstream_rq_active: 1
cluster.xds-grpc.upstream_rq_cancelled: 0
cluster.xds-grpc.upstream_rq_completed: 2
cluster.xds-grpc.upstream_rq_maintenance_mode: 0
cluster.xds-grpc.upstream_rq_pending_active: 0
cluster.xds-grpc.upstream_rq_pending_failure_eject: 0
cluster.xds-grpc.upstream_rq_pending_overflow: 0
cluster.xds-grpc.upstream_rq_pending_total: 1
cluster.xds-grpc.upstream_rq_per_try_timeout: 0
cluster.xds-grpc.upstream_rq_retry: 0
cluster.xds-grpc.upstream_rq_retry_overflow: 0
cluster.xds-grpc.upstream_rq_retry_success: 0
cluster.xds-grpc.upstream_rq_rx_reset: 0
cluster.xds-grpc.upstream_rq_timeout: 0
cluster.xds-grpc.upstream_rq_total: 1
cluster.xds-grpc.upstream_rq_tx_reset: 0
cluster.xds-grpc.version: 0
cluster_manager.active_clusters: 209
cluster_manager.cds.update_attempt: 4623
cluster_manager.cds.update_failure: 0
cluster_manager.cds.update_rejected: 0
cluster_manager.cds.update_success: 4622
cluster_manager.cds.version: 1768992366685875704
cluster_manager.cluster_added: 218
cluster_manager.cluster_modified: 260
cluster_manager.cluster_removed: 9
cluster_manager.cluster_updated: 42
cluster_manager.cluster_updated_via_merge: 0
cluster_manager.update_merge_cancelled: 0
cluster_manager.update_out_of_merge_window: 18
cluster_manager.warming_clusters: 0
http_mixer_filter.total_check_cache_hit_accepts: 9978
http_mixer_filter.total_check_cache_hit_denies: 0
http_mixer_filter.total_check_cache_hits: 9978
http_mixer_filter.total_check_cache_misses: 3181
http_mixer_filter.total_check_calls: 13159
http_mixer_filter.total_quota_cache_hit_accepts: 0
http_mixer_filter.total_quota_cache_hit_denies: 0
http_mixer_filter.total_quota_cache_hits: 0
http_mixer_filter.total_quota_cache_misses: 0
http_mixer_filter.total_quota_calls: 0
http_mixer_filter.total_remote_call_cancellations: 0
http_mixer_filter.total_remote_call_other_errors: 0
http_mixer_filter.total_remote_call_retries: 0
http_mixer_filter.total_remote_call_send_errors: 0
http_mixer_filter.total_remote_call_successes: 3181
http_mixer_filter.total_remote_call_timeouts: 0
http_mixer_filter.total_remote_calls: 3181
http_mixer_filter.total_remote_check_accepts: 3181
http_mixer_filter.total_remote_check_calls: 3181
http_mixer_filter.total_remote_check_denies: 0
http_mixer_filter.total_remote_quota_accepts: 0
http_mixer_filter.total_remote_quota_calls: 0
http_mixer_filter.total_remote_quota_denies: 0
http_mixer_filter.total_remote_quota_prefetch_calls: 0
http_mixer_filter.total_remote_report_calls: 7846
http_mixer_filter.total_remote_report_other_errors: 7846
http_mixer_filter.total_remote_report_send_errors: 0
http_mixer_filter.total_remote_report_successes: 0
http_mixer_filter.total_remote_report_timeouts: 0
http_mixer_filter.total_report_calls: 13159
listener_manager.lds.update_attempt: 4623
listener_manager.lds.update_failure: 0
listener_manager.lds.update_rejected: 0
listener_manager.lds.update_success: 4622
listener_manager.lds.version: 1768992366685875704
listener_manager.listener_added: 2
listener_manager.listener_create_failure: 0
listener_manager.listener_create_success: 64
listener_manager.listener_modified: 0
listener_manager.listener_removed: 0
listener_manager.total_listeners_active: 2
listener_manager.total_listeners_draining: 0
listener_manager.total_listeners_warming: 0
server.concurrency: 32
server.days_until_first_cert_expiring: 2147483647
server.debug_assertion_failures: 0
server.hot_restart_epoch: 0
server.live: 1
server.memory_allocated: 31820944
server.memory_heap_size: 60817408
server.parent_connections: 0
server.total_connections: 0
server.uptime: 88939
server.version: 11105622
server.watchdog_mega_miss: 0
server.watchdog_miss: 15
cluster.xds-grpc.upstream_cx_connect_ms: P0(nan,0) P25(nan,0) P50(nan,0) P75(nan,0) P90(nan,0) P95(nan,0) P99(nan,0) P99.5(nan,0) P99.9(nan,0) P100(nan,0)
cluster.xds-grpc.upstream_cx_length_ms: No recorded values

{
  "version": "a975561b980463f08689d3debe33bb9eefc80c3d/1.11.0-dev/Clean/RELEASE/BoringSSL",
  "state": "LIVE",
  "command_line_options": {
    "base_id": "0",
    "concurrency": 32,
    "config_path": "/var/lib/istio/envoy-rev0.json",
    "config_yaml": "",
    "allow_unknown_fields": true,
    "admin_address_path": "",
    "local_address_ip_version": "v4",
    "log_level": "debug",
    "component_log_level": "misc:error",
    "log_format": "[%Y-%m-%d %T.%e][%t][%l][%n] %v",
    "log_path": "/export/Logs/envoy.log",
    "hot_restart_version": false,
    "service_cluster": "xxxx-ingress",
    "service_node": "router10.226.201.2181.xxxx-ingress.default~default.mesh.xxxx.com",
    "service_zone": "",
    "mode": "Serve",
    "max_stats": "0",
    "max_obj_name_len": "0",
    "disable_hot_restart": false,
    "enable_mutex_tracing": false,
    "restart_epoch": 0,
    "cpuset_threads": false,
    "file_flush_interval": "10s",
    "drain_time": "45s",
    "parent_shutdown_time": "60s"
  },
  "uptime_current_epoch": "89062s",
  "uptime_all_epochs": "89062s"
}

Note: If there are privacy concerns, sanitize the data prior to
sharing.

Config:

Include the config used to configure Envoy.

Logs:

Include the access logs and the Envoy logs.
[2019-10-14 10:56:06.963][326][debug][filter] [src/envoy/http/mixer/filter.cc:162] Called Mixer::Filter : check complete OK
[2019-10-14 10:56:06.963][326][debug][router] [external/envoy/source/common/router/router.cc:332] [C523429][S16681021969230338662] cluster 'outbound|3001||apigw-back-console.default.mesh.xxxx.com' match for URL '/public/js/app.06abacd6.js'
[2019-10-14 10:56:06.963][285][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C523432] response complete
[2019-10-14 10:56:06.963][285][debug][pool] [external/envoy/source/common/http/http2/conn_pool.cc:237] [C523432] destroying stream: 0 remaining
[2019-10-14 10:56:06.963][285][debug][http] [external/envoy/source/common/http/async_client_impl.cc:109] async http request response trailers:
'grpc-status', '0'
'grpc-message', ''

[2019-10-14 10:56:06.963][326][debug][router] [external/envoy/source/common/router/router.cc:393] [C523429][S16681021969230338662] router decoding headers:
':authority', 'apigw-back-stag.xxx.com'
':path', '/public/js/app.06abacd6.js'
':method', 'GET'
':scheme', 'http'
'if-none-match', '"K6/i3mgI38hs45GHBu2GzQ=="'
'if-modified-since', 'Thu, 15 Aug 2019 02:44:20 GMT'
'user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'
'accept', '*/*'
'referer', 'http://apigw-back-stag.xxx.com/openapiManage'
'accept-encoding', 'gzip, deflate'
'accept-language', 'zh-CN,zh;q=0.9,en;q=0.8'
'cookie', '3AB9D23F7A4B3C9B=JZNMAUKUIDSTSLBAWJT2XPU6PU2LPFGDI3GP7LNYDMABU7IR64OOO7GQUGAGHPXCTWYJB6C5KKTPRXCW37BNL243YY; sso.xxx.com=BJ.de2e36258cf042d79685a735e760b438'
'j-forwarded-for', '10.226.143.198'
'x-forwarded-for', '10.226.143.198,10.226.192.40'
'x-original-to', '10.226.202.221'
'x-forwarded-proto', 'http'
'x-envoy-external-address', '10.226.192.40'
'x-request-id', '85701398-a0e5-9b88-806b-af0956cff0af'
'x-envoy-decorator-operation', 'apigw-back-console.default.mesh.xxxx.com:3001/'
'x-istio-attributes', 'CkkKGGRlc3RpbmF0aW9uLnNlcnZpY2UuaG9zdBItEithcGlndy1iYWNrLWNvbnNvbGUuZGVmYXVsdC5tZXNoLmpkY2xvdWQuY29tCi4KCnNvdXJjZS51aWQSIBIeYXJrOi8vMS5qZG1lc2gtaW5ncmVzcy5kZWZhdWx0'
'x-b3-traceid', '60d23defd3d6b2aaec15210e95fd99e6'
'x-b3-spanid', 'ec15210e95fd99e6'
'x-b3-sampled', '1'

[2019-10-14 10:56:06.963][326][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C304262] using existing connection
[2019-10-14 10:56:06.963][326][debug][router] [external/envoy/source/common/router/router.cc:1210] [C523429][S16681021969230338662] pool ready
[2019-10-14 10:56:06.963][285][debug][grpc] [src/envoy/utils/grpc_transport.cc:67] Check response: precondition {
  status {
  }
  valid_duration {
    seconds: 60
  }
  valid_use_count: 10000
  referenced_attributes {
    attribute_matches {
      name: 4
      condition: EXACT
    }
    attribute_matches {
      name: 201
      condition: EXACT
    }
  }
}

[2019-10-14 10:56:06.963][326][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:567] [C523431] stream closed: 0
[2019-10-14 10:56:06.963][285][debug][filter] [src/istio/mixerclient/client_impl.cc:263] CheckResult transport=OK, policy=OK, quota=NA, attempt=0
[2019-10-14 10:56:06.963][285][debug][filter] [src/envoy/http/mixer/filter.cc:162] Called Mixer::Filter : check complete OK
[2019-10-14 10:56:06.963][285][debug][router] [external/envoy/source/common/router/router.cc:332] [C523430][S7230925655636222372] cluster 'outbound|3001||apigw-back-console.default.mesh.xxx.com' match for URL '/public/js/chunk-vendors.580fee1e.js'
[2019-10-14 10:56:06.963][285][debug][router] [external/envoy/source/common/router/router.cc:393] [C523430][S7230925655636222372] router decoding headers:
':authority', 'apigw-back-stag.xxx.com'
':path', '/public/js/chunk-vendors.580fee1e.js'
':method', 'GET'
':scheme', 'http'
'if-none-match', '"nm0Ct3Dlm/D6J/43PmPlJQ=="'
'if-modified-since', 'Thu, 15 Aug 2019 02:44:20 GMT'
'user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'
'accept', '*/*'
'referer', 'http://apigw-back-stag.xxx.com/openapiManage'
'accept-encoding', 'gzip, deflate'
'accept-language', 'zh-CN,zh;q=0.9,en;q=0.8'
'cookie', '3AB9D23F7A4B3C9B=JZNMAUKUIDSTSLBAWJT2XPU6PU2LPFGDI3GP7LNYDMABU7IR64OOO7GQUGAGHPXCTWYJB6C5KKTPRXCW37BNL243YY; sso.xxx.com=BJ.de2e36258cf042d79685a735e760b438'
'j-forwarded-for', '10.226.143.198'
'x-forwarded-for', '10.226.143.198,10.226.192.40'
'x-original-to', '10.226.202.221'
'x-forwarded-proto', 'http'
'x-envoy-external-address', '10.226.192.40'
'x-request-id', '5334c21e-2274-9817-9911-eeafb9ef2f25'
'x-envoy-decorator-operation', 'apigw-back-console.default.mesh.xxx.com:3001/'
'x-istio-attributes', 'Ci4KCnNvdXJjZS51aWQSIBIeYXJrOi8vMS5qZG1lc2gtaW5ncmVzcy5kZWZhdWx0CkkKGGRlc3RpbmF0aW9uLnNlcnZpY2UuaG9zdBItEithcGlndy1iYWNrLWNvbnNvbGUuZGVmYXVsdC5tZXNoLmpkY2xvdWQuY29t'
'x-b3-traceid', 'e0986267384ad45f16a889e4e219dc65'
'x-b3-spanid', '16a889e4e219dc65'
'x-b3-sampled', '1'

[2019-10-14 10:56:06.963][285][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C304210] using existing connection
[2019-10-14 10:56:06.963][285][debug][router] [external/envoy/source/common/router/router.cc:1210] [C523430][S7230925655636222372] pool ready
[2019-10-14 10:56:06.963][285][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:567] [C523432] stream closed: 0
[2019-10-14 10:56:06.963][285][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C304210] remote close
[2019-10-14 10:56:06.963][285][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C304210] closing socket: 0
[2019-10-14 10:56:06.963][285][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C304210] disconnect. resetting 1 pending requests
[2019-10-14 10:56:06.963][285][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C304210] request reset
[2019-10-14 10:56:06.963][285][debug][router] [external/envoy/source/common/router/router.cc:671] [C523430][S7230925655636222372] upstream reset: reset reason connection termination
[2019-10-14 10:56:06.963][285][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C523430][S7230925655636222372] Sending local reply with details upstream_reset_before_response_started{connection termination}
[2019-10-14 10:56:06.963][285][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
[2019-10-14 10:56:06.963][285][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1267] [C523430][S7230925655636222372] closing connection due to connection close header
[2019-10-14 10:56:06.963][285][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1329] [C523430][S7230925655636222372] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Mon, 14 Oct 2019 02:56:06 GMT'
'server', 'istio-envoy'
'connection', 'close'

[2019-10-14 10:56:06.963][285][debug][filter] [src/envoy/http/mixer/filter.cc:214] Called Mixer::Filter : onDestroy state: 2
[2019-10-14 10:56:06.963][285][debug][connection] [external/envoy/source/common/network/connection_impl.cc:101] [C523430] closing data_to_write=254 type=2
[2019-10-14 10:56:06.963][285][debug][connection] [external/envoy/source/common/network/connection_impl.cc:651] [C523430] setting delayed close timer with timeout 1000 ms
[2019-10-14 10:56:06.963][285][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C304210] client disconnected, failure reason:
[2019-10-14 10:56:06.963][285][debug][filter] [src/envoy/http/mixer/filter.cc:228] Called Mixer::Filter : log
[2019-10-14 10:56:06.963][285][debug][filter] [./src/envoy/http/mixer/report_data.h:144] No dynamic_metadata found for filter envoy.filters.http.rbac

Note: If there are privacy concerns, sanitize the data prior to
sharing.

Call Stack:

If the Envoy binary is crashing, a call stack is required.
Please refer to the Bazel Stack trace documentation.

@zuercher added the "question" label Oct 17, 2019
@prune998

I don't know if this issue is related, but I'm seeing something similar, where Envoy closes the TCP connection (TCP FIN) for no apparent reason.
Since it's Envoy deployed by Istio 1.3.3, I posted the issue there, but as I expect it to be an Envoy issue, I'm linking it here: istio/istio#18043

I have TCP dumps showing Envoy closing the TCP connection, which is re-opened when I restart the client application.
The TCP connection should almost never be terminated by the Envoy sidecar in the middle of a transmission; the expected behaviour would be for the HTTP/2 stream to be closed, not the TCP connection.
Sorry if this is not related; in that case I'll open another issue.
Any debugging ideas are welcome. I've tried adding more debug output to some parts of Istio but wasn't able to get any insight...


enbohm commented Nov 7, 2019

I faced similar issues using Envoy as a proxy in front of Quarkus applications (see quarkusio/quarkus#4572). This started when Quarkus changed their HTTP layer to Vert.x. It seems to be caused by mismatched socket timeout and keep-alive (connection pooling) settings between Envoy and Vert.x. I managed to solve it by setting max_requests_per_connection to 1 (which disables keep-alive; see https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cds.proto.html?highlight=keepalive). Maybe something you can try?
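
For reference, a minimal sketch of how that setting might look in a static Envoy cluster definition (v2 API, matching the docs link above); the cluster name and endpoint address are placeholders, not taken from this issue:

static_resources:
  clusters:
  - name: quarkus_backend              # hypothetical cluster name
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    # Allow only one request per upstream connection, i.e. no
    # keep-alive reuse between Envoy and the backend.
    max_requests_per_connection: 1
    load_assignment:
      cluster_name: quarkus_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: quarkus-app.example.local   # placeholder backend address
                port_value: 8080

This trades connection-reuse performance for correctness: every request gets a fresh connection, so Envoy can never pick a connection the server has already closed.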

oulinbao (author) commented Nov 9, 2019

@enbohm Thanks for the information. I also think it is a connection-reuse problem, and I will try your method. I am also trying to set idleTimeout to a small value, hoping that solves the problem.
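
In Istio, assuming a version whose DestinationRule connection pool supports idleTimeout, that could look like the sketch below; the resource name, host, and timeout value are placeholders:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: apigw-back-console            # hypothetical name
spec:
  host: apigw-back-console.default.svc.cluster.local   # placeholder host
  trafficPolicy:
    connectionPool:
      http:
        # Drop upstream connections idle for longer than this, so the
        # proxy is less likely to reuse a connection the server has
        # already timed out on its side.
        idleTimeout: 5s

The idea is to make the proxy's idle timeout shorter than the backend's keep-alive timeout, so the proxy closes idle connections first.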


prune998 commented Nov 9, 2019

Upgrade to Istio 1.3.4, where some bugs were fixed in Istio's Pilot.


stale bot commented Dec 29, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

The stale bot added the "stale" label Dec 29, 2019

stale bot commented Jan 5, 2020

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

@amalaruja

(Quoting @enbohm's suggestion above to set max_requests_per_connection to 1.)

This worked for me. It seems the server my Envoy was connecting to had HTTP keep-alive disabled.

@amalaruja

Update: I reverted the change that disabled HTTP keep-alive. Instead I used upstream_connection_options with:

tcp_keepalive: 
  keepalive_time: 1
  keepalive_interval: 10

With this I am no longer seeing any 503 UC errors.
This seems to be the better solution, since I don't have to give up the benefits of HTTP keep-alive.
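
For anyone wanting the fuller context, a minimal sketch of where that snippet sits in an Envoy cluster definition; the cluster name and endpoint address are placeholders, and the keepalive values are the ones from the comment above:

clusters:
- name: backend                        # hypothetical cluster name
  connect_timeout: 1s
  type: STRICT_DNS
  upstream_connection_options:
    tcp_keepalive:
      keepalive_time: 1        # idle seconds before the first keepalive probe
      keepalive_interval: 10   # seconds between unanswered probes
  load_assignment:
    cluster_name: backend
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: backend.example.local   # placeholder backend address
              port_value: 8080

The intent is that TCP keepalives let the kernel notice dead upstream sockets quickly, so stale connections get evicted from the pool rather than picked for a new request.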

@dhanisahu

(Quoting @enbohm's max_requests_per_connection suggestion and @amalaruja's "this worked for me" reply above.)

Hi @amalaruja, could you please share details of where to make these changes, with sample code?
