Connect sidecar proxy upstream config connect_timeout_ms does not work #11603

Open
tmiroslav opened this issue Nov 18, 2021 · 7 comments
Labels
needs-investigation The issue described is detailed and complex. theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/bug Feature does not function as expected

Comments

@tmiroslav

When filing a bug, please include the following headings if possible. Any example text in this template can be deleted.

Setting the connection timeout in the upstream config, per the docs here, has no effect.

A paragraph or two about the issue you're experiencing:
I get a timeout every time the upstream service that my service connects to does not respond within 5s. This is too low for my use case, and I want to extend the timeout to 10s. But after setting connect_timeout_ms in the upstream config, my service still times out after 5s, so the change to the upstream proxy config is not being applied.

Reproduction Steps

Steps to reproduce this issue, eg:

This is my service definition file:

service {
  name = "cpanel"
  port = 4000
  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "gateway-internal-api"
            local_bind_port  = 10000
            config {
              connect_timeout_ms = 10000
            }
          },
        ]
      }
    }
  }
}

Consul info for both Client and Server

Consul v1.8.3, Envoy 1.14.2

Client info:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 3
	services = 2
build:
	prerelease = 
	revision = a9322b9c
	version = 1.8.3
consul:
	acl = enabled
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 102
	max_procs = 8
	os = linux
	version = go1.14.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 57
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 16982
	members = 53
	query_queue = 0
	query_time = 1

Server info:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = a9322b9c
	version = 1.8.3
consul:
	acl = enabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 10.228.45.65:8300
	server = true
raft:
	applied_index = 21010459
	commit_index = 21010459
	fsm_pending = 0
	last_contact = 4.249427ms
	last_log_index = 21010459
	last_log_term = 1728
	last_snapshot_index = 20997099
	last_snapshot_term = 1728
	latest_configuration = [{Suffrage:Voter ID:1aceb41a-9309-720d-0548-703bf300a940 Address:10.228.44.218:8300} {Suffrage:Voter ID:918add50-a22b-a418-82ae-2d8d2fe5465e Address:10.228.45.65:8300} {Suffrage:Voter ID:89847217-65f3-ed14-f1a8-244c44996eb8 Address:10.228.46.88:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 1728
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 479
	max_procs = 4
	os = linux
	version = go1.14.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 57
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 16982
	members = 53
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 135
	members = 3
	query_queue = 0
	query_time = 1

Operating system and Environment details

more /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

@Amier3 Amier3 added type/bug Feature does not function as expected theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies needs-investigation The issue described is detailed and complex. labels Nov 18, 2021
@Amier3
Contributor

Amier3 commented Nov 23, 2021

Hey @tmiroslav ,

I believe that for the timeouts to work, the value has to be set both at the proxy level (local_connect_timeout_ms in the gateway-internal-api service definition) and in the upstream config (what you have set above). I suspect that may be the issue. If you have already set that value in the gateway-internal-api service definition and it's still not working, could you provide that file so we can look into this further?

@Amier3 Amier3 added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 23, 2021
@github-actions github-actions bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 23, 2021
@tmiroslav
Author

Hi @Amier3

Thank you!
I am going to test this and will get back to you as soon as I have results.

BR,
Miroslav

@tmiroslav
Author

Hi @Amier3

It's no better after adding local_connect_timeout_ms to the gateway-internal-api service definition.
I am running Consul on VMs. The cpanel service talks to the gateway-internal-api service. I already pasted the cpanel service definition above. This is the gateway-internal-api definition after I added the proxy config parameter you suggested:

service {
  name = "gateway-internal-api"
  port = 8787
  connect {
    sidecar_service {
      proxy {
        config {
          local_connect_timeout_ms = 10000
        }
        upstreams = [
          {
            destination_name = "exhibitor"
            local_bind_port  = 10025
          },
          {
            destination_name = "redis"
            local_bind_port  = 10060
          },
        ]
      }
    }
  }
}

I am still getting log entries like this:
user_id=92a70b0f-a3da-43a9-a176-f95b475117 client_ip=77.46.205.166 operation_name=getPoolStatus [error] GET http://localhost:10000/api/v1/servers/ -> error: :timeout (5001.237 ms)

As you can see, it still times out after 5s.
Is this a bug, or am I still missing some configuration?

Thank you!

@tmiroslav tmiroslav reopened this Nov 25, 2021
@tmiroslav
Author

Hi @Amier3

Any advice on what I should do next? Should I upgrade Consul/Envoy to get this working?
We are facing this issue in production, and we urgently need a way around the 5s timeouts!

Thanks!
Miroslav

@Amier3
Contributor

Amier3 commented Jan 3, 2022

Hey @tmiroslav

Apologies for the delayed response! After looking into this issue a bit more, I realized that I'd need to pull in some of the engineering team to help figure out how to fix this and whether an upgrade is required. With the holidays quickly approaching, it was hard to find the bandwidth to dig deep into this in December.

Are you still experiencing this issue and did you end up upgrading to try to fix it?

@Amier3
Contributor

Amier3 commented Jan 3, 2022

@tmiroslav

Also, it'd help a lot if you could provide us with an Envoy config dump using curl localhost:19000/config_dump

@chrisboulton
Contributor

@tmiroslav maybe have a look at my comment here: #6382 (comment) - I suspect the section on Upstream Request Timeouts is what you're running into, as it's not something that has been addressed yet. I have a couple of proposed options on that PR (including one you can use today with a service-router, and another which disables the upstream timeouts entirely, which we do with a custom build of Consul presently).
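
For anyone following along, the service-router option mentioned above might look roughly like the sketch below. It is only an illustration under several assumptions: that gateway-internal-api speaks HTTP, that it is registered with the http protocol through a service-defaults entry (L7 routing needs this), and that 10s is the timeout wanted. The file names are hypothetical, and nothing in this thread confirms that this resolves the 5s timeout reported here.

```hcl
# gateway-internal-api-defaults.hcl (hypothetical file name)
# Assumption: the upstream is an HTTP service. A service-router only
# applies to services whose protocol is HTTP-based.
Kind     = "service-defaults"
Name     = "gateway-internal-api"
Protocol = "http"

# gateway-internal-api-router.hcl (hypothetical file name)
# Keep the two entries in separate files; they are shown together only
# for brevity. The single catch-all route sets a request timeout for
# all traffic to the service.
Kind = "service-router"
Name = "gateway-internal-api"
Routes = [
  {
    Match {
      HTTP {
        PathPrefix = "/"
      }
    }

    Destination {
      # Illustrative value, matching the 10s asked for in this issue.
      RequestTimeout = "10s"
    }
  },
]
```

Each entry would be applied on its own with `consul config write <file>`. Note that RequestTimeout governs how long the proxy waits for the whole request, which is a different setting from the connect_timeout_ms discussed earlier in the thread.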
