Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consul connect provide a way to configure envoy route timeout #6382

Open
skyrocknroll opened this issue Aug 23, 2019 · 14 comments
Open

Consul connect provide a way to configure envoy route timeout #6382

skyrocknroll opened this issue Aug 23, 2019 · 14 comments
Labels
theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/envoy/xds Related to Envoy support type/enhancement Proposed improvement or new feature

Comments

@skyrocknroll
Copy link

skyrocknroll commented Aug 23, 2019

Feature Description

Provide a way to configure upstream listener to configure timeout.

Use Case(s)

https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#route-routeaction
The envoy default value is 15s second which is too high or too low depends on the use case.

@hanshasselberg hanshasselberg added the theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies label Aug 23, 2019
@stale
Copy link

stale bot commented Oct 22, 2019

Hey there,
We wanted to check in on this request since it has been inactive for at least 60 days.
If you think this is still an important issue in the latest version of Consul
or its documentation please reply with a comment here which will cause it to stay open for investigation.
If there is still no activity on this issue for 30 more days, we will go ahead and close it.

Feel free to check out the community forum as well!
Thank you!

@stale stale bot added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Oct 22, 2019
@rboyer rboyer added the type/enhancement Proposed improvement or new feature label Oct 22, 2019
@stale stale bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Oct 22, 2019
@msuarezd
Copy link

msuarezd commented Nov 13, 2019

Hi,

we set this timeout using a service router:

Kind = "service-router"
Name = "myservice"
Routes = [
  {
    Destination {
      Service = ""
      RetryOnConnectFailure = true
      RequestTimeout = "120s"
    }
  }
]

which can then be seeing in the envoy configuration:

           "cluster": "myservice.default.dc1.internal.1af17f86-a6e7-28b2-3176-5a22dfc90678.consul",
           "timeout": "120s",
           "retry_policy": {
                "retry_on": "connect-failure" 
            }
}

However, envoy sets this header:
x-envoy-expected-rq-timeout-ms: 15000
(see https://www.envoyproxy.io/docs/envoy/v1.11.1/configuration/http_filters/router_filter.html?highlight=timeout#x-envoy-expected-rq-timeout-ms)
which seems to cause the upstream envoy to give up after 15 seconds.

We couldn't find a way to increase the value of this header ( or disable it ) via consul which would leave us with the configured timeout in the service router.
As a result we sporadically get a 504 Gateway Timeout for some long running requests.

At this point we are not sure if there is way to manage that header and we missed it or if there is still no way to manage it, in which case it would be a very helpful feature.

We are using consul 1.6.1 with envoy 1.11.1

If you could shed some light on this that would help us a lot
Thanks!

@rrondeau
Copy link
Contributor

rrondeau commented Feb 4, 2020

I have the same issue and i think this is because envoy proxy on the local app dont set a specific timeout.
Your envoy proxy in front of the target app set the default timeout for all requests it sends to his local app.
To avoid this problem, the only solution i found was to override the public listener with envoy_public_listener_json config key

@msuarezd
Copy link

msuarezd commented Feb 4, 2020

Hi @rrondeau ,
many thanks for the hint. Would it be possible to get an example snippet of the configuration you used?
Thanks!

@dmai-apixio
Copy link

i faced the same issue. I end up to build consul from source code with my fix. hope you found it useful

diff --git a/agent/xds/listeners.go b/agent/xds/listeners.go
index b44cd996c..67805ec29 100644
--- a/agent/xds/listeners.go
+++ b/agent/xds/listeners.go
@@ -9,6 +9,7 @@ import (
        "regexp"
        "strconv"
        "strings"
+       "time"

        envoy "github.com/envoyproxy/go-control-plane/envoy/api/v2"
        envoyauth "github.com/envoyproxy/go-control-plane/envoy/api/v2/auth"
@@ -727,6 +728,7 @@ func makeHTTPFilter(
                                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                                Cluster: cluster,
                                        },
+                                       Timeout: addrTime(5*time.Minute),
                                },
                        },
                }
@@ -844,3 +846,7 @@ func makeCommonTLSContext(cfgSnap *proxycfg.ConfigSnapshot) *envoyauth.CommonTls
                },
        }
 }
+
+func addrTime(t time.Duration) *time.Duration {
+       return &t
+}
diff --git a/agent/xds/routes.go b/agent/xds/routes.go
index 520545bdc..9cadaa2b1 100644
--- a/agent/xds/routes.go
+++ b/agent/xds/routes.go
@@ -4,6 +4,7 @@ import (
        "errors"
        "fmt"
        "strings"
+       "time"

        "github.com/gogo/protobuf/proto"

@@ -317,6 +318,7 @@ func makeRouteActionForSingleCluster(targetID string, chain *structs.CompiledDis
                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                Cluster: clusterName,
                        },
+                       Timeout: addrTime(5*time.Minute),
                },
        }
 }
@@ -353,6 +355,7 @@ func makeRouteActionForSplitter(splits []*structs.DiscoverySplit, chain *structs
                                        TotalWeight: makeUint32Value(10000), // scaled up 100%
                                },
                        },
+                       Timeout: addrTime(5*time.Minute),
                },
        }, nil
 }

@blake
Copy link
Member

blake commented Feb 18, 2020

@msuarezd, I believe the issue you're describing is related to envoyproxy/envoy#7358 which was fixed in Envoy version 1.12.0 with envoyproxy/envoy#8051. If that's correct, the correct fix would be for Consul to also allow configuring respect_expected_rq_timeout in the envoy.router HTTP filter config on the destination sidecars.

@dnephin dnephin added the theme/envoy/xds Related to Envoy support label Apr 1, 2020
@hamishforbes
Copy link
Contributor

Has there been any progress on this issue?
I'm extremely surprised to find there's no way to easily change request timeouts when using Consul Connect, this seems pretty critical?

Is setting RequestTimeout in a Service Resolver configuration the expected way to configure this? And it just doesn't work because of the respect_expected_rq_timeout issue mentioned above?

I'm not super familiar with Envoy itself and the documentation around the Envoy escape hatch configuration is hard to follow.
Are these options the expected way to configure a request timeout?
If so are there any examples of how to do this? Particularly when using the consul-k8s injector.

@hamishforbes
Copy link
Contributor

hamishforbes commented Jun 18, 2020

Wew, this has been a bit of a rabbit hole but for anyone else who runs into this issue...

The quickest fix is to change the destination service protocol to tcp, this obviously means you can't use any of the Connect L7 features but at least your application level timeouts will work.
edit: this does seem to have changed all the upstream listeners to TCP proxies even though only 1 of the services is TCP and the rest are HTTP, a different bug maybe?

Setting a service-router configuration with a RequestTimeout should work.
It injects a timeout configuration into the envoy route configuration as @msuarezd pointed out.

This had no effect for me, couldn't find a timeout parameter in the config dump on either envoy instance.
Reasonably sure this is me doing something wrong but I didn't dig too hard given the respect_expected_rq_timeout issue means this doesn't work properly.

I had a look at enabling respect_expected_rq_timeout in the envoy.router config but that's not an easy fix either.
The go-control-plane dependency doesn't support that configuration until v0.9.1 but v0.9.0 is a major breaking change and looks like it would need quite a few changes in the Consul codebase to support, or some hacking around to force that parameter.

@LeComptoirDesPharmacies
Copy link

Hi,
Same problem here, 15 seconds is too short for some of our workload.
Moreover, we are using Connect L7 feature so we can not set endpoint to tcp.
I am really interested by @rrondeau solution. Did you have any example ?

I suppose that you override the 'listener_filters_timeout' value ?
Is it possible to let all the other keys unset ?

Yours faithfully,
LCDP

@radykal-com
Copy link

Today I struggled into this issue and these are my conclusions at this moment with Consul 1.9.1 and Envoy 1.16.0:

  • The RequestTimeout at service-router config is not enough (is required so other complimentary workarounds works).

  • Then I tried the change suggested in

@msuarezd, I believe the issue you're describing is related to envoyproxy/envoy#7358 which was fixed in Envoy version 1.12.0 with envoyproxy/envoy#8051. If that's correct, the correct fix would be for Consul to also allow configuring respect_expected_rq_timeout in the envoy.router HTTP filter config on the destination sidecars.

So I recompiled Consul with:

HttpFilters: []*envoyhttp.HttpFilter{
	{
		Name: "envoy.router",
		ConfigType: &envoyhttp.HttpFilter_Config{Config: &pbstruct.Struct{
			Fields: map[string]*pbstruct.Value{
				"respect_expected_rq_timeout": {
					Kind: &pbstruct.Value_BoolValue{
						BoolValue: true,
					},
				},
			},
		}},
	},
},

so all envoy.router entries now have the respect_expected_rq_timeoutset to true(they appear when dumping envoy config) but it stills timing out at 15s even when the service-router has the timeout at 120s

  • Finally the only working solution is the one proposed in

i faced the same issue. I end up to build consul from source code with my fix. hope you found it useful

diff --git a/agent/xds/listeners.go b/agent/xds/listeners.go
index b44cd996c..67805ec29 100644
--- a/agent/xds/listeners.go
+++ b/agent/xds/listeners.go
@@ -9,6 +9,7 @@ import (
        "regexp"
        "strconv"
        "strings"
+       "time"

        envoy "github.com/envoyproxy/go-control-plane/envoy/api/v2"
        envoyauth "github.com/envoyproxy/go-control-plane/envoy/api/v2/auth"
@@ -727,6 +728,7 @@ func makeHTTPFilter(
                                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                                Cluster: cluster,
                                        },
+                                       Timeout: addrTime(5*time.Minute),
                                },
                        },
                }
@@ -844,3 +846,7 @@ func makeCommonTLSContext(cfgSnap *proxycfg.ConfigSnapshot) *envoyauth.CommonTls
                },
        }
 }
+
+func addrTime(t time.Duration) *time.Duration {
+       return &t
+}
diff --git a/agent/xds/routes.go b/agent/xds/routes.go
index 520545bdc..9cadaa2b1 100644
--- a/agent/xds/routes.go
+++ b/agent/xds/routes.go
@@ -4,6 +4,7 @@ import (
        "errors"
        "fmt"
        "strings"
+       "time"

        "github.com/gogo/protobuf/proto"

@@ -317,6 +318,7 @@ func makeRouteActionForSingleCluster(targetID string, chain *structs.CompiledDis
                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                Cluster: clusterName,
                        },
+                       Timeout: addrTime(5*time.Minute),
                },
        }
 }
@@ -353,6 +355,7 @@ func makeRouteActionForSplitter(splits []*structs.DiscoverySplit, chain *structs
                                        TotalWeight: makeUint32Value(10000), // scaled up 100%
                                },
                        },
+                       Timeout: addrTime(5*time.Minute),
                },
        }, nil
 }

but changed to work with latest consul changes:

diff --git a/agent/xds/listeners.go b/agent/xds/listeners.go
index dcf8f76d2..c7986e564 100644
--- a/agent/xds/listeners.go
+++ b/agent/xds/listeners.go
@@ -24,6 +24,7 @@ import (
        "github.com/golang/protobuf/proto"
        pbtypes "github.com/golang/protobuf/ptypes"
        "github.com/golang/protobuf/ptypes/any"
+       "github.com/golang/protobuf/ptypes/duration"
        pbstruct "github.com/golang/protobuf/ptypes/struct"
        "github.com/golang/protobuf/ptypes/wrappers"
        "github.com/hashicorp/consul/agent/connect"
@@ -1213,6 +1214,7 @@ func makeHTTPFilter(opts listenerFilterOpts) (*envoylistener.Filter, error) {
                                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                                Cluster: opts.cluster,
                                        },
+                                       Timeout: &duration.Duration{Seconds: 120},
                                },
                        },
                }

the problem is that it hardcodes a value for all the services, imposible to configure for each upstream service...

I suspect the way to go would be using the respect_expected_rq_timeout but I couldn't make it work

@chrisboulton
Copy link
Contributor

chrisboulton commented Jan 12, 2021

Hey folks - we ran into this today too. I spent some time on it this afternoon.

As y'all have discovered, there's two places the Envoy route request_timeout is tripping:

local_app Request Timeouts

Locally, I have the changes which add local_request_timeout_ms to proxy configurations, similar to how local_connect_timeout_ms can be configured for the local app/public listener. In theory this also will let the same configuration be set at the proxy-defaults level. Both 0s (timeouts disabled, for streaming services), and the original behaviour (unspecified, will result in Envoy using its default) are supported. I'll push the branch once I've done a little more testing, and open a pull request.

Update: PR is here - #9554

Upstream Request Timeouts

After addressing that, I sat down to see what we could do about the timeout on the other end, which is configurable today only via a service-router configuration entry. I started to build in support for request_timeout_ms along side connect_timeout_ms in proxy upstreams because we don't have a service-router defined for each service, but then noticed this in the docs:

Note: The connection timeout for a service should ideally be configured via the connect_timeout field of a service-resolver config entry for the upstream destination service.
from: https://www.consul.io/docs/connect/proxies/envoy#connect_timeout_ms

I'm not sure of HashiCorp's motivation behind that move or change is, but my assumption is that it's probably to centralise the configuration of connection timeouts, and elevate them "above" service upstream definitions, which left me wondering what to do about an upstream request_timeout setting. The way I see it, there are a couple of options:

Do nothing, rely on service-router ServiceRouteDestination.RequestTimeout

The work is already done to allow the timeout to be configurable here, and I believe the intent behind putting it in a service resolver is to allow per HTTP path customisations of timeouts, etc. The downside is that this would mean that any time you wanted to change the default Envoy timeout value of 15s, you would need to define a service-router for your service.

Implement request_timeout_ms for Proxy Upstreams

Per the note from the docs above, and I could do with some guidance or directional advice from our friends at HashiCorp here, this doesn't seem to be the suggested place for configuring this type of thing going forward.

If my misunderstanding about this is incorrect, and a configuration under proxy upstreams for the request timeout is the right place, then great!

Disable the request timeout for Proxy Upstreams (by default)

Maybe an option here is to explicitly set request_timeout = 0 for the upstream route configuration: you can already override it to set an explicit request timeout using a service-router, and with local_app's request timeout now being configurable, is there any downside to this?

respect_expected_rq_timeout Configurability

I was about to pick this up next, but I have the same questions: where would you expect to configure this? Are there any places that you would reasonably expect this option to be enabled? Do we not care as much if we're able to tweak the timeouts above?


Appreciate thoughts and feedback! 🙂

@chrisboulton
Copy link
Contributor

chrisboulton commented Jan 14, 2021

For those that are interested, the pull request which adds support for local_request_timeout_ms can be found here: #9554 (updated in the above comment)

If you're after a complete change set that also disables the request timeout on the egress Envoy instances/upstream routes completely (but still allows overrides via service-router entries) because you want to control timeouts at the local_app layer, then we have a custom fork of v1.9.1 we're currently testing which you can find here: bigcommerce/consul@v1.9.1...v1.9.1-bc -- from limited testing we've done so far, this seems to be working fine.

I'd very much love feedback on my earlier comment for how we can get this functionality into Consul Connect, as I'd consider it critical for a lot of production deployments.

@freddygv
Copy link
Contributor

Hi @chrisboulton I think this could fit in per-upstream configuration, and the new UpstreamConfig in service-defaults:
https://www.consul.io/docs/connect/config-entries/service-defaults#upstreamconfig

Regarding the note about connect timeout, there are two points there. First is that yes we generally want to move away from per-instance configuration. However, if the discovery chain isn't being configured, I can see the benefit of allowing this flag to be set elsewhere. Now that we have centralized upstream config in service-defaults it wouldn't be exclusively in individual proxy registrations.

An issue that concerns me is the proliferation of places where one setting can be configured, and then it becoming unclear which one takes precedence.

@adityavikasd
Copy link

Do nothing, rely on service-router ServiceRouteDestination.RequestTimeouthttps://docs.github.com/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax

The work is already done to allow the timeout to be configurable here, and I believe the intent behind putting it in a service resolver is to allow per HTTP path customisations of timeouts, etc. The downside is that this would mean that any time you wanted to change the default Envoy timeout value of 15s, you would need to define a service-router for your service.

@chrisboulton I tried this solution but that doesn't work with Transparent proxy mode enabled. Is there a workaround while using Transparent proxy mode?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/envoy/xds Related to Envoy support type/enhancement Proposed improvement or new feature
Projects
None yet
Development

No branches or pull requests