Immediately drain connections from hosts not in service discovery, as an option #440
Comments
Am I correct that this behavior could be modified in upstream_impl, possibly to read a new property of the cluster?
htuch pushed a commit that referenced this issue on May 8, 2018:
This PR adds a configuration flag that allows disabling the "eventually consistent" aspect of endpoint updates: instead of waiting for endpoints to go unhealthy before removing them from the cluster, remove them immediately. This gives the control plane greater control in cases where one might want to divert traffic from an endpoint without having it go unhealthy. The flag lives on the cluster and so applies to all endpoints within that cluster.
Risk Level: Low. A small configurable feature that reuses existing behavior (identical to the behavior when no health checker is configured). It defaults to disabled, so it should have no impact on existing configurations.
Testing: Added a unit test for the case where endpoints are healthy and then removed from the ClusterLoadAssignment in a subsequent config update.
Docs Changes: Docs were added to the proto field.
Release Notes: Added "cluster: Add :ref:`option <envoy_api_field_Cluster.drain_connections_on_eds_removal>` to drain endpoints after they are removed through EDS, despite health status." to the release notes.
Fixes #440 and #3276 (note that the latter also asks for more fine-grained control over endpoint removal; the solution this PR provides was brought up as a partial solution to #3276).
Signed-off-by: Snow Pettersen <snowp@squareup.com>
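For context, a minimal sketch of what enabling that flag might look like on an EDS cluster, assuming the field name given in the PR description above (the cluster name, timeout, and `xds_cluster` reference are placeholders, and later Envoy versions may expose the field under a different name):

```yaml
clusters:
- name: my_service                 # placeholder cluster name
  connect_timeout: 1s
  type: EDS                        # endpoints come from the management server
  eds_cluster_config:
    eds_config:
      api_config_source:
        api_type: GRPC
        grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster   # placeholder xDS management cluster
  # Drain hosts as soon as EDS removes them instead of waiting for active
  # health checking to mark them unhealthy first (defaults to false).
  drain_connections_on_eds_removal: true
```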
I believe this is fixed by #3302. LMK if not and we can reopen.
rshriram pushed a commit to rshriram/envoy that referenced this issue on Oct 30, 2018:
* Add support files to test LDS manually.
* Fix indent.
PiotrSikora added a commit to PiotrSikora/envoy that referenced this issue on Mar 3, 2020:
This change restores Wasm Service, which was accidentally removed in a bad merge (envoyproxy#321). It also fixes a number of issues:
1. Wasm Services were started before the Cluster Manager was initialized, which crashed Envoy when a Wasm Service tried to dispatch a callout.
2. Wasm Services (per-thread) were not stored and went out of scope, so they were immediately destroyed.
3. Wasm Services (singleton) were started twice.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
istio-testing pushed a commit to istio/envoy that referenced this issue on Mar 3, 2020:
This change restores Wasm Service, which was accidentally removed in a bad merge (#321). It also fixes a number of issues:
1. Wasm Services were started before the Cluster Manager was initialized, which crashed Envoy when a Wasm Service tried to dispatch a callout.
2. Wasm Services (per-thread) were not stored and went out of scope, so they were immediately destroyed.
3. Wasm Services (singleton) were started twice.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
jplevyak pushed a commit to jplevyak/envoy that referenced this issue on Apr 13, 2020:
This change restores Wasm Service, which was accidentally removed in a bad merge (istio#321). It also fixes a number of issues:
1. Wasm Services were started before the Cluster Manager was initialized, which crashed Envoy when a Wasm Service tried to dispatch a callout.
2. Wasm Services (per-thread) were not stored and went out of scope, so they were immediately destroyed.
3. Wasm Services (singleton) were started twice.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
wolfguoliang pushed a commit to wolfguoliang/envoy that referenced this issue on Jan 23, 2021.
jpsim pushed a commit that referenced this issue on Nov 28, 2022:
Description: The liveliness CI tests for Android are flaky because the response view holder doesn't log with high fidelity without UI interaction. This PR changes the logs to be printed in the main activity.
Risk Level: Low
Testing: CI
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue on Nov 29, 2022:
Description: The liveliness CI tests for Android are flaky because the response view holder doesn't log with high fidelity without UI interaction. This PR changes the logs to be printed in the main activity.
Risk Level: Low
Testing: CI
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
When a host is removed from service discovery, Envoy will still send it traffic until the health check goes bad. We'd want a configuration option to have it start draining connections immediately, probably along with a minimum-hosts setting as a sanity check in case service discovery ever loses its mind.
Our discovery always shows all instances, but there's an 'active' flag that controls whether to send traffic or not, a notion of being "in the load balancer". That flag is used to filter generated HAProxy configurations and other load balancers. For Envoy SDS I just omit any host that isn't 'active', so an option would be to have a similar 'active' flag on the SDS API versus an "obey service discovery" option on the cluster config.
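As a rough illustration of that approach (the cluster name, addresses, and version string are all placeholders), the EDS response from the control plane simply omits any host whose 'active' flag is off; combined with the option discussed above, Envoy would drain the omitted hosts immediately instead of waiting for health checks to fail:

```yaml
# Body of a hypothetical v2 EDS DiscoveryResponse: only 'active' hosts appear.
version_info: "1"
resources:
- "@type": type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
  cluster_name: my_service
  endpoints:
  - lb_endpoints:
    # Hosts taken out of the load balancer are simply not listed here.
    - endpoint:
        address:
          socket_address:
            address: 10.0.0.1
            port_value: 8080
    - endpoint:
        address:
          socket_address:
            address: 10.0.0.2
            port_value: 8080
```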
In a normal deployment we leave the old fleet up for a time so we can quickly switch back in case of problems. The old fleet is taken out of the load balancer but left alive. If monitoring finds a box going bad or rebooted, it is also taken out of the load balancer to be replaced or investigated. In both cases we don't want any customer traffic going to those hosts. I understand there's a mechanism to force-fail the Envoy health check, but it would be nice not to have to add that when service discovery is already saying not to send traffic there.