-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make over provisioning factor configurable #4003
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic - thanks for picking this up!
api/envoy/api/v2/eds.proto
Outdated
// factor (in percentage). This means that we don't consider a priority | ||
// level or locality unhealthy until the percentage of healthy hosts | ||
// multiplied by the over provisioning factor drops below 100. | ||
google.protobuf.UInt32Value over_provisioning_factor = 3 [(validate.rules).uint32.gt = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the math here is a bit confusing, it might be worth linking to load_balancing.rst here
We'll also want to update the docs as part of this PR to note that you can change the default level now that it's not hard-coded any more :-)
api/envoy/api/v2/eds.proto
Outdated
// Priority levels and localities are considered overprovisioned with this | ||
// factor (in percentage). This means that we don't consider a priority | ||
// level or locality unhealthy until the percentage of healthy hosts | ||
// multiplied by the over provisioning factor drops below 100. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I totally created this problem* but can we decide if it's over-provisioned, overprovisioned, or over provisioned?
I'm going to claim via wikipedia (https://en.wikipedia.org/wiki/Overprovisioning) it's one word, so we can try to do that and maybe rename this to overprovisioning_factor?
source/common/upstream/load_balancer_impl.h:// Priority levels and localities are considered overprovisioned
docs/root/intro/arch_overview/load_balancing.rst:Currently, it is assumed that each priority level is over-provisioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I have checked in the fixes, but it's not shown here.
@@ -115,6 +115,21 @@ TEST_P(LoadBalancerBaseTest, PrioritySelection) { | |||
EXPECT_EQ(&tertiary_host_set_, &lb_.chooseHostSet()); | |||
} | |||
|
|||
TEST_P(LoadBalancerBaseTest, OverProvisioningFactor) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove extra whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, almost there!
:ref:`this percentage <envoy_api_field_ClusterLoadAssignment.Policy.overprovisioning_factor>`. | ||
The default value is 1.4. This means that we don't consider a priority level or | ||
locality unhealthy until the percentage of healthy hosts multiplied by the | ||
overprovisioning factor drops below 100. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I'm terrible at doing math in my head can I request we spell out the actual percentage here? Maybe something like changing
The default value is 1.4. This means that we don't consider a priority level or
locality unhealthy until the percentage of healthy hosts multiplied by the
overprovisioning factor drops below 100.
to
Envoy doesn't consider a priority level or
locality unhealthy until the percentage of healthy hosts multiplied by the
overprovisioning factor drops below 100. The default value is 1.4, so a priority level or locality will not be considered unhealthy until the percentage of healthy endpoints goes below 72%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks, the doc is much cleaner now.
api/envoy/api/v2/eds.proto
Outdated
// multiplied by the overprovisioning factor drops below 100. | ||
// Read more at :ref:`priority levels <arch_overview_load_balancing_priority_levels>` and | ||
// :ref:`localities <arch_overview_load_balancing_locality_weighted_lb>`. | ||
google.protobuf.UInt32Value overprovisioning_factor = 3 [(validate.rules).uint32.gt = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and sorry I missed this last time but we should note the default in our docs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
source/common/upstream/subset_lb.cc
Outdated
SubsetLoadBalancer::PrioritySubsetImpl::createHostSet(uint32_t priority, | ||
uint32_t overprovisioning_factor) { | ||
UNREFERENCED_PARAMETER(overprovisioning_factor); | ||
// Use original hostset's overprovisioning_factor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider it an error if they're not the same? ASSERT-validate maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may still need UNREFERENCED_PARAMETER for the opt build to compile, since the ASSERT compiles out there.
Also take a look at CI failures when you get a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo format and build cleanup. I think it's ready for @zuercher to take a pass?
source/common/upstream/subset_lb.cc
Outdated
SubsetLoadBalancer::PrioritySubsetImpl::createHostSet(uint32_t priority, | ||
uint32_t overprovisioning_factor) { | ||
UNREFERENCED_PARAMETER(overprovisioning_factor); | ||
// Use original hostset's overprovisioning_factor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may still need UNREFERENCED_PARAMETER for the opt build to compile, since the ASSERT compiles out there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a closer look in the morning, but I did have one question to start.
// below 72%. | ||
// Read more at :ref:`priority levels <arch_overview_load_balancing_priority_levels>` and | ||
// :ref:`localities <arch_overview_load_balancing_locality_weighted_lb>`. | ||
google.protobuf.UInt32Value overprovisioning_factor = 3 [(validate.rules).uint32.gt = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to allow values less than 100?
It seems like in a couple of places this will trigger load shifting even when all the hosts in a given priority are healthy, which seems counterintuitive to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagine if the user wants say servers with less cores to have less requests routed to them, they can tune this parameter to achieve that.
Make the over_provisionih_factor optional parameter, and HostSet adopt the new value if it's provided. Learned from htuch@ that the only path that will update the value is through EDS onConfigUpdate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for making this work for eds reloads! Couple of new comments for you :-)
include/envoy/server/admin.h
Outdated
@@ -115,17 +115,15 @@ class Admin { | |||
* Executes an admin request with the specified query params. Note: this must | |||
* be called from Envoy's main thread. | |||
* | |||
* @param path the path of the admin URL. | |||
* @param param the query-params passed to the admin request handler. | |||
* @param path_and_query the path and query of the admin URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this snuck in here - looks like there's some files from this change and an rbac change?
source/common/upstream/eds.cc
Outdated
@@ -136,14 +146,16 @@ bool EdsClusterImpl::updateHostsPerLocality(const uint32_t priority, const HostV | |||
// out of the locality scheduler, we discover their new weights. We don't currently have a shared | |||
// object for locality weights that we can update here, we should add something like this to | |||
// improve performance and scalability of locality weight updates. | |||
if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) || | |||
if (overprovisioning_factor.has_value() || | |||
updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it looks like we call this function with the overprovisioning factor if it's present in the config. Can we avoid doing an update of the old and new overprovisioning factors are the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot? if the overprovisioning factor is changed, we have to update the host_set.
on the contrary, if overprovisioning factor doesn't change, but hosts list changed, we have to keep the original behavior.
@@ -136,14 +146,16 @@ bool EdsClusterImpl::updateHostsPerLocality(const uint32_t priority, const HostV | |||
// out of the locality scheduler, we discover their new weights. We don't currently have a shared | |||
// object for locality weights that we can update here, we should add something like this to | |||
// improve performance and scalability of locality weight updates. | |||
if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) || | |||
if (overprovisioning_factor.has_value() || | |||
updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) || | |||
locality_weights_map != new_locality_weights_map) { | |||
locality_weights_map = new_locality_weights_map; | |||
ENVOY_LOG(debug, "EDS hosts or locality weights changed for cluster: {} ({}) priority {}", | |||
info_->name(), host_set.hosts().size(), host_set.priority()); | |||
|
|||
priority_state_manager.updateClusterPrioritySet(priority, std::move(current_hosts_copy), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we call updateHostsPerLocality with the overprovisioning factor if it's present in config, can we make sure that if there's no change in hosts and there's no change in overprovisioning factor (it's present but goes from 150->150 or something) that we don't update the cluster priority set? I think as the code is written we don't check to see if it has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
source/common/upstream/subset_lb.cc
Outdated
|
||
const HostSetPtr& host_set = original_priority_set_.hostSetsPerPriority()[priority]; | ||
|
||
UNREFERENCED_PARAMETER(overprovisioning_factor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this viten the new if/ASSERT below.
Alternately you can rewrite the assert so the if is compiled out as well
ASSERT(!overprovisioning_factor.has_value() || overprovisioning_factor.value() == host_set->overprovisioning_factor())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…sioning_factor, add an integration test in eds_integration_test. Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
To all the reviewers, as a git newbie, I kept "compressing the commit log" to "make the commit log less messy", by "git reset prev-hash && git add . && git commit ' |
…ing a overprovisioning factor, assumes the default value Signed-off-by: Xin Zhuang <stevenzzz@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuercher anything else on your end?
Sync to upstream master HEAD. Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Looks like you need fix some formatting in cluster_manager_impl_test.
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
sync from upstream. Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@zuercher, PR synced with upstream/master, please take a look at your convenience, thanks! |
…#1938) This is far from finished, but it reduces memory usage by ~10%. Pulling the following changes from github.com/envoyproxy/envoy: c1cc68d stats: refactoring MetricImpl without strings (envoyproxy#4190) 36809d8 fuzz: coverage profile generation instructions. (envoyproxy#4193) ba40cc9 upstream: rebuild cluster when health check config is changed (envoyproxy#4075) 05c0d52 build: use clang-6.0. (envoyproxy#4168) 01f403e thrift_proxy: introduce header transport (envoyproxy#4082) 564d256 tcp: allow connection pool callers to store protocol state (envoyproxy#4131) 3e1d643 configs: match latest API changes (envoyproxy#4185) bc6a10c Fix wrong mock function name. (envoyproxy#4187) e994c1c Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182) 3d1325e Converting envoy configs to V2 (envoyproxy#2957) 8d0680f Add timestamp to HealthCheckEvent definition (envoyproxy#4119) 497efb9 server: handle non-EnvoyExceptions safely if thrown in constructor. (envoyproxy#4173) 6d6fafd config: strengthen validation for gRPC config sources. (envoyproxy#4171) 132302c fuzz: reduce log level when running under fuzz engine. (envoyproxy#4161) 7c04ac2 test: fix V6EmptyOptions in coverage with IPv6 enable (envoyproxy#4155) 1b2219b ci: remove deprecated bazel --batch option (envoyproxy#4166) 2db6a4c ci: update clang to version 6.0 in the Ubuntu build image. (envoyproxy#4157) 71152b7 ratelimit: Add ratelimit custom response headers (envoyproxy#4015) 3062874 ssl: make Ssl::Connection const everywhere (envoyproxy#4179) 706e262 Handle validation of non expiring tokens in jwt_authn filter (envoyproxy#4007) f06e958 fuzz: tag trivial fuzzers with no_fuzz. (envoyproxy#4156) 27fb1d3 thrift_proxy: add service name matching to router implementation (envoyproxy#4130) 8c189a5 Make over provisioning factor configurable (envoyproxy#4003) 6c08fb4 Making test less flaky (envoyproxy#4149) 592775b fuzz: bare bones HCM fuzzer. (envoyproxy#4118) For istio/istio#7912. Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Description:
Move over-provisioning-factor to EDS proto.
Add a over_provisioning_factor() method for HostSet class.
Signed-off-by: Xin Zhuang stevenzzz@google.com
Risk Level: Medium
Testing: Unit tests.
Docs Changes:
Release Notes:
[Optional Fixes #Issue] #3222