Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make over provisioning factor configurable #4003

Merged
merged 10 commits into from
Aug 15, 2018

Conversation

stevenzzzz
Copy link
Contributor

Description:
Move over-provisioning-factor to EDS proto.
Add a over_provisioning_factor() method for HostSet class.

Signed-off-by: Xin Zhuang stevenzzz@google.com

Risk Level: Medium
Testing: Unit tests.
Docs Changes:
Release Notes:
[Optional Fixes #Issue] #3222

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic - thanks for picking this up!

// factor (in percentage). This means that we don't consider a priority
// level or locality unhealthy until the percentage of healthy hosts
// multiplied by the over provisioning factor drops below 100.
google.protobuf.UInt32Value over_provisioning_factor = 3 [(validate.rules).uint32.gt = 0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the math here is a bit confusing, it might be worth linking to load_balancing.rst here

We'll also want to update the docs as part of this PR to note that you can change the default level now that it's not hard-coded any more :-)

https://www.envoyproxy.io/docs/envoy/v1.7.0/intro/arch_overview/load_balancing.html?highlight=priorities#priority-levels

// Priority levels and localities are considered overprovisioned with this
// factor (in percentage). This means that we don't consider a priority
// level or locality unhealthy until the percentage of healthy hosts
// multiplied by the over provisioning factor drops below 100.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I totally created this problem* but can we decide if it's over-provisioned, overprovisioned, or over provisioned?

I'm going to claim via wikipedia (https://en.wikipedia.org/wiki/Overprovisioning) it's one word, so we can try to do that and maybe rename this to overprovisioning_factor?

source/common/upstream/load_balancer_impl.h:// Priority levels and localities are considered overprovisioned
docs/root/intro/arch_overview/load_balancing.rst:Currently, it is assumed that each priority level is over-provisioned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I have checked in the fixes, but it's not shown here.

@@ -115,6 +115,21 @@ TEST_P(LoadBalancerBaseTest, PrioritySelection) {
EXPECT_EQ(&tertiary_host_set_, &lb_.chooseHostSet());
}

TEST_P(LoadBalancerBaseTest, OverProvisioningFactor) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove extra whitespace

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, almost there!

:ref:`this percentage <envoy_api_field_ClusterLoadAssignment.Policy.overprovisioning_factor>`.
The default value is 1.4. This means that we don't consider a priority level or
locality unhealthy until the percentage of healthy hosts multiplied by the
overprovisioning factor drops below 100.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm terrible at doing math in my head can I request we spell out the actual percentage here? Maybe something like changing

The default value is 1.4. This means that we don't consider a priority level or
locality unhealthy until the percentage of healthy hosts multiplied by the
overprovisioning factor drops below 100.

to

Envoy doesn't consider a priority level or
locality unhealthy until the percentage of healthy hosts multiplied by the
overprovisioning factor drops below 100. The default value is 1.4, so a priority level or locality will not be considered unhealthy until the percentage of healthy endpoints goes below 72%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks, the doc is much cleaner now.

// multiplied by the overprovisioning factor drops below 100.
// Read more at :ref:`priority levels <arch_overview_load_balancing_priority_levels>` and
// :ref:`localities <arch_overview_load_balancing_locality_weighted_lb>`.
google.protobuf.UInt32Value overprovisioning_factor = 3 [(validate.rules).uint32.gt = 0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and sorry I missed this last time but we should note the default in our docs here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

SubsetLoadBalancer::PrioritySubsetImpl::createHostSet(uint32_t priority,
uint32_t overprovisioning_factor) {
UNREFERENCED_PARAMETER(overprovisioning_factor);
// Use original hostset's overprovisioning_factor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider it an error if they're not the same? ASSERT-validate maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may still need UNREFERENCED_PARAMETER for the opt build to compile, since the ASSERT compiles out there.

@alyssawilk
Copy link
Contributor

Also take a look at CI failures when you get a chance?

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo format and build cleanup. I think it's ready for @zuercher to take a pass?

SubsetLoadBalancer::PrioritySubsetImpl::createHostSet(uint32_t priority,
uint32_t overprovisioning_factor) {
UNREFERENCED_PARAMETER(overprovisioning_factor);
// Use original hostset's overprovisioning_factor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may still need UNREFERENCED_PARAMETER for the opt build to compile, since the ASSERT compiles out there.

Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a closer look in the morning, but I did have one question to start.

// below 72%.
// Read more at :ref:`priority levels <arch_overview_load_balancing_priority_levels>` and
// :ref:`localities <arch_overview_load_balancing_locality_weighted_lb>`.
google.protobuf.UInt32Value overprovisioning_factor = 3 [(validate.rules).uint32.gt = 0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to allow values less than 100?

It seems like in a couple of places this will trigger load shifting even when all the hosts in a given priority are healthy, which seems counterintuitive to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd imagine if the user wants say servers with less cores to have less requests routed to them, they can tune this parameter to achieve that.

@stevenzzzz
Copy link
Contributor Author

Make the over_provisionih_factor optional parameter, and HostSet adopt the new value if it's provided. Learned from htuch@ that the only path that will update the value is through EDS onConfigUpdate.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for making this work for eds reloads! Couple of new comments for you :-)

@@ -115,17 +115,15 @@ class Admin {
* Executes an admin request with the specified query params. Note: this must
* be called from Envoy's main thread.
*
* @param path the path of the admin URL.
* @param param the query-params passed to the admin request handler.
* @param path_and_query the path and query of the admin URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this snuck in here - looks like there's some files from this change and an rbac change?

@@ -136,14 +146,16 @@ bool EdsClusterImpl::updateHostsPerLocality(const uint32_t priority, const HostV
// out of the locality scheduler, we discover their new weights. We don't currently have a shared
// object for locality weights that we can update here, we should add something like this to
// improve performance and scalability of locality weight updates.
if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) ||
if (overprovisioning_factor.has_value() ||
updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it looks like we call this function with the overprovisioning factor if it's present in the config. Can we avoid doing an update of the old and new overprovisioning factors are the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot? if the overprovisioning factor is changed, we have to update the host_set.
on the contrary, if overprovisioning factor doesn't change, but hosts list changed, we have to keep the original behavior.

@@ -136,14 +146,16 @@ bool EdsClusterImpl::updateHostsPerLocality(const uint32_t priority, const HostV
// out of the locality scheduler, we discover their new weights. We don't currently have a shared
// object for locality weights that we can update here, we should add something like this to
// improve performance and scalability of locality weight updates.
if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) ||
if (overprovisioning_factor.has_value() ||
updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) ||
locality_weights_map != new_locality_weights_map) {
locality_weights_map = new_locality_weights_map;
ENVOY_LOG(debug, "EDS hosts or locality weights changed for cluster: {} ({}) priority {}",
info_->name(), host_set.hosts().size(), host_set.priority());

priority_state_manager.updateClusterPrioritySet(priority, std::move(current_hosts_copy),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we call updateHostsPerLocality with the overprovisioning factor if it's present in config, can we make sure that if there's no change in hosts and there's no change in overprovisioning factor (it's present but goes from 150->150 or something) that we don't update the cluster priority set? I think as the code is written we don't check to see if it has changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


const HostSetPtr& host_set = original_priority_set_.hostSetsPerPriority()[priority];

UNREFERENCED_PARAMETER(overprovisioning_factor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this viten the new if/ASSERT below.
Alternately you can rewrite the assert so the if is compiled out as well
ASSERT(!overprovisioning_factor.has_value() || overprovisioning_factor.value() == host_set->overprovisioning_factor())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

…sioning_factor, add an integration test in eds_integration_test.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@stevenzzzz
Copy link
Contributor Author

To all the reviewers, as a git newbie, I kept "compressing the commit log" to "make the commit log less messy", by "git reset prev-hash && git add . && git commit '
I didn't know this will erase all the commented commits, sorry for the inconvenience.

…ing a overprovisioning factor, assumes the default value

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
alyssawilk
alyssawilk previously approved these changes Aug 13, 2018
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuercher anything else on your end?

Sync to upstream master HEAD.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Looks like you need fix some formatting in cluster_manager_impl_test.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
sync from upstream.

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
Signed-off-by: Xin Zhuang <stevenzzz@google.com>
@stevenzzzz
Copy link
Contributor Author

@zuercher, PR synced with upstream/master, please take a look at your convenience, thanks!

@alyssawilk alyssawilk merged commit 8c189a5 into envoyproxy:master Aug 15, 2018
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
…#1938)

This is far from finished, but it reduces memory usage by ~10%.

Pulling the following changes from github.com/envoyproxy/envoy:

c1cc68d stats: refactoring MetricImpl without strings (envoyproxy#4190)
36809d8 fuzz: coverage profile generation instructions. (envoyproxy#4193)
ba40cc9 upstream: rebuild cluster when health check config is changed (envoyproxy#4075)
05c0d52 build: use clang-6.0. (envoyproxy#4168)
01f403e thrift_proxy: introduce header transport (envoyproxy#4082)
564d256 tcp: allow connection pool callers to store protocol state (envoyproxy#4131)
3e1d643 configs: match latest API changes (envoyproxy#4185)
bc6a10c Fix wrong mock function name. (envoyproxy#4187)
e994c1c Bump yaml-cpp so it builds with Visual Studio 15.8 (envoyproxy#4182)
3d1325e Converting envoy configs to V2 (envoyproxy#2957)
8d0680f Add timestamp to HealthCheckEvent definition (envoyproxy#4119)
497efb9 server: handle non-EnvoyExceptions safely if thrown in constructor. (envoyproxy#4173)
6d6fafd config: strengthen validation for gRPC config sources. (envoyproxy#4171)
132302c fuzz: reduce log level when running under fuzz engine. (envoyproxy#4161)
7c04ac2 test: fix V6EmptyOptions in coverage with IPv6 enable (envoyproxy#4155)
1b2219b ci: remove deprecated bazel --batch option (envoyproxy#4166)
2db6a4c ci: update clang to version 6.0 in the Ubuntu build image. (envoyproxy#4157)
71152b7 ratelimit: Add ratelimit custom response headers (envoyproxy#4015)
3062874 ssl: make Ssl::Connection const everywhere (envoyproxy#4179)
706e262 Handle validation of non expiring tokens in jwt_authn filter (envoyproxy#4007)
f06e958 fuzz: tag trivial fuzzers with no_fuzz. (envoyproxy#4156)
27fb1d3 thrift_proxy: add service name matching to router implementation (envoyproxy#4130)
8c189a5 Make over provisioning factor configurable (envoyproxy#4003)
6c08fb4 Making test less flaky (envoyproxy#4149)
592775b fuzz: bare bones HCM fuzzer. (envoyproxy#4118)

For istio/istio#7912.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants