upstream: rebuild cluster when health check config is changed #4075
source/common/upstream/upstream_impl.cc

@@ -864,9 +864,12 @@ bool BaseDynamicClusterImpl::updateDynamicHostList(const HostVector& new_hosts,
   bool found = false;
   for (auto i = current_hosts.begin(); i != current_hosts.end();) {
+    const bool address_matched = *(*i)->address() == *host->address();
+    const bool health_check_changed =
+        health_checker_ != nullptr && *(*i)->healthCheckAddress() != *host->healthCheckAddress();
     // If we find a host matched based on address, we keep it. However we do change weight inline
     // so do that here.
-    if (*(*i)->address() == *host->address()) {
+    if (address_matched && !health_check_changed) {
       if (host->weight() > max_host_weight) {
         max_host_weight = host->weight();
       }
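For intuition, the new pivot can be read as a standalone predicate: a current host may be kept in place only when the incoming host has the same traffic address and, if active health checking is configured, the same health check address. Below is a minimal sketch using hypothetical flattened types; in the real code the comparisons run over Network::Address::Instance objects reached through host->address() and host->healthCheckAddress().

#include <string>

// Illustration only: hypothetical flattened view of a host. Not the real
// Envoy Host type.
struct HostView {
  std::string address;              // traffic address, e.g. "1.2.3.4:80"
  std::string health_check_address; // differs when health_check_config overrides the port
};

// Mirrors the new pivot: keep the current host in place only if the traffic
// address matches and the health check address did not change.
bool reusableInPlace(const HostView& current, const HostView& incoming,
                     bool health_checking_enabled) {
  const bool address_matched = current.address == incoming.address;
  const bool health_check_changed =
      health_checking_enabled &&
      current.health_check_address != incoming.health_check_address;
  return address_matched && !health_check_changed;
}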
@@ -909,6 +912,9 @@ bool BaseDynamicClusterImpl::updateDynamicHostList(const HostVector& new_hosts,
       i = current_hosts.erase(i);
       found = true;
     } else {
+      if (address_matched && health_check_changed) {
+        hosts_changed = true;
+      }
       i++;
     }
   }

Review discussion on this hunk:

- "Sorry, a little confused: why is this change necessary? Same below? Seems like just pivoting on address + HC address should be enough above?"
- "Tried introducing only the change above, but from my test, without this it leads to keeping the old hosts."
- "OK, can you take a look at why that is? This code is getting really complicated and could use some more comments anyway. :)"
- "Sure. Will try to trace out the flow and craft some comments."
- "Referring to the case example by @bplotnick here: https://github.com/bplotnick/envoy_examples/blob/master/healthcheck_updates/envoy_config/clusters.yaml, the …"
- "These two changes make sure that, regardless of the flag, we always rebuild and skip keeping the healthy (old/current) hosts."
- "Heh, whoops. That was actually unintentional. In my original configuration where I saw this issue, I had that flag set 😛"
- "😄 Yeah, in that case we only need to add https://github.com/envoyproxy/envoy/pull/4075/files#diff-cd280606785949980237b2a7f08b7885R872, as suggested by @mattklein123."
- "But probably that's the correct behavior? Once the old hosts die, they will be removed eventually."
- "I need a little time to grok this conversation thread. I will follow up today or tomorrow. Thank you!"
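For context, the scenario under discussion looks roughly like the following. This is a hypothetical EDS-style snippet in the spirit of the linked clusters.yaml, not a copy of it: an endpoint that receives traffic on one port but is health checked on another, where only the health check port changes between updates.

endpoints:
- lb_endpoints:
  - endpoint:
      address:
        socket_address:
          address: 1.2.3.4
          port_value: 80        # traffic port, unchanged across updates
      health_check_config:
        port_value: 8000        # changing only this value must now trigger a rebuild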
@@ -931,7 +937,7 @@ bool BaseDynamicClusterImpl::updateDynamicHostList(const HostVector& new_hosts, | |
const bool dont_remove_healthy_hosts = | ||
health_checker_ != nullptr && !info()->drainConnectionsOnHostRemoval(); | ||
// If there are removed hosts, check to see if we should only delete if unhealthy. | ||
if (!current_hosts.empty() && dont_remove_healthy_hosts) { | ||
if (!current_hosts.empty() && dont_remove_healthy_hosts && !hosts_changed) { | ||
for (auto i = current_hosts.begin(); i != current_hosts.end();) { | ||
if (!(*i)->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)) { | ||
if ((*i)->weight() > max_host_weight) { | ||
|
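The net effect of the added !hosts_changed condition is that the "retain removed-but-healthy hosts" path applies only when no health check config changed in this update. A condensed sketch of the gate, with a hypothetical free function that is illustrative only:

// Illustration only: condensed view of the removal gate after this change.
// Previously, removed-but-healthy hosts were retained whenever a health
// checker was configured and drain_connections_on_host_removal was false.
// A health check config change (hosts_changed == true) now bypasses that
// retention, so hosts carrying a stale health check address are dropped.
bool shouldRetainHealthyRemovedHosts(bool has_health_checker, bool drain_on_removal,
                                     bool hosts_changed) {
  const bool dont_remove_healthy_hosts = has_health_checker && !drain_on_removal;
  return dont_remove_healthy_hosts && !hosts_changed;
}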
test/common/upstream/eds_test.cc

@@ -1059,6 +1059,99 @@ TEST_F(EdsTest, PriorityAndLocalityWeighted) {
   EXPECT_EQ(1UL, stats_.counter("cluster.name.update_no_rebuild").value());
 }

+// Validate that changing an endpoint's health check port triggers a cluster
+// rebuild instead of keeping the old host in place.
+TEST_F(EdsTest, EndpointUpdateHealthCheckConfig) {
+  resetCluster(R"EOF(
+      name: name
+      connect_timeout: 0.25s
+      type: EDS
+      lb_policy: ROUND_ROBIN
+      drain_connections_on_host_removal: true
+      eds_cluster_config:
+        service_name: fare
+        eds_config:
+          api_config_source:
+            cluster_names:
+            - eds
+            refresh_delay: 1s
+    )EOF");
+
+  auto health_checker = std::make_shared<MockHealthChecker>();
+  EXPECT_CALL(*health_checker, start());
+  EXPECT_CALL(*health_checker, addHostCheckCompleteCb(_)).Times(2);
+  cluster_->setHealthChecker(health_checker);
+
+  Protobuf::RepeatedPtrField<envoy::api::v2::ClusterLoadAssignment> resources;
+  auto* cluster_load_assignment = resources.Add();
+  cluster_load_assignment->set_cluster_name("fare");
+
+  auto add_endpoint = [cluster_load_assignment](int port) {
+    auto* endpoints = cluster_load_assignment->add_endpoints();
+    auto* socket_address = endpoints->add_lb_endpoints()
+                               ->mutable_endpoint()
+                               ->mutable_address()
+                               ->mutable_socket_address();
+    socket_address->set_address("1.2.3.4");
+    socket_address->set_port_value(port);
+  };
+
+  auto update_health_check_port = [cluster_load_assignment](const uint32_t index,
+                                                            const uint32_t port) {
+    cluster_load_assignment->mutable_endpoints(index)
+        ->mutable_lb_endpoints(0)
+        ->mutable_endpoint()
+        ->mutable_health_check_config()
+        ->set_port_value(port);
+  };
+
+  add_endpoint(80);
+  add_endpoint(81);
+
+  VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, ""));
+  // Make sure the cluster is rebuilt: update_no_rebuild counts updates that
+  // skipped a rebuild, so it staying at 0 means every update rebuilt.
+  EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value());
+
+  {
+    auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts();
+    EXPECT_EQ(hosts.size(), 2);
+
+    EXPECT_TRUE(hosts[0]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC));
+    EXPECT_TRUE(hosts[1]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC));
+    // Mark the hosts as healthy.
+    hosts[0]->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC);
+    hosts[1]->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC);
+  }
+
+  const uint32_t new_health_check_port = 8000;
+  update_health_check_port(0, new_health_check_port);
+
+  VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, ""));
+  EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value());
+
+  {
+    auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts();
+    EXPECT_EQ(hosts.size(), 2);
+    // Make sure the first endpoint's health check port is updated.
+    EXPECT_EQ(new_health_check_port, hosts[0]->healthCheckAddress()->ip()->port());
+
+    EXPECT_NE(new_health_check_port, hosts[1]->healthCheckAddress()->ip()->port());
+  }
+
+  update_health_check_port(1, new_health_check_port);
+
+  VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, ""));
+  EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value());
+
+  {
+    auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts();
+    EXPECT_EQ(hosts.size(), 2);
+    EXPECT_EQ(new_health_check_port, hosts[0]->healthCheckAddress()->ip()->port());
+
+    // Make sure the second endpoint's health check port is updated.
+    EXPECT_EQ(new_health_check_port, hosts[1]->healthCheckAddress()->ip()->port());
+  }
+}
+
 // Throw on adding a new resource with an invalid endpoint (since the given address is invalid).
 TEST_F(EdsTest, MalformedIP) {
   Protobuf::RepeatedPtrField<envoy::api::v2::ClusterLoadAssignment> resources;

Review comment on this test:

- "Arguably, you could subclass the test fixture …"
Review discussion on updateDynamicHostList() as a whole:

- "It's not your fault, but this entire method is completely unreadable to me; it's hard to keep track of what current_hosts represents, which hosts are to be removed, and under which conditions we're going to do a rebuild, an in-place modify, or a remove/add. Do you think there's a way to explain to the layman how this works? Each time I reread this method, my head hurts more and more (and I'm partly to blame for sure)."
- "Yes, I agree. This has been an iterative process over time, and I agree it's now not very readable; it hurts my head also. I think it would be worth it to have a lot more comments in here, possibly variable name changes, and maybe even functions broken out. @dio, since you are in here now, do you mind taking a look?"
- "Sure, I'd love to have a look 🙂. I will spend some time diving into this."
- "It seems there is already quite an intensive refactoring in here: #3959."
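To make the suggestion above concrete, here is one possible shape of such a decomposition. The helper names and signatures below are purely hypothetical and do not exist in the codebase; #3959 pursues its own approach.

#include <memory>
#include <vector>

// Minimal stand-ins so the sketch is self-contained; the real types live in
// include/envoy/upstream/upstream.h.
struct Host {};
using HostSharedPtr = std::shared_ptr<Host>;
using HostVector = std::vector<HostSharedPtr>;

// Step 1: split incoming hosts into those reusable in place (same address and
// same health check address) and those that must be newly built.
void partitionIncomingHosts(const HostVector& new_hosts, HostVector& current_hosts,
                            HostVector& reused, HostVector& added);

// Step 2: apply in-place mutations (e.g. weight) to reused hosts.
void applyInPlaceUpdates(const HostVector& new_hosts, HostVector& reused);

// Step 3: decide whether removed-but-healthy hosts are retained, taking the
// hosts_changed signal introduced by this PR into account.
bool shouldRetainHealthyRemovedHosts(bool hosts_changed);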