Improve ZooKeeper load balancing #65570
Conversation
This is an automated comment for commit 8c4c2b6 with a description of existing statuses; it is updated for the latest CI run (current status: ❌).
<!--<zookeeper_load_balancing> random / in_order / nearest_hostname / first_or_random / round_robin </zookeeper_load_balancing>-->
<zookeeper_load_balancing>random</zookeeper_load_balancing>

<client_availability_zone>az2</client_availability_zone>
cc @thevar1able We already have a configuration section on this at the server level. It would be better UX to only need to define this once?
Indeed. We should reuse PlacementInfo here and extend it if necessary.
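A minimal sketch of the "define it once" idea, assuming the server-level value lives under a key like placement.availability_zone (an assumption based on the PlacementInfo discussion below) and that the ZooKeeper client would fall back to it when <client_availability_zone> is not set:

/// Illustrative only, not the PR's code. `config` is a Poco::Util::AbstractConfiguration,
/// as elsewhere in the server: prefer the per-client setting, otherwise fall back to the
/// assumed server-level placement.availability_zone key.
String client_az = config.getString("zookeeper.client_availability_zone",
                                    config.getString("placement.availability_zone", ""));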
We initialize PlacementInfo too late:
ClickHouse/programs/server/Server.cpp, line 1809 in 784f66c:
PlacementInfo::PlacementInfo::instance().initialize(config());
ZooKeeper can be initialized here:
ClickHouse/programs/server/Server.cpp, lines 1011 to 1015 in 784f66c:
if (loaded_config.has_zk_includes)
{
    auto old_configuration = loaded_config.configuration;
    ConfigProcessor config_processor(config_path);
    loaded_config = config_processor.loadConfigWithZooKeeperIncludes(
I will move it earlier, but it will not be possible to use zk includes for PlacementInfo
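A rough ordering sketch of what moving it earlier would mean, based only on the two excerpts above; the arguments to loadConfigWithZooKeeperIncludes, which the excerpt truncates, are filled in here purely as an assumption, as is the exact placement:

/// Sketch, not the actual Server.cpp diff: initialize PlacementInfo from the locally
/// loaded config before zk includes are processed.
PlacementInfo::PlacementInfo::instance().initialize(config());

if (loaded_config.has_zk_includes)
{
    /// The ZooKeeper client created for <include from_zk="..."> substitutions can now
    /// use the placement info, but PlacementInfo itself can no longer be fed from zk
    /// includes, since they are processed only after this point.
    auto old_configuration = loaded_config.configuration;
    ConfigProcessor config_processor(config_path);
    loaded_config = config_processor.loadConfigWithZooKeeperIncludes(
        main_config_zk_node_cache, main_config_zk_changed_event, /* fallback_to_preprocessed = */ true);
}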
Also, we still need a setting that enables/disables az-aware balancing in the ZooKeeper client.
src/Common/ZooKeeper/ZooKeeper.cpp (Outdated)
Int8 new_node_idx = new_impl->getConnectedNodeIdx();

/// Maybe the node was unavailable when getting AZs first time, update just in case
if (args.availability_zone_autodetect)
Small optimization: only update this when availability[new_node_idx].empty()?
This optimization is too small, but makes sense
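For concreteness, the agreed-upon guard might look roughly like this; queryAvailabilityZone() stands in for whatever helper the PR uses to autodetect the AZ and is purely illustrative:

/// Sketch of the suggested micro-optimization: re-query the availability zone of the
/// newly connected node only if it was not learned on the first pass.
Int8 new_node_idx = new_impl->getConnectedNodeIdx();
if (args.availability_zone_autodetect && new_node_idx >= 0 && availability_zones[new_node_idx].empty())
    availability_zones[new_node_idx] = queryAvailabilityZone(*new_impl);   /// hypothetical helper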
if (reconnect_task)
    (*reconnect_task)->deactivate();

auto res = std::shared_ptr<ZooKeeper>(new ZooKeeper(args, zk_log, availability_zones, std::move(optimal_impl)));
That works fine when we know all AZs at the start, either from the config or from actively asking the nodes.
Now imagine that we do not know the AZs from the config.
It also works fine when only one node of three is unavailable: if that node is from another AZ, we connect to a node from the local AZ at the start.
If the node from the local AZ is unavailable, then our background task tracks that one unavailable node, which is from the local AZ, because AvailabilityZoneInfo::UNKNOWN < AvailabilityZoneInfo::Other. When that node is online we will connect to it. That is as expected.
But that won't work when two or all nodes are unavailable at the start: the background task will then track some single node, which might not be from the local AZ. After the background task switches us to that node, we do not continue to look further.
"two or all nodes unavailable"
Nothing works at all when 2 of 3 nodes are unavailable. Although your comment still makes sense for 5 nodes, it's not a usual use case and we don't need it.
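To make the corner case concrete, here is a small sketch (not the PR's code; names follow the excerpts above) of the check a background task would need in order to keep looking after landing on a non-local node:

/// Returns true while the background reconnect task should stay active: either we are
/// not connected at all, or the node we ended up on is not in the local AZ.
bool shouldKeepLookingForBetterNode(const std::vector<String> & availability_zones,
                                    const String & local_az, Int8 connected_idx)
{
    if (connected_idx < 0)
        return true;   /// not connected to any node yet
    return availability_zones[static_cast<size_t>(connected_idx)] != local_az;
}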
src/Common/ZooKeeper/ZooKeeper.cpp (Outdated)
Coordination::ZooKeeper::Node hostToNode(const LoggerPtr & log, const ShuffleHost & host)
{
    /// We want to resolve all hosts without DNS cache for keeper connection.
    Coordination::DNSResolver::instance().removeHostFromCache(host.host);
I do not understand how it helps here.
If we never resolve the keeper node with DNSResolver, then there is nothing to delete.
Otherwise this only drops the cache when it is called; it might still be involved between such drops.
This line:
const Poco::Net::SocketAddress host_socket_addr{host.host};
does not use DNSResolver for resolving the hostname.
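A tiny standalone example of that point (keeper1.example.com is a placeholder host): Poco::Net::SocketAddress resolves the name through the system resolver, so clearing the DNSResolver cache has no effect on this path.

#include <Poco/Net/SocketAddress.h>
#include <iostream>

int main()
{
    /// Resolved by Poco via the OS resolver, bypassing DB::DNSResolver entirely.
    Poco::Net::SocketAddress addr{"keeper1.example.com:2181"};
    std::cout << addr.toString() << '\n';   /// prints the resolved "ip:port"
    return 0;
}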
Yes, it's redundant now
But then it was redundant from the beginning
Yes, I forgot how this PR was merged, but it seems unfinished to me 😅
src/Common/ZooKeeper/ZooKeeper.cpp (Outdated)

/// We want to resolve all hosts without DNS cache for keeper connection.
Coordination::DNSResolver::instance().removeHostFromCache(host_string);
return nodes;
Also, I have concerns about that function.
It could return not all hosts. As a result we would proceed with the remaining hosts without even trying to recheck the dropped ones.
Imagine our local-AZ keeper is being recreated: it is delisted from DNS and the host has no DNS record for some time. We will just proceed without it, our reconnect logic with optimal_impl won't be triggered, and when that node becomes available no one will try to connect to it.
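A sketch of the failure mode being described, assuming a loop over shuffled hosts that converts them with hostToNode from the earlier excerpt (the exception handling here is illustrative, not the PR's code):

std::vector<Coordination::ZooKeeper::Node> nodes;
for (const auto & host : shuffled_hosts)
{
    try
    {
        nodes.emplace_back(hostToNode(log, host));
    }
    catch (const Poco::Exception &)
    {
        /// Silently dropping a host that currently has no DNS record is the problem:
        /// it never re-enters the candidate list, so nothing reconnects to it later.
        continue;
    }
}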
I removed this function
src/Common/ZooKeeper/ZooKeeper.cpp (Outdated)
String ZooKeeper::getConnectedHostAvailabilityZone() const
{
    auto idx = impl->getConnectedNodeIdx();
    if (idx < 0)
I could not come up with a case where this could happen. Maybe throw a LOGICAL_ERROR if it is some kind of impossible case?
I thought that it can happen with TestKeeper, but it returns 0
This is possible when the session is expired.
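Given that conclusion, a minimal sketch of how the negative index could be treated (assuming the availability_zones passed to the constructor in the excerpt above is kept as a member; this is not necessarily the final PR code):

String ZooKeeper::getConnectedHostAvailabilityZone() const
{
    auto idx = impl->getConnectedNodeIdx();
    if (idx < 0)
        return "";   /// session expired / not connected: a valid outcome, not a LOGICAL_ERROR
    return availability_zones.at(static_cast<size_t>(idx));
}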
In general, all OK.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite fallback_session_lifetime. Added support for AZ-aware balancing. Closes #55110.
Documentation entry for user-facing changes
CI Settings (Only check the boxes if you know what you are doing):