
Improve ZooKeeper load balancing #65570

Merged: 9 commits from keeper_az into master, Jun 28, 2024

Conversation

@tavplubix (Member) commented Jun 23, 2024:

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite fallback_session_lifetime. Added support for AZ-aware balancing.

Closes #55110

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings (Only check the boxes if you know what you are doing):

  • Allow: All Required Checks
  • Allow: Stateless tests
  • Allow: Stateful tests
  • Allow: Integration Tests
  • Allow: Performance tests
  • Allow: All Builds
  • Allow: batch 1, 2 for multi-batch jobs
  • Allow: batch 3, 4, 5, 6 for multi-batch jobs

  • Exclude: Style check
  • Exclude: Fast test
  • Exclude: All with ASAN
  • Exclude: All with TSAN, MSAN, UBSAN, Coverage
  • Exclude: All with aarch64, release, debug

  • Do not test
  • Woolen Wolfdog
  • Upload binaries for special builds
  • Disable merge-commit
  • Disable CI cache

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-improvement Pull request with some product improvements label Jun 23, 2024
@robot-clickhouse-ci-2 (Contributor) commented Jun 23, 2024:

This is an automated comment for commit 8c4c2b6 with a description of existing statuses. It is updated for the latest CI run.

❌ Click here to open a full report in a separate page

| Check name | Description | Status |
|---|---|---|
| Integration tests | The integration tests report. In parentheses the package type is given, and in square brackets are the optional part/total tests | ❌ failure |
| Performance Comparison | Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ❌ failure |

Successful checks

| Check name | Description | Status |
|---|---|---|
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with an instant-attach table | ✅ success |
| Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Flaky tests | Checks if newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails or runs too long at least once, this check will be red. We don't allow flaky tests; read the doc | ✅ success |
| Install packages | Checks that the built packages are installable in a clean environment | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some of the tests fail, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can start up without any errors, crashes, or sanitizer asserts | ✅ success |

<!--<zookeeper_load_balancing> random / in_order / nearest_hostname / first_or_random / round_robin </zookeeper_load_balancing>-->
<zookeeper_load_balancing>random</zookeeper_load_balancing>

<client_availability_zone>az2</client_availability_zone>
A Contributor commented:

cc @thevar1able We already have a configuration section for this at the server level. Wouldn't it be better UX to only need to define this once?

https://github.com/ClickHouse/ClickHouse/pull/59976/files#diff-47e5b6a4c49f50e47acd19ca0765ddd7940da8d3d4e7e846bc25a3f0070e0399R3

@thevar1able (Member) replied on Jun 24, 2024:

Indeed. We should reuse PlacementInfo here and extend it if necessary.

@tavplubix (Member, author) replied:

We initialize PlacementInfo too late:

PlacementInfo::PlacementInfo::instance().initialize(config());

ZooKeeper can be initialized here:
if (loaded_config.has_zk_includes)
{
    auto old_configuration = loaded_config.configuration;
    ConfigProcessor config_processor(config_path);
    loaded_config = config_processor.loadConfigWithZooKeeperIncludes(

I will move it earlier, but then it will not be possible to use zk includes for PlacementInfo.

@tavplubix (Member, author) added:

Also, we still need a setting that enables/disables AZ-aware balancing in the ZooKeeper client.
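
For illustration only, a minimal sketch of how such an enable/disable flag and the client AZ could be read from the server config via Poco's configuration API. The key names (`prefer_local_availability_zone`, placement under `zookeeper.`) are assumptions, not necessarily what the PR settled on; only the `getBool`/`getString` accessors are standard Poco API.

```cpp
#include <Poco/Util/AbstractConfiguration.h>
#include <string>

struct ZooKeeperAZSettings
{
    bool prefer_local_az = false;           /// hypothetical flag enabling AZ-aware balancing
    std::string client_availability_zone;   /// e.g. "az2", as in the config snippet above
};

ZooKeeperAZSettings loadAZSettings(const Poco::Util::AbstractConfiguration & config)
{
    ZooKeeperAZSettings settings;
    /// Both keys are illustrative; defaults keep the feature off / AZ unset.
    settings.prefer_local_az = config.getBool("zookeeper.prefer_local_availability_zone", false);
    settings.client_availability_zone = config.getString("zookeeper.client_availability_zone", "");
    return settings;
}
```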

Int8 new_node_idx = new_impl->getConnectedNodeIdx();

/// Maybe the node was unavailable when getting AZs first time, update just in case
if (args.availability_zone_autodetect)
A Contributor commented:

Small optimization: only update this when availability[new_node_idx].empty()?

@tavplubix (Member, author) replied:

The optimization is quite minor, but it makes sense.
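
A minimal, self-contained sketch of the guard being discussed; `availability_zones`, `autodetect`, and `query_az` are illustrative names, not the PR's actual API.

```cpp
#include <functional>
#include <string>
#include <vector>

/// Re-detect the AZ of the newly connected node only when it is still unknown,
/// so the extra request is skipped once the zone has been learned (the
/// reviewer's suggested optimization).
void updateAvailabilityZoneIfUnknown(
    std::vector<std::string> & availability_zones,
    int new_node_idx,
    bool autodetect,
    const std::function<std::string()> & query_az)
{
    if (!autodetect || new_node_idx < 0)
        return;
    if (static_cast<size_t>(new_node_idx) >= availability_zones.size())
        return;
    if (availability_zones[new_node_idx].empty())
        availability_zones[new_node_idx] = query_az();
}
```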

if (reconnect_task)
(*reconnect_task)->deactivate();

auto res = std::shared_ptr<ZooKeeper>(new ZooKeeper(args, zk_log, availability_zones, std::move(optimal_impl)));
@CheSema (Member) commented on Jun 25, 2024:

That works fine when we know all AZs at the start, either from the config or from actively asking the nodes.

Imagine that we do not know the AZs from the config.
It also works fine when only one node of the three is unavailable: if that node is from another AZ, we connect to a node from the local AZ at the start.
If the node from the local AZ is unavailable, then our background task tracks that one unavailable node, which is from the local AZ, because AvailabilityZoneInfo::UNKNOWN < AvailabilityZoneInfo::Other. When that node comes back online, we will connect to it. That is as expected.

But that won't work when two or all nodes are unavailable at the start: the background task will then track some single node, which may not be from the local AZ. After the background task switches us to that node, we do not continue looking further.

@tavplubix (Member, author) replied:

> two or all nodes unavailable

Nothing works at all when 2 of 3 nodes are unavailable. Although your comment still makes sense for 5 nodes, that is not a usual use case and we don't need to handle it.
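
A schematic illustration of the preference order described above. Only the relation UNKNOWN < Other comes from the comment; the SAME value, the struct, and the sorting helper are assumptions added for clarity.

```cpp
#include <algorithm>
#include <string>
#include <vector>

/// Lower value = more preferred. The comment above relies on UNKNOWN sorting
/// before OTHER, so a node whose AZ could not be determined is tried before a
/// node that is known to be in a different AZ.
enum class AvailabilityZoneInfo : int
{
    SAME = 0,
    UNKNOWN = 1,
    OTHER = 2,
};

struct HostCandidate
{
    std::string address;
    AvailabilityZoneInfo az_info = AvailabilityZoneInfo::UNKNOWN;
};

/// Order candidates by AZ preference. When several nodes are down at start,
/// whichever candidate sorts first is the one the background reconnect task
/// keeps tracking, which is the corner case discussed in this thread.
void sortByAZPreference(std::vector<HostCandidate> & hosts)
{
    std::stable_sort(hosts.begin(), hosts.end(),
        [](const HostCandidate & a, const HostCandidate & b)
        {
            return static_cast<int>(a.az_info) < static_cast<int>(b.az_info);
        });
}
```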

Coordination::ZooKeeper::Node hostToNode(const LoggerPtr & log, const ShuffleHost & host)
{
/// We want to resolve all hosts without DNS cache for keeper connection.
Coordination::DNSResolver::instance().removeHostFromCache(host.host);
A Member commented:

I do not understand how it helps here.
If we never resolve a Keeper node with DNSResolver, then there is nothing to delete.
Otherwise, this drops the cache only at the moment it is called; the cache might still be involved between such drops.

This line
const Poco::Net::SocketAddress host_socket_addr{host.host};
does not use DNSResolver for resolving the hostname.

@tavplubix (Member, author) replied:

This change was introduced in #50738. We can ask @pufit.

A Member replied:

Yes, it's redundant now

@tavplubix (Member, author) replied:

But then it was redundant from the beginning

A Member replied:

Yes, I forgot how this PR was merged, but it seems unfinished to me 😅
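
For context, a small sketch of the point above: constructing a Poco::Net::SocketAddress from a host string resolves the name through Poco / the system resolver, so evicting the host from ClickHouse's DNSResolver cache beforehand does not affect that lookup. The hostname below is a placeholder.

```cpp
#include <iostream>
#include <Poco/Exception.h>
#include <Poco/Net/SocketAddress.h>

int main()
{
    try
    {
        /// This resolution goes through Poco / the system resolver, not through
        /// DB::DNSResolver, which is why the preceding removeHostFromCache() call
        /// was considered redundant in this thread.
        Poco::Net::SocketAddress addr("zookeeper.example.com:2181");  /// placeholder host
        std::cout << addr.toString() << std::endl;
    }
    catch (const Poco::Exception & e)
    {
        std::cout << "resolution failed: " << e.displayText() << std::endl;
    }
    return 0;
}
```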


/// We want to resolve all hosts without DNS cache for keeper connection.
Coordination::DNSResolver::instance().removeHostFromCache(host_string);
return nodes;
A Member commented:

I also have concerns about that function.
It could return not all of the hosts. As a result, we would proceed with the remaining hosts without even trying to recheck the dropped ones.

Imagine our local-AZ Keeper node is being recreated: it is delisted from DNS and the host has no DNS record for some time. We would just proceed without it, our reconnect logic with optimal_impl would not be triggered, and when that node becomes available again, nobody would try to connect to it.

@tavplubix (Member, author) replied:

I removed this function

String ZooKeeper::getConnectedHostAvailabilityZone() const
{
auto idx = impl->getConnectedNodeIdx();
if (idx < 0)
A Member commented:

I could not come up with a case when this could happen. Maybe throw LOGICAL_ERROR if it is some kind of impossible case?

@tavplubix (Member, author) replied:

I thought it could happen with TestKeeper, but it returns 0.

@tavplubix (Member, author) added:

This is possible when the session is expired.
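
A self-contained sketch of how a negative connected-node index can be treated, following the discussion above (an expired session has no connected node, hence no AZ to report); the function and parameter names are illustrative, and the exact behaviour of the real method may differ.

```cpp
#include <string>
#include <vector>

/// Returns the AZ of the currently connected node, or an empty string when the
/// session is expired / not yet connected (idx < 0) or the index is out of range.
std::string connectedHostAvailabilityZone(
    long connected_node_idx,
    const std::vector<std::string> & availability_zones)
{
    if (connected_node_idx < 0)
        return {};
    if (static_cast<size_t>(connected_node_idx) >= availability_zones.size())
        return {};
    return availability_zones[connected_node_idx];
}
```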

@CheSema (Member) commented Jun 25, 2024:

In general, all is OK.
Let's leave the corner cases to a future refactoring; there is no way to safely handle them with the current implementation.

@tavplubix tavplubix requested a review from CheSema June 26, 2024 00:04
@tavplubix tavplubix added this pull request to the merge queue Jun 28, 2024
Merged via the queue into master with commit 4748e29 Jun 28, 2024
242 of 248 checks passed
@tavplubix tavplubix deleted the keeper_az branch June 28, 2024 23:30
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 28, 2024
Labels: pr-improvement (Pull request with some product improvements), pr-synced-to-cloud (The PR is synced to the cloud repo)
Projects: None yet

Development: successfully merging this pull request may close issue #55110, "Keeper load balancing based on availability zones".

8 participants