
Improve ZooKeeper load balancing #65570

Merged: 9 commits from keeper_az into master, Jun 28, 2024

Conversation

@tavplubix (Member) commented Jun 23, 2024:

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite fallback_session_lifetime. Added support for AZ-aware balancing.

Closes #55110

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings (Only check the boxes if you know what you are doing):

  • Allow: All Required Checks
  • Allow: Stateless tests
  • Allow: Stateful tests
  • Allow: Integration Tests
  • Allow: Performance tests
  • Allow: All Builds
  • Allow: batch 1, 2 for multi-batch jobs
  • Allow: batch 3, 4, 5, 6 for multi-batch jobs

  • Exclude: Style check
  • Exclude: Fast test
  • Exclude: All with ASAN
  • Exclude: All with TSAN, MSAN, UBSAN, Coverage
  • Exclude: All with aarch64, release, debug

  • Do not test
  • Woolen Wolfdog
  • Upload binaries for special builds
  • Disable merge-commit
  • Disable CI cache

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-improvement Pull request with some product improvements label Jun 23, 2024
@robot-clickhouse-ci-2 (Contributor) commented Jun 23, 2024:

This is an automated comment for commit 8c4c2b6 with a description of existing statuses. It is updated for the latest CI run.

❌ Click here to open a full report in a separate page

| Check name | Description | Status |
|---|---|---|
| Integration tests | The integration tests report. In parentheses the package type is given, and in square brackets are the optional part/total tests | ❌ failure |
| Performance Comparison | Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ❌ failure |

Successful checks

| Check name | Description | Status |
|---|---|---|
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with an instant-attach table | ✅ success |
| Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Flaky tests | Checks if newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails or runs too long at least once, this check will be red. We don't allow flaky tests; read the doc | ✅ success |
| Install packages | Checks that the built packages are installable in a clean environment | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some of the tests fail, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can start up without any errors, crashes, or sanitizer asserts | ✅ success |

<!--<zookeeper_load_balancing> random / in_order / nearest_hostname / first_or_random / round_robin </zookeeper_load_balancing>-->
<zookeeper_load_balancing>random</zookeeper_load_balancing>

<client_availability_zone>az2</client_availability_zone>
A Contributor commented:

cc @thevar1able We already have a configuration section for this at the server level. Wouldn't it be better UX to only need to define this once?

https://github.com/ClickHouse/ClickHouse/pull/59976/files#diff-47e5b6a4c49f50e47acd19ca0765ddd7940da8d3d4e7e846bc25a3f0070e0399R3

@thevar1able (Member) replied on Jun 24, 2024:

Indeed. We should reuse PlacementInfo here and extend it if necessary.

@tavplubix (Member, author) replied:

We initialize PlacementInfo too late:

PlacementInfo::PlacementInfo::instance().initialize(config());

ZooKeeper can be initialized here:
if (loaded_config.has_zk_includes)
{
    auto old_configuration = loaded_config.configuration;
    ConfigProcessor config_processor(config_path);
    loaded_config = config_processor.loadConfigWithZooKeeperIncludes(

I will move it earlier, but then it will not be possible to use zk includes for PlacementInfo.

@tavplubix (Member, author) added:

Also, we still need a setting that enables/disables AZ-aware balancing in the ZooKeeper client.
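
For illustration only, a minimal sketch of how such an enable/disable flag and the client AZ could be read from the server config via Poco's configuration API. The key names (`prefer_local_availability_zone`, placement under `zookeeper.`) are assumptions, not necessarily what the PR settled on; only the `getBool`/`getString` accessors are standard Poco API.

```cpp
#include <Poco/Util/AbstractConfiguration.h>
#include <string>

struct ZooKeeperAZSettings
{
    bool prefer_local_az = false;           /// hypothetical flag enabling AZ-aware balancing
    std::string client_availability_zone;   /// e.g. "az2", as in the config snippet above
};

ZooKeeperAZSettings loadAZSettings(const Poco::Util::AbstractConfiguration & config)
{
    ZooKeeperAZSettings settings;
    /// Both keys are illustrative; defaults keep the feature off / AZ unset.
    settings.prefer_local_az = config.getBool("zookeeper.prefer_local_availability_zone", false);
    settings.client_availability_zone = config.getString("zookeeper.client_availability_zone", "");
    return settings;
}
```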

Int8 new_node_idx = new_impl->getConnectedNodeIdx();

/// Maybe the node was unavailable when getting AZs first time, update just in case
if (args.availability_zone_autodetect)
A Contributor commented:

Small optimization: only update this when availability[new_node_idx].empty()?

@tavplubix (Member, author) replied:

The optimization is quite minor, but it makes sense.
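
A minimal, self-contained sketch of the guard being discussed; `availability_zones`, `autodetect`, and `query_az` are illustrative names, not the PR's actual API.

```cpp
#include <functional>
#include <string>
#include <vector>

/// Re-detect the AZ of the newly connected node only when it is still unknown,
/// so the extra request is skipped once the zone has been learned (the
/// reviewer's suggested optimization).
void updateAvailabilityZoneIfUnknown(
    std::vector<std::string> & availability_zones,
    int new_node_idx,
    bool autodetect,
    const std::function<std::string()> & query_az)
{
    if (!autodetect || new_node_idx < 0)
        return;
    if (static_cast<size_t>(new_node_idx) >= availability_zones.size())
        return;
    if (availability_zones[new_node_idx].empty())
        availability_zones[new_node_idx] = query_az();
}
```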

if (reconnect_task)
(*reconnect_task)->deactivate();

auto res = std::shared_ptr<ZooKeeper>(new ZooKeeper(args, zk_log, availability_zones, std::move(optimal_impl)));
@CheSema (Member) commented on Jun 25, 2024:

That works fine when we know all AZs at the start, either from the config or from actively asking the nodes.

Imagine that we do not know the AZs from the config.
It also works fine when only one node of the three is unavailable: if that node is from another AZ, we connect to a node from the local AZ at the start.
If the node from the local AZ is unavailable, then our background task tracks that one unavailable node, which is from the local AZ, because AvailabilityZoneInfo::UNKNOWN < AvailabilityZoneInfo::Other. When that node comes back online, we will connect to it. That is as expected.

But that won't work when two or all nodes are unavailable at the start: the background task will then track some single node, which may not be from the local AZ. After the background task switches us to that node, we do not continue looking further.

@tavplubix (Member, author) replied:

> two or all nodes unavailable

Nothing works at all when 2 of 3 nodes are unavailable. Although your comment still makes sense for 5 nodes, that is not a usual use case and we don't need to handle it.
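
A schematic illustration of the preference order described above. Only the relation UNKNOWN < Other comes from the comment; the SAME value, the struct, and the sorting helper are assumptions added for clarity.

```cpp
#include <algorithm>
#include <string>
#include <vector>

/// Lower value = more preferred. The comment above relies on UNKNOWN sorting
/// before OTHER, so a node whose AZ could not be determined is tried before a
/// node that is known to be in a different AZ.
enum class AvailabilityZoneInfo : int
{
    SAME = 0,
    UNKNOWN = 1,
    OTHER = 2,
};

struct HostCandidate
{
    std::string address;
    AvailabilityZoneInfo az_info = AvailabilityZoneInfo::UNKNOWN;
};

/// Order candidates by AZ preference. When several nodes are down at start,
/// whichever candidate sorts first is the one the background reconnect task
/// keeps tracking, which is the corner case discussed in this thread.
void sortByAZPreference(std::vector<HostCandidate> & hosts)
{
    std::stable_sort(hosts.begin(), hosts.end(),
        [](const HostCandidate & a, const HostCandidate & b)
        {
            return static_cast<int>(a.az_info) < static_cast<int>(b.az_info);
        });
}
```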

Coordination::ZooKeeper::Node hostToNode(const LoggerPtr & log, const ShuffleHost & host)
{
/// We want to resolve all hosts without DNS cache for keeper connection.
Coordination::DNSResolver::instance().removeHostFromCache(host.host);
A Member commented:

I do not understand how it helps here.
If we never resolve a Keeper node with DNSResolver, then there is nothing to delete.
Otherwise, this drops the cache only at the moment it is called; the cache might still be involved between such drops.

This line
const Poco::Net::SocketAddress host_socket_addr{host.host};
does not use DNSResolver for resolving the hostname.

@tavplubix (Member, author) replied:

This change was introduced in #50738. We can ask @pufit.

A Member replied:

Yes, it's redundant now

@tavplubix (Member, author) replied:

But then it was redundant from the beginning

A Member replied:

Yes, I forgot how this PR was merged, but it seems unfinished to me 😅
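
For context, a small sketch of the point above: constructing a Poco::Net::SocketAddress from a host string resolves the name through Poco / the system resolver, so evicting the host from ClickHouse's DNSResolver cache beforehand does not affect that lookup. The hostname below is a placeholder.

```cpp
#include <iostream>
#include <Poco/Exception.h>
#include <Poco/Net/SocketAddress.h>

int main()
{
    try
    {
        /// This resolution goes through Poco / the system resolver, not through
        /// DB::DNSResolver, which is why the preceding removeHostFromCache() call
        /// was considered redundant in this thread.
        Poco::Net::SocketAddress addr("zookeeper.example.com:2181");  /// placeholder host
        std::cout << addr.toString() << std::endl;
    }
    catch (const Poco::Exception & e)
    {
        std::cout << "resolution failed: " << e.displayText() << std::endl;
    }
    return 0;
}
```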


/// We want to resolve all hosts without DNS cache for keeper connection.
Coordination::DNSResolver::instance().removeHostFromCache(host_string);
return nodes;
A Member commented:

I also have concerns about that function.
It could return not all of the hosts. As a result, we would proceed with the remaining hosts without even trying to recheck the dropped ones.

Imagine our local-AZ Keeper node is being recreated: it is delisted from DNS and the host has no DNS record for some time. We would just proceed without it, our reconnect logic with optimal_impl would not be triggered, and when that node becomes available again, nobody would try to connect to it.

@tavplubix (Member, author) replied:

I removed this function

String ZooKeeper::getConnectedHostAvailabilityZone() const
{
auto idx = impl->getConnectedNodeIdx();
if (idx < 0)
A Member commented:

I could not come up with a case when this could happen. Maybe throw LOGICAL_ERROR if it is some kind of impossible case?

@tavplubix (Member, author) replied:

I thought it could happen with TestKeeper, but it returns 0.

@tavplubix (Member, author) added:

This is possible when the session is expired.
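
A self-contained sketch of how a negative connected-node index can be treated, following the discussion above (an expired session has no connected node, hence no AZ to report); the function and parameter names are illustrative, and the exact behaviour of the real method may differ.

```cpp
#include <string>
#include <vector>

/// Returns the AZ of the currently connected node, or an empty string when the
/// session is expired / not yet connected (idx < 0) or the index is out of range.
std::string connectedHostAvailabilityZone(
    long connected_node_idx,
    const std::vector<std::string> & availability_zones)
{
    if (connected_node_idx < 0)
        return {};
    if (static_cast<size_t>(connected_node_idx) >= availability_zones.size())
        return {};
    return availability_zones[connected_node_idx];
}
```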

@CheSema (Member) commented Jun 25, 2024:

In general, all is OK.
Let's leave the corner cases to a future refactoring; there is no way to safely handle them with the current implementation.

@tavplubix tavplubix requested a review from CheSema June 26, 2024 00:04
@tavplubix tavplubix added this pull request to the merge queue Jun 28, 2024
Merged via the queue into master with commit 4748e29 Jun 28, 2024
242 of 248 checks passed
@tavplubix tavplubix deleted the keeper_az branch June 28, 2024 23:30
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 28, 2024
Labels: pr-improvement (Pull request with some product improvements), pr-synced-to-cloud (The PR is synced to the cloud repo)
Projects: None yet

Development: successfully merging this pull request may close issue #55110, "Keeper load balancing based on availability zones".

8 participants