
Incompatibility with Zookeeper 3.9 #53749

Closed
Algunenano opened this issue Aug 23, 2023 · 13 comments · Fixed by #57479

@Algunenano (Member)

It seems ZK 3.9 has changed something in its protocol and ClickHouse can't connect to it.

The error seems to be related to the handshake:

2023.08.23 13:11:59.885984 [ 422494 ] {} <Error> virtual bool DB::DDLWorker::initializeMainThread(): Code: 999. Coordination::Exception: Connection loss, path: All connection tries failed while connecting to ZooKeeper. nodes: 127.0.0.1:12183, 127.0.0.1:12181, 127.0.0.1:12182
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12183
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12181
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12182
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12183
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12181
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12182
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12183
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12181
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12182
. (KEEPER_EXCEPTION), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000e1fc3f5 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
1. Coordination::Exception::Exception(String const&, Coordination::Error, int) @ 0x0000000015220571 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
2. Coordination::Exception::Exception(Coordination::Error, String const&) @ 0x0000000015220c6d in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
3. Coordination::ZooKeeper::connect(std::vector<Coordination::ZooKeeper::Node, std::allocator<Coordination::ZooKeeper::Node>> const&, Poco::Timespan) @ 0x000000001527030e in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
4. Coordination::ZooKeeper::ZooKeeper(std::vector<Coordination::ZooKeeper::Node, std::allocator<Coordination::ZooKeeper::Node>> const&, zkutil::ZooKeeperArgs const&, std::shared_ptr<DB::ZooKeeperLog>) @ 0x000000001526dccd in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
5. zkutil::ZooKeeper::init(zkutil::ZooKeeperArgs) @ 0x0000000015223553 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
6. zkutil::ZooKeeper::ZooKeeper(Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::ZooKeeperLog>) @ 0x00000000152270c3 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
7. DB::Context::getZooKeeper() const @ 0x0000000012f73dcc in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
8. DB::DDLWorker::getAndSetZooKeeper() @ 0x0000000012fdfa8d in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
9. DB::DDLWorker::initializeMainThread() @ 0x0000000012ff2c6c in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
10. DB::DDLWorker::runMainThread() @ 0x0000000012fdd771 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
11. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000012ff3dc9 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
12. ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x000000000e2d1a74 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
13. ? @ 0x000000000e2d7281 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
14. ? @ 0x00007f4708c8c9eb in ?
15. ? @ 0x00007f4708d10dfc in ?
 (version 23.6.1.1524 (official build))

ZK 3.8.2 is fine.
Keeper is fine too.

@chhetripradeep (Contributor)

I think it is related to this change in ZooKeeper v3.9: https://issues.apache.org/jira/browse/ZOOKEEPER-4492

More details in the PR: apache/zookeeper#1837 (comment)
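
For context, ZOOKEEPER-4492 merged the readOnly field into ConnectRequest/ConnectResponse, so the 3.9 server includes the readOnly flag byte in the handshake reply, which is why the body is 37 bytes where ClickHouse expected 36 (4-byte protocolVersion + 4-byte timeout + 8-byte sessionId + 4-byte password length + 16-byte password). A minimal, hypothetical sketch of a parser that tolerates both forms (illustrative names only, not ClickHouse's actual code):

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

struct ConnectResponse
{
    int32_t protocol_version = 0;
    int32_t session_timeout_ms = 0;
    int64_t session_id = 0;
    std::vector<uint8_t> passwd;
    bool read_only = false;        // present only in the 37-byte form
};

// Parse the handshake body that follows the 4-byte length prefix.
// Accepts both the pre-3.9 36-byte form and the 3.9 37-byte form.
ConnectResponse parseConnectResponse(const std::vector<uint8_t> & body)
{
    if (body.size() != 36 && body.size() != 37)
        throw std::runtime_error("Unexpected handshake length: " + std::to_string(body.size()));

    ConnectResponse r;
    size_t pos = 0;
    auto read = [&](void * dst, size_t n) { std::memcpy(dst, body.data() + pos, n); pos += n; };

    read(&r.protocol_version, 4);  // fields are big-endian on the wire; byte swapping omitted for brevity
    read(&r.session_timeout_ms, 4);
    read(&r.session_id, 8);
    pos += 4;                      // int32 password length (always 16)
    r.passwd.assign(body.begin() + pos, body.begin() + pos + 16);
    pos += 16;

    if (body.size() == 37)         // the extra byte a 3.9 server sends: the readOnly flag
        r.read_only = body[pos] != 0;

    return r;
}
```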

Slach added a commit to Altinity/clickhouse-backup that referenced this issue Aug 28, 2023
…ickhouse and zookeeper 3.9.0, see details in apache/zookeeper#1837 (comment) return `:latest` default value after resolve ClickHouse/ClickHouse#53749
@alexey-milovidov (Member)

Cool kids use ClickHouse Keeper.

@bputt-e commented Sep 13, 2023

Cool kids use ClickHouse Keeper.

We would, but we had stability issues, and we believe they came from the --force-recovery logic: https://github.com/Altinity/clickhouse-operator/blob/master/deploy/clickhouse-keeper/clickhouse-keeper-3-nodes.yaml#L175

@Slach (Contributor) commented Sep 13, 2023

@bputt-e sorry, but the clickhouse-keeper manifests are not related to clickhouse-keeper itself.
We are just trying to create the manifest; it is not complete yet.
Let's continue the discussion in Altinity/clickhouse-operator#1234.

@alexey-milovidov
There are many other things that currently prevent using clickhouse-keeper in Kubernetes for scale-up / scale-down scenarios:
#53481
#54129

The root cause is how eBay/NuRaft stores the quorum peers and how it updates the quorum state.

@alexey-milovidov (Member)

@bputt-e, you are pointing to a third-party ClickHouse operator from Altinity, which is unrelated to ClickHouse and can contain mistakes. And having --force-recovery in the operator is 100% a mistake. Do not use this operator with Keeper.

@alexey-milovidov (Member)

@Slach

There are many other things that currently prevent using clickhouse-keeper in Kubernetes for scale-up / scale-down scenarios:
#53481
#54129

You don't need to do any fancy stuff with Keeper. It is very simple software. If you want to scale up, scale the server/pod up and that's it.

@alexey-milovidov added the st-wontfix (Known issue, no plans to fix it currently) label Sep 13, 2023
@alexey-milovidov (Member)

@Slach you are pointing to a new reconfig command, implemented by a third-party contributor, but incompletely:
#49450

I have no idea why someone needs this command. ClickHouse Keeper works perfectly without a reconfig request. We don't need it. If the existence of this incomplete implementation bothers you, I can remove it.

@Slach (Contributor) commented Sep 13, 2023

@alexey-milovidov
Sorry, could you explain how the incompatibility with ZooKeeper 3.9 relates to reconfig?
Could you please reopen this issue?

Could you explain how to create a cluster with 3 clickhouse-keeper nodes and scale it down to just 1 clickhouse-keeper node?
Without reconfig and without --force-recovery?

Because even if you change the XML config under <raft_configuration>, it has no effect in clickhouse-keeper, and no keeper node will start without reaching a quorum of 3 nodes (see the config sketch below).

Please don't remove reconfig, it is useful functionality; just complete the synchronous application of raft_configuration changes during reconfig (#53481).
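
For reference, the static ensemble being discussed is declared in the Keeper server configuration under <raft_configuration>, roughly like the sketch below (hostnames, ports and ids are placeholders, not a complete config). The point of the comment above is that on an already-initialized ensemble, simply deleting two of the three <server> entries on the surviving node does not by itself let it form a single-node quorum.

```xml
<clickhouse>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>keeper-1.example.internal</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>keeper-2.example.internal</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>keeper-3.example.internal</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```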

@UnamedRus (Contributor) commented Sep 13, 2023

I would probably ask about another completely reasonable scenario:

2 DCs: A & B

3 keeper nodes in DC A (they participate in the quorum)
3 keeper nodes in DC B (they only listen to changes, analogous to observers in ZK)

What if DC A goes completely down and we need to switch to DC B, i.e. reconfigure the keeper nodes in DC B without the quorum being up?

It's quite a common approach for companies that value their data and the ability to survive any cataclysm.
Maybe they are not cool kids, but at least they do care about their data.

We don't need it.

Until the first disaster?

@alexey-milovidov (Member)

Switching leaders without a quorum can lead to data loss (of the data that was present in the unavailable datacenter).

A bulletproof approach is to have three Keeper nodes in three different data centers, but not too far from each other (say, less than 30 ms RTT).

An approach when you switch the leader manually makes sense, but only when you can accept data loss - it is similar to, say, changing the master in MySQL replication (a source of many horror stories, especially if done with some automation).

@UnamedRus (Contributor)

(of the data that was present in the unavailable datacenter).

The datacenter is already gone, so at least temporarily that DC-specific data is already lost from the user's perspective. Plus, learners should be pretty up to date with the latest changes in Keeper, much more so than ClickHouse replication (simply because of the data size):
https://github.com/eBay/NuRaft/blob/99eeef34a2620686e0dd40ad7fbd5cab561140fc/docs/readonly_member.md?plain=1

but not too far from each other (say, less than 30 ms RTT).

30 ms RTT is too much for a quorum, for my taste.

but only when you can accept data loss

Normal replication in ClickHouse is also for people who can accept data loss (there is no quorum during writes / async replication).

@tavplubix (Member)

Could you explain how to create a cluster with 3 clickhouse-keeper nodes and scale it down to just 1 clickhouse-keeper node?

Don't do this, it's an antipattern. Single-node [Zoo]Keeper clusters are good for a dev/staging env, but I would not recommend them for production.

reconfigure keeper nodes in DC B, without quorum being up
they do care about their data

They do not care about their data if they reconfigure a coordination service forcefully without a quorum.

@tavplubix (Member) commented Sep 14, 2023

But I agree that reconfig and --force-recovery are not related to this issue. If you have something to say regarding ClickHouse Keeper usability, then please create another issue, and let's continue the discussion there. Off-topic comments may be removed.

As for the incompatibility with ZooKeeper 3.9, it's a minor issue because:

  • we have ClickHouse Keeper
  • you can just postpone upgrading your ZooKeeper clusters for a while; there's no way it's urgent

So we can reopen this issue and hope that some good person from the community will send us a PR.

@tavplubix reopened this Sep 14, 2023
@tavplubix added the help wanted and minor (Priority: minor) labels and removed the st-wontfix (Known issue, no plans to fix it currently) label Sep 14, 2023
minguyen9988 added a commit to minguyen9988/clickhouse-backup that referenced this issue Sep 28, 2023
* add connection to gcs and use different context for upload incase it got cancel by another thread

* save

* keep ctx

* keep ctx

* use v2

* change to GCS_CLIENT_POOL_SIZE

* pin zookeeper to 3.8.2 version for resolve incompatibility between clickhouse and zookeeper 3.9.0, see details in apache/zookeeper#1837 (comment) return `:latest` default value after resolve ClickHouse/ClickHouse#53749

* Revert "add more precise disk re-balancing for not exists disks, during download, partial fix Altinity#561"

This reverts commit 20e250c.

* fix S3 head object Server Side Encryption parameters, fix Altinity#709

* change timeout to 60m, TODO make tests Parallel

---------

Co-authored-by: Slach <bloodjazman@gmail.com>
mkmkme added a commit to mkmkme/ClickHouse that referenced this issue Dec 4, 2023
This commit enables the read-only flag when connecting to the ZooKeeper server.

This flag is enabled by sending one extra byte when connecting,
and then receiving one extra byte during the first response.

In addition to that, we modify createIfNotExists to not complain
about attempting to alter a read-only ZooKeeper cluster if the node
already exists.

This makes ClickHouse more useful in the event of a loss of quorum:
user credentials are still accessible, which makes it possible to
connect to the cluster and run read queries.

Any DDL or DML query on a Distributed database or ReplicatedMergeTree
table will correctly fail, since it needs to write to ZooKeeper to
execute the query.

Any non-distributed query will be possible, which is ok since the
query was never replicated in the first place, there is no loss of
consistency.

Fixes ClickHouse#53749 as it seems to be the only thing 3.9 enforced.
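
The commit above describes the fix at the wire level: the client opts into read-only support by appending one byte to the ConnectRequest and then expects one extra byte (the readOnly flag) back in the ConnectResponse. A hypothetical sketch of the request side under those assumptions (illustrative names, not the actual ClickHouse code):

```cpp
#include <cstdint>
#include <vector>

// Append a value in ZooKeeper's big-endian wire order.
static void appendBE32(std::vector<uint8_t> & out, int32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<uint8_t>((static_cast<uint32_t>(v) >> shift) & 0xff));
}

static void appendBE64(std::vector<uint8_t> & out, int64_t v)
{
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<uint8_t>((static_cast<uint64_t>(v) >> shift) & 0xff));
}

// Build the ConnectRequest body; the caller prefixes it with a 4-byte length.
// Passing send_read_only_flag = true adds the trailing byte that tells the
// server the client can also work with a read-only instance.
std::vector<uint8_t> buildConnectRequest(
    int32_t session_timeout_ms,
    int64_t previous_session_id,
    const std::vector<uint8_t> & passwd,   // 16 bytes, zeros for a new session
    bool send_read_only_flag)
{
    std::vector<uint8_t> body;
    appendBE32(body, 0);                                   // protocolVersion
    appendBE64(body, 0);                                   // lastZxidSeen
    appendBE32(body, session_timeout_ms);
    appendBE64(body, previous_session_id);
    appendBE32(body, static_cast<int32_t>(passwd.size())); // password length
    body.insert(body.end(), passwd.begin(), passwd.end());
    if (send_read_only_flag)
        body.push_back(1);                                 // readOnly flag: the "one extra byte"
    return body;
}
```
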
vincentbernat added a commit to akvorado/akvorado that referenced this issue Apr 4, 2024
There is an incompatibility of ClickHouse with Zookeeper 3.9. See:

- apache/zookeeper#2146
- apache/zookeeper#1837
- ClickHouse/ClickHouse#53749