feat(sharding): add command to sync tables onto new nodes #8912

macobo · 2022-03-08T07:13:40Z

clickhouse-operator only syncs some tables onto new nodes. This new
command ensures that when new shards are added, they are automatically
synced up on the next redeploy.
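As a rough illustration of the check such a command needs, here is a minimal sketch that works out which tables each host is missing. It assumes the per-host table lists have already been fetched (for example from ClickHouse's `system.tables` on every replica); `find_missing_tables` and its input shape are hypothetical and not the command's actual code.

```python
from typing import Dict, Set


def find_missing_tables(tables_by_host: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Hypothetical helper: for each host, return the tables that exist
    somewhere in the cluster but not on that host.

    `tables_by_host` maps a host name to the set of table names present on it,
    e.g. as read from system.tables on every replica (an assumption).
    """
    # Union of every table seen on at least one host (empty for an empty cluster).
    all_tables: Set[str] = set().union(*tables_by_host.values())
    missing: Dict[str, Set[str]] = {}
    for host, tables in tables_by_host.items():
        absent = all_tables - tables
        if absent:  # hosts that already have everything are left out
            missing[host] = absent
    return missing


if __name__ == "__main__":
    # Illustrative data: a freshly added shard that only has one table so far.
    cluster = {
        "ch-0": {"events", "person"},
        "ch-1": {"events", "person"},
        "ch-2": {"events"},
    }
    print(find_missing_tables(cluster))  # {'ch-2': {'person'}}
```

A plain set difference like this only says that a table is not present everywhere; by itself it cannot tell a table missing from a new node apart from a ghost table that was dropped on the other nodes.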

Note that there may be timing concerns here, as resharding on Altinity
Cloud does not trigger a redeploy automatically. In practice, however, this
just means that new nodes won't ingest any data until the next deploy.

Closes #8904, related to #8652

How did you test this code?

Custom-built an image and deployed it in a sharded setting.

guidoiaquinti

LGTM, with a few non-blocking comments:

Should we add specific tests for the scenarios in sync_replicated_schema?
Should we also handle the case of ghost tables (e.g. tables that were dropped on some nodes but not on all of them)?

macobo · 2022-03-08T08:29:21Z

> Should we also handle the case of ghost tables (e.g. tables that were dropped on some nodes but not on all of them)?

Automatically dropping client data is a huge no-no in my book, so it's not something I want to touch with a 20-foot pole. IMO this is not an issue right now.

> Should we add specific tests for the scenarios in sync_replicated_schema?

Not really, given our current CI setup, at least not without mocking the DB, and the mocking route wouldn't really give us anything either. Given how lightweight this script is, I'm fairly comfortable with it as-is, but do argue if you disagree!
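For reference, the mocking route could look roughly like the sketch below: the query executor is replaced with a `unittest.mock.Mock`, so the test only checks which statements would be sent, not how ClickHouse reacts to them. `create_missing_tables` and its parameters are hypothetical stand-ins, not the command's real code.

```python
import logging
import unittest
from typing import Callable, Dict, Iterable, Set
from unittest import mock

logger = logging.getLogger(__name__)


def create_missing_tables(
    execute: Callable[[str], object],
    create_queries: Iterable[str],
    missing_by_host: Dict[str, Set[str]],
) -> int:
    """Hypothetical command body: if any host lacks tables, run every CREATE query.

    `execute` stands in for the real query executor, so tests can swap in a mock.
    Returns the number of statements executed.
    """
    if not missing_by_host:
        logger.info("All hosts are in sync, nothing to do")
        return 0
    logger.info("Creating missing tables")
    executed = 0
    for query in create_queries:
        execute(query)
        executed += 1
    return executed


class CreateMissingTablesTest(unittest.TestCase):
    # Illustrative statements only; they are never sent to a real server.
    QUERIES = [
        "CREATE TABLE IF NOT EXISTS a ON CLUSTER 'c' (x UInt8) ENGINE = Memory",
        "CREATE TABLE IF NOT EXISTS b ON CLUSTER 'c' (x UInt8) ENGINE = Memory",
    ]

    def test_runs_every_query_when_a_host_is_missing_tables(self):
        execute = mock.Mock()
        executed = create_missing_tables(execute, self.QUERIES, {"ch-2": {"b"}})
        self.assertEqual(executed, 2)
        self.assertEqual(execute.call_args_list, [mock.call(q) for q in self.QUERIES])

    def test_does_nothing_when_cluster_is_in_sync(self):
        execute = mock.Mock()
        self.assertEqual(create_missing_tables(execute, self.QUERIES, {}), 0)
        execute.assert_not_called()
```

Run it with `python -m unittest` as usual. A test like this mostly checks the wiring, since the mocked executor never touches a real ClickHouse instance.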

hazzadous

I think it's worth adding a test here, given how critical it is for production deployments that this runs as expected.

The other comments are non-blocking.

hazzadous · 2022-03-08T08:41:05Z

ee/management/commands/sync_replicated_schema.py

+
+            logger.info("Creating missing tables")
+            for query in CREATE_TABLE_QUERIES:
+                sync_execute(build_query(query))


I assume these are ON CLUSTER queries, and that they won't cause issues if the tables are already defined on a node?

macobo

Exactly. You can also check schema.py for the queries.
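The reply hinges on the statements being idempotent. A minimal sketch of what that looks like, assuming the schema templates combine `IF NOT EXISTS` with an `ON CLUSTER` clause; the template, the `posthog` database and cluster names, and `render_create_query` are illustrative, and are neither the `build_query` helper nor the queries from schema.py.

```python
# Hypothetical table template; not copied from schema.py.
EVENTS_TABLE_TEMPLATE = """
CREATE TABLE IF NOT EXISTS {database}.{table} ON CLUSTER '{cluster}'
(
    uuid UUID,
    event String,
    timestamp DateTime64(6, 'UTC')
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{database}.{table}', '{{replica}}')
ORDER BY (event, timestamp)
"""


def render_create_query(template: str, *, database: str, cluster: str, table: str) -> str:
    """Hypothetical query builder: fill in names but keep ClickHouse macros.

    Doubled braces ({{shard}}, {{replica}}) survive str.format as {shard} and
    {replica}, which ClickHouse expands per host from its macros configuration.
    """
    return template.format(database=database, cluster=cluster, table=table).strip()


if __name__ == "__main__":
    print(render_create_query(EVENTS_TABLE_TEMPLATE, database="posthog", cluster="posthog", table="events"))
```

With `ON CLUSTER`, the statement is distributed to every host in the named cluster, and `IF NOT EXISTS` makes it a no-op on hosts that already have the table, so only the new nodes actually create anything.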

macobo requested review from hazzadous, tiina303 and guidoiaquinti March 8, 2022 07:13

macobo force-pushed the sync-schema branch from e1b63bb to eff91f4 Compare March 8, 2022 07:13

guidoiaquinti approved these changes Mar 8, 2022

hazzadous suggested changes Mar 8, 2022

hazzadous approved these changes Mar 8, 2022

ee/management/commands/test/test_sync_replicated_schema.py (outdated, resolved)

macobo force-pushed the sync-schema branch 2 times, most recently from 20665c6 to e79d1f5 Compare March 8, 2022 10:30

macobo force-pushed the fix-quoting branch from b795b19 to 35aba1f Compare March 8, 2022 10:31

Base automatically changed from fix-quoting to master March 8, 2022 10:31

macobo added 3 commits March 8, 2022 12:31

Add test to the new command

864a977

Improve non-replicated test

bf79b69

macobo force-pushed the sync-schema branch from e79d1f5 to bf79b69 Compare March 8, 2022 10:31

macobo merged commit c8d6b22 into master Mar 8, 2022

macobo deleted the sync-schema branch March 8, 2022 10:50

This was referenced Mar 15, 2022:

ClickHouse sharding plan #8652 (Closed)

fix(sharding): Distributed migrations table PostHog/infi.clickhouse_orm#8 (Merged)
