-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sharding): add command to sync tables onto new nodes #8912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and few non-blocking comments:
- should we add a specific test for scenarios in
sync_replicated_schema
? - should we handle as well the case of ghost tables? (e.g. tables that were dropped on some nodes but not on all of them)
Automatically dropping client data is a huge no-no in my book so not something I want to touch with a 20-foot pole. This is not an issue right now IMO.
Not really given our current CI setup or without mocking the DB. The mocking route wouldn't really give us anything either. Given how light-weight this script is, I don't feel too uncomfortable with this as-is, but do argue if you disagree! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding a test here given it's critical for production deployments that this runs as expected.
The other comments are non-blocking
|
||
logger.info("Creating missing tables") | ||
for query in CREATE_TABLE_QUERIES: | ||
sync_execute(build_query(query)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these are ON CLUSTER queries, and that will not cause issues if tables are already defined on a node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly - you can also check schema.py
for the queries.
20665c6
to
e79d1f5
Compare
clickhouse-operator only syncs some tables onto new nodes. This new command ensures that when adding new shards, they are automatically synced up on redeploying Note that there might be timing concerns here as resharding on altinity cloud does not redeploy automatically. In practice however what this means is that new nodes just won't ingest any data until another deploy
clickhouse-operator only syncs some tables onto new nodes. This new
command ensures that when adding new shards, they are automatically
synced up on redeploying
Note that there might be timing concerns here as resharding on altinity
cloud does not redeploy automatically. In practice however what this
means is that new nodes just won't ingest any data until another deploy
Closes #8904, related to #8652
How did you test this code?
Custom-built an image and deployed it in a sharded setting.