Make CLICKHOUSE_REPLICATION work correctly #8730
Conversation
One comment so far, as we're pretty busy at the offsite - also, this is neither a small PR nor a light one, so it will take more time.
migrations.RunSQL(f"DROP TABLE kafka_person_distinct_id ON CLUSTER {CLICKHOUSE_CLUSTER}"), | ||
migrations.RunSQL( | ||
f""" | ||
INSERT INTO {TEMPORARY_TABLE_NAME} (distinct_id, person_id, team_id, _sign, _timestamp, _offset) |
Given we're re-writing this migration, could we use ATTACH PARTITION instead?
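For illustration, a minimal sketch of that idea in the style of the migration above - this is not code from the PR, and it assumes the source and destination tables share identical structure and partitioning (which ALTER TABLE ... ATTACH PARTITION FROM requires), with tuple() assuming an unpartitioned table:

migrations.RunSQL(
    f"""
    -- Move the existing data parts wholesale instead of re-inserting rows
    ALTER TABLE {TEMPORARY_TABLE_NAME}
    ATTACH PARTITION tuple() FROM person_distinct_id
    """
),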
I didn't rewrite this - just skipping it on some fresh installs where it's not needed :)
I'm just about to head out but wanted to give you something at least.
Is the intention to converge on a single schema eventually? I haven't thought through what the migration path would need to look like here, but having the bifurcation in there I guess has ramifications on e.g. tests. I didn't actually look at that - do we enable CLICKHOUSE_REPLICATION in tests?
Yes, we want to converge - though this will be handled separately from this ticket. See the master issue for full details: #8652. We can't really get rid of the "old" path until the other steps in there are solved.
This brings us closer to having production and local schemas in sync
Force-pushed from db04ceb to 24e8342
Haven't tested, but looks good to me overall.
A piece of feedback (particularly following the eng offsite) is that this PR could have been broken up to make it easier to review. The classes for engines and the lambdas for SQL could have come before the logic changes.
engine=table_engine(EVENTS_TABLE, "_timestamp", REPLACING_MERGE_TREE),
table_name=EVENTS_DATA_TABLE(),
cluster=settings.CLICKHOUSE_CLUSTER,
engine=ReplacingMergeTree("events", ver="_timestamp", replication_scheme=ReplicationScheme.SHARDED),
Nit, non-blocker: this was confusing to me at first - I suspected that the class would account for ClickHouse replication, and it does, but it is a bit misleading that I pass in ReplicationScheme.SHARDED when it might actually be ReplicationScheme.NOT_SHARDED.
Suggestions on how to clear this up?
Do the env check here instead of down in the class? It does move a bit of clutter elsewhere, though, with the benefit of being more explicit - roughly as in the sketch below.
I wonder if there's also a more explicit name for the argument that would help clarify this.
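To make the idea concrete, an illustrative sketch rather than code from this PR - resolve_scheme is a hypothetical helper, and ReplicationScheme / ReplacingMergeTree are the names from the diff above, assumed to be importable:

from django.conf import settings

def resolve_scheme(preferred):
    # Make the fallback explicit at the call site: without
    # CLICKHOUSE_REPLICATION enabled, the sharded scheme does not apply.
    if not settings.CLICKHOUSE_REPLICATION:
        return ReplicationScheme.NOT_SHARDED
    return preferred

engine = ReplacingMergeTree(
    "events", ver="_timestamp", replication_scheme=resolve_scheme(ReplicationScheme.SHARDED)
)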
Going to do so in a follow-up as my subsequent commits would otherwise conflict. Thanks for the ideas! :)
Given that I'm alone on this project, this PR has been waiting for a review for a while (due to the offsite), and this is the first (and simple) part of the to-sharded migration, I don't think that would have been feasible here without sacrificing being able to get things done in the first place.
Looks good, although there's a lot going on.
Some non-blocking points:
- This is changing things across 27 files, which seems like a lot. I'm not sure if that's because there's a lot changing in the PR or because we should refactor the code somehow. Unsure.
- It's really difficult to visualise what's happening with all the tables/shards/replication. I don't know what to do there, but I guess all the docs you're adding will help 🤞
- It makes me scared that we don't run the tests in CI with CLICKHOUSE_REPLICATION set. I guess that won't be an issue once we've unified the code paths.
Changes
This PR finally makes CLICKHOUSE_REPLICATION set up the right sharded and replicated schema on fresh installs. It also tries to replicate PostHog Cloud as closely as possible. See #8652 for the full ticket and details of next steps.
If you don't know what writable_events and Distributed table engines do, read up on https://posthog.com/handbook/engineering/event-ingestion first; a rough sketch follows below.
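For readers new to this layout, a simplified sketch of the sharded and replicated scheme in the style of the migrations above - the column list, ZooKeeper paths, and sharding key are simplifying assumptions, not this PR's literal DDL:

from django.db import migrations

# Per-shard data table: rows are replicated across the replicas of each shard.
migrations.RunSQL(
    f"""
    CREATE TABLE sharded_events ON CLUSTER {CLICKHOUSE_CLUSTER}
    (uuid UUID, event String, team_id Int64, _timestamp DateTime)
    ENGINE = ReplicatedReplacingMergeTree(
        '/clickhouse/tables/{{shard}}/sharded_events', '{{replica}}', _timestamp
    )
    ORDER BY (team_id, uuid)
    """
),
# Writable front: a Distributed table that fans inserts out to the shards.
migrations.RunSQL(
    f"""
    CREATE TABLE writable_events ON CLUSTER {CLICKHOUSE_CLUSTER} AS sharded_events
    ENGINE = Distributed('{CLICKHOUSE_CLUSTER}', 'default', 'sharded_events', sipHash64(uuid))
    """
),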
Note that there are cases where the schema here is correct but might be incorrect on cloud: see #7571
How did you test this code?
See unit tests.
Also set up a local instance with CLICKHOUSE_REPLICATION and verified that it works.