Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CLICKHOUSE_REPLICATION work correctly #8730

Merged
merged 18 commits into from
Mar 1, 2022
Merged

Conversation

macobo
Copy link
Contributor

@macobo macobo commented Feb 23, 2022

Changes

This PR finally makes CLICKHOUSE_REPLICATION set up the right sharded and replicated schema on fresh installs. It also tries to replicate posting cloud as closely as possible.

See #8652 for the full ticket and details of next steps.

If you don't know what writable_events and Distributed table engines do - read up on https://posthog.com/handbook/engineering/event-ingestion

Note that there are cases where the schema here is correct but might be incorrect on cloud: see #7571

How did you test this code?

See unit tests.

Also set up a local instance with CLICKHOUSE_REPLICATION and tested it working.

@macobo macobo marked this pull request as ready for review February 23, 2022 12:16
@macobo macobo changed the title WIP: Make CLICKHOUSE_REPLICATION work correctly Make CLICKHOUSE_REPLICATION work correctly Feb 23, 2022
@macobo macobo mentioned this pull request Feb 23, 2022
Copy link
Contributor

@yakkomajuri yakkomajuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment so far as we're pretty busy at the offsite - also this is not a small pr nor a light one so will take more time

migrations.RunSQL(f"DROP TABLE kafka_person_distinct_id ON CLUSTER {CLICKHOUSE_CLUSTER}"),
migrations.RunSQL(
f"""
INSERT INTO {TEMPORARY_TABLE_NAME} (distinct_id, person_id, team_id, _sign, _timestamp, _offset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given we're re-writing this migration, could we use ATTACH PARTITION instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't rewrite this - just skipping it in some fresh installs where it's not needed :)

Copy link
Contributor

@hazzadous hazzadous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just about to head out but wanted to get give you something at least.

Is the intention to converge on a single schema eventually? I haven't thought through what the migration path would need to look like here, but having the bifurcation in there I guess has ramifications on e.g. tests. I didn't actually look at that, do we enable CLICKHOUSE_REPLICATION in tests?

conftest.py Show resolved Hide resolved
ee/clickhouse/migrations/0004_kafka.py Show resolved Hide resolved
ee/clickhouse/sql/session_recording_events.py Outdated Show resolved Hide resolved
ee/clickhouse/sql/table_engines.py Show resolved Hide resolved
ee/clickhouse/sql/session_recording_events.py Show resolved Hide resolved
@macobo
Copy link
Contributor Author

macobo commented Feb 24, 2022

Is the intention to converge on a single schema eventually? I haven't thought through what the migration path would need to look like here, but having the bifurcation in there I guess has ramifications on e.g. tests. I didn't actually look at that, do we enable CLICKHOUSE_REPLICATION in tests?

Yes we want to converge - though this will be handled separately from this ticket. See the master issue for full details: #8652

We can't really get rid of the "old" path until other steps in there are solved.

@macobo macobo force-pushed the refactor-schema-replicated branch from db04ceb to 24e8342 Compare February 25, 2022 06:33
@macobo macobo removed the request for review from guidoiaquinti March 1, 2022 08:04
@macobo macobo requested review from hazzadous and yakkomajuri March 1, 2022 08:04
Copy link
Contributor

@yakkomajuri yakkomajuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't tested but looks good to me overall.

a piece of feedback (particularly following the eng offsite) is that this PR could have been broken up to make it easier to review. The classes for engines and the lambdas for sql could have come before the logic changes

engine=table_engine(EVENTS_TABLE, "_timestamp", REPLACING_MERGE_TREE),
table_name=EVENTS_DATA_TABLE(),
cluster=settings.CLICKHOUSE_CLUSTER,
engine=ReplacingMergeTree("events", ver="_timestamp", replication_scheme=ReplicationScheme.SHARDED),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, non blocker: this was confusing to me at first - I suspected that the class would account for ClickHouse replication - and it does - but it is a bit misleading that I pass in ReplicationScheme.SHARDED but it might actually be ReplicationScheme.NOT_SHARDED.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions on how to clear this up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the env check here instead of down in the class? Does move a bit of clutter elsewhere though at the benefit of being more explicit.

I wonder if there's also a more explicit name for the argument that would help clarify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to do so in a follow-up as my subsequent commits would otherwise conflict. Thanks for the ideas! :)

@macobo
Copy link
Contributor Author

macobo commented Mar 1, 2022

a piece of feedback (particularly following the eng offsite) is that this PR could have been broken up to make it easier to review. The classes for engines and the lambdas for sql could have come before the logic changes

Given I'm alone on this project, this PR has been waiting for a review for a while (due to the offsite) and this is the first (and simple) part of the to-sharded migration I don't think that would have been feasible here without sacrificing being able to get things done in the first place.

Copy link
Contributor

@hazzadous hazzadous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good although there's lots going on

Some non-blocking points:

  1. this is changing things across 27 files, which seems like a lot. I'm not sure if that's because there is a lot changing in the PR or if we should re factor the code somehow. Unsure
  2. It's really difficult to visualise what's happening with all the tables/shards/replication. I don't know what to do there, but I guess all the docs you're adding will help 🤞
  3. It makes me scared that we don't have the tests run in CI with CLICKHOUSE_REPLICATION set. I guess that won't be an issue once we're unified the code paths

@macobo macobo merged commit 401e6f5 into master Mar 1, 2022
@macobo macobo deleted the refactor-schema-replicated branch March 1, 2022 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants