
Doc changes for logical replication of distributed tables #1085

Closed · pinodeca opened this issue Apr 25, 2023 · 1 comment · Fixed by #1092

pinodeca commented Apr 25, 2023

Why are we implementing it? (sales eng)

a) Demand from enterprises that heavily rely on CDC in their architecture (event-driven apps, process pipelines, auditing, off-site replication)

What are the typical use cases?

Enabling event-driven applications. CDC serves as a message bus that propagates database changes to listening applications, allowing them to react to business events (e.g. sending out an email notification, triggering different pipelines).

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

We might use https://docs.citusdata.com/en/v11.2/develop/api_guc.html to explain the GUC citus.enable_change_data_capture.
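A minimal sketch of what that GUC section could show (assuming the setting can be changed with ALTER SYSTEM and a reload; it may also need to be set on every worker node):

-- enable CDC for the cluster, then reload the configuration
-- (assumption: this GUC is reloadable rather than restart-only)
alter system set citus.enable_change_data_capture = on;
select pg_reload_conf();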

How does this work? (devs)

Change Data Capture (CDC) for Citus is implemented using logical replication to publish events from any table in a Citus cluster. For distributed tables, events caused by shard management operations (shard splits, shard moves, creating a distributed table, undistributing a table) are not re-published to CDC clients. This is achieved by setting up a replication origin session, which adds a replication origin field to every WAL entry generated by such operations. A decoder plugin decodes the WAL entries and publishes the events to CDC clients. The plugin ignores any entry whose replication origin field is set, and it also translates shard names back to the distributed table name, so CDC clients do not need to be aware of the shards of a distributed table.
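For illustration, a minimal sketch of that behavior (the table name, shard id, and node names below are hypothetical):

-- on the coordinator: create and distribute a table (hypothetical names)
create table orders (id bigint primary key, payload jsonb);
select create_distributed_table('orders', 'id');

-- this row is physically written to a shard such as orders_102008,
-- but a CDC client subscribed to 'orders' sees it as a change to 'orders'
insert into orders values (1, '{"status": "created"}');

-- a shard move copies data internally; the decoder plugin drops the
-- resulting origin-tagged WAL entries, so CDC clients see no duplicates
select citus_move_shard_placement(102008, 'worker-1', 5432, 'worker-2', 5432);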

Example SQL

Create a publication for a distributed table:

create publication cdc_publication for table <table_name>;

Create a logical replication slot:

select * from pg_create_logical_replication_slot('cdc_replication_slot', 'pgoutput', false);

Create a subscription for logical replication (connection parameters elided):

create subscription <subscription_name>
    connection 'dbname=<dbname> host=<host> user=<user> port=<port>'
    publication cdc_publication
    with (copy_data=true, create_slot=false, slot_name='cdc_replication_slot');
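As an optional sanity check on the publisher (a sketch; the slot and publication names match the statements above), pending changes on the slot can be peeked without consuming them. pgoutput emits a binary protocol, so the binary variant is used:

-- peek at queued changes on the slot without advancing it
select lsn, xid from pg_logical_slot_peek_binary_changes(
    'cdc_replication_slot', null, null,
    'proto_version', '1', 'publication_names', 'cdc_publication');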

Corner cases, gotchas

  • Citus uses Postgres' logical replication primitives, so it inherits all of the Postgres logical replication limitations listed at https://www.postgresql.org/docs/current/logical-replication-restrictions.html
  • CDC guarantees the ordering of events within the same shard (more generally, within the same worker) but does not provide any ordering guarantee across shards/nodes.
  • Does not work with columnar tables; see "no support for logical replication" under https://github.com/citusdata/citus/tree/main/src/backend/columnar#limitations
  • If a table already contains data, adding it to a publication can break snapshot consistency on the target: when multiple create subscription commands run and one of them has copy_data=true, updates start replaying before the initial data copy is done. One way to sequence this safely is sketched after this list.
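A minimal sketch of that sequencing (subscription names and connection strings are hypothetical): let one subscription perform the initial copy, wait for it to finish, and only then add subscriptions that skip the copy.

-- first subscriber: copies existing data, then streams changes
create subscription cdc_sub_1
    connection 'dbname=<dbname> host=<host> user=<user> port=<port>'
    publication cdc_publication
    with (copy_data=true);

-- on that subscriber, wait until every table reports state 'r' (ready)
select srrelid::regclass, srsubstate from pg_subscription_rel;

-- further subscribers can then attach without re-copying
create subscription cdc_sub_2
    connection 'dbname=<dbname> host=<host> user=<user> port=<port>'
    publication cdc_publication
    with (copy_data=false);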

Are there relevant blog posts or outside documentation about the concept/feature?

No

Link to relevant commits and regression tests if applicable

CDC PRs:
citusdata/citus#6623
citusdata/citus#6810
citusdata/citus#6827

@pinodeca (Author)

@rajeshkt78 @onderkalaci Can you please fill in each of the template sections in the description above?
