Why are we implementing it? (sales eng)

a) Demand from enterprises that heavily rely on CDC in their architecture (event-driven apps, processing pipelines, auditing, off-site replication)
What are the typical use cases?
Enabling event-driven applications. CDC serves as a message bus, propagating changes in the database to listening applications and allowing them to react to business events (e.g. sending out an email notification, triggering downstream pipelines).
Communication goals (e.g. detailed how-to vs. orientation): we might use [here ]
How does this work? (devs)

Change Data Capture (CDC) for Citus is implemented using logical replication to publish events from any table in a Citus cluster. For distributed tables, events caused by shard management operations (shard splits, shard moves, creating a distributed table, undistributing a table) are not re-published to CDC clients. This is achieved by setting up a replication origin session, which adds a replication origin field to every WAL entry produced by such operations. A decoder plugin decodes the WAL entries and publishes the events to the CDC client. The plugin ignores any entry that has the replication origin field set, and also translates the shard names of distributed tables to the distributed table name, so that CDC clients need not be aware of the shard names of distributed tables.
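The shard-name translation above can be sketched in Python. This is a simplified illustration, not the actual decoder code: it assumes Citus shard relations are named `<table>_<shardid>` and recovers the table name by stripping the trailing numeric suffix (the real plugin consults shard metadata rather than parsing names, so a table whose own name ends in `_<digits>` would not be misclassified there).

```python
import re

# Hypothetical sketch of the decoder's name translation: a Citus shard
# relation "sensors_102008" should be reported to CDC clients as "sensors".
SHARD_NAME = re.compile(r"^(?P<table>.+)_(?P<shardid>\d+)$")

def shard_to_distributed_table(relation_name: str) -> str:
    m = SHARD_NAME.match(relation_name)
    if m:
        return m.group("table")
    return relation_name  # not shard-shaped; pass through unchanged

print(shard_to_distributed_table("sensors_102008"))  # prints: sensors
```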
Example sql

Create a publication for the distributed table:

```sql
CREATE PUBLICATION cdc_publication FOR TABLE <table_name>;
```

Create a logical replication slot:

```sql
SELECT * FROM pg_create_logical_replication_slot('cdc_replication_slot', 'pgoutput', false);
```

Create a subscriber for logical replication:

```sql
CREATE SUBSCRIPTION <subscription_name>
    CONNECTION 'dbname= host= user= port='
    PUBLICATION cdc_publication
    WITH (copy_data=true, create_slot=false, slot_name='cdc_replication_slot');
```
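To sanity-check that events are flowing before wiring up a subscriber, one option is to peek at the slot from SQL without consuming its contents. This is a sketch using standard PostgreSQL replication functions; the option values below (protocol version, publication name) are assumptions to be adjusted to your setup:

```sql
-- Peek at pending changes without consuming them; pgoutput emits binary
-- protocol messages, hence the *_binary_changes variant.
SELECT * FROM pg_logical_slot_peek_binary_changes(
    'cdc_replication_slot', NULL, NULL,
    'proto_version', '1',
    'publication_names', 'cdc_publication');
```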
Corner cases, gotchas

CDC guarantees the ordering of events within the same shard (or, more generally, within the same worker node), but does not provide any ordering guarantee across shards or nodes.

If a table already has data in it, adding it to a publication may make it difficult to obtain a consistent snapshot of that table on the target. Running multiple CREATE SUBSCRIPTION commands, with copy_data=true on one of them, suffers from a snapshot isolation issue: updates start replaying before the initial data copy is done.
Are there relevant blog posts or outside documentation about the concept/feature?
No
Link to relevant commits and regression tests if applicable
Good locations for content in docs structure

(https://docs.citusdata.com/en/v11.2/develop/api_guc.html) to explain the GUC citus.enable_change_data_capture
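Assuming the GUC behaves like other boolean Citus settings, enabling it cluster-wide might look like the following (a sketch; the exact default and scope should be confirmed against the linked PRs):

```sql
-- Hypothetical usage sketch: enable CDC and reload the configuration.
ALTER SYSTEM SET citus.enable_change_data_capture TO on;
SELECT pg_reload_conf();
```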
CDC PRs:
citusdata/citus#6623
citusdata/citus#6810
citusdata/citus#6827