Why are we implementing it? (sales eng)

a) Demand from enterprises that heavily rely on CDC in their architecture (event-driven apps, processing pipelines, auditing, off-site replication)
What are the typical use cases?
Enabling event-driven applications. CDC serves as a message bus, propagating changes in the database to listening applications and allowing them to react to business events (e.g. sending out an email notification, triggering downstream pipelines).
Communication goals (e.g. detailed how-to vs. orientation): we might use [here ]
How does this work? (devs)

Change Data Capture (CDC) for Citus is implemented using logical replication to publish events from any table in a Citus cluster. For distributed tables, events caused by shard management operations (shard splits, shard moves, creating a distributed table, undistributing a table) are not re-published to CDC clients. This is achieved by setting up a replication origin session, which adds a replication origin field to every WAL entry produced by such operations. A decoder plugin decodes the WAL entries and publishes the events to the CDC client. The plugin ignores any entry that has the replication origin field set, and also translates the shard names of distributed tables to the distributed table name, so that CDC clients need not be aware of the shard names of distributed tables.
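The shard-name translation above can be sketched in Python. This is a simplified illustration, not the actual decoder code: it assumes Citus shard relations are named `<table>_<shardid>` and recovers the table name by stripping the trailing numeric suffix (the real plugin consults shard metadata rather than parsing names, so a table whose own name ends in `_<digits>` would not be misclassified there).

```python
import re

# Hypothetical sketch of the decoder's name translation: a Citus shard
# relation "sensors_102008" should be reported to CDC clients as "sensors".
SHARD_NAME = re.compile(r"^(?P<table>.+)_(?P<shardid>\d+)$")

def shard_to_distributed_table(relation_name: str) -> str:
    m = SHARD_NAME.match(relation_name)
    if m:
        return m.group("table")
    return relation_name  # not shard-shaped; pass through unchanged

print(shard_to_distributed_table("sensors_102008"))  # prints: sensors
```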
Example sql

Create a publication for the distributed table:

```sql
CREATE PUBLICATION cdc_publication FOR TABLE <table_name>;
```

Create a logical replication slot:

```sql
SELECT * FROM pg_create_logical_replication_slot('cdc_replication_slot', 'pgoutput', false);
```

Create a subscriber for logical replication:

```sql
CREATE SUBSCRIPTION <subscription_name>
    CONNECTION 'dbname= host= user= port='
    PUBLICATION cdc_publication
    WITH (copy_data=true, create_slot=false, slot_name='cdc_replication_slot');
```
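To sanity-check that events are flowing before wiring up a subscriber, one option is to peek at the slot from SQL without consuming its contents. This is a sketch using standard PostgreSQL replication functions; the option values below (protocol version, publication name) are assumptions to be adjusted to your setup:

```sql
-- Peek at pending changes without consuming them; pgoutput emits binary
-- protocol messages, hence the *_binary_changes variant.
SELECT * FROM pg_logical_slot_peek_binary_changes(
    'cdc_replication_slot', NULL, NULL,
    'proto_version', '1',
    'publication_names', 'cdc_publication');
```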
Corner cases, gotchas

CDC guarantees the ordering of events within the same shard (or, more generally, within the same worker node), but does not provide any ordering guarantee across shards or nodes.

If a table already has data in it, adding it to a publication may make it difficult to obtain a consistent snapshot of that table on the target. Running multiple CREATE SUBSCRIPTION commands, with copy_data=true on one of them, suffers from a snapshot isolation issue: updates start replaying before the initial data copy is done.
Are there relevant blog posts or outside documentation about the concept/feature?
No
Link to relevant commits and regression tests if applicable
Good locations for content in docs structure

(https://docs.citusdata.com/en/v11.2/develop/api_guc.html) to explain the GUC citus.enable_change_data_capture
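Assuming the GUC behaves like other boolean Citus settings, enabling it cluster-wide might look like the following (a sketch; the exact default and scope should be confirmed against the linked PRs):

```sql
-- Hypothetical usage sketch: enable CDC and reload the configuration.
ALTER SYSTEM SET citus.enable_change_data_capture TO on;
SELECT pg_reload_conf();
```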
CDC PRs:
citusdata/citus#6623
citusdata/citus#6810
citusdata/citus#6827