This issue was moved to a discussion.
[Request for Comment] Should destinations tombstone or remove CDC deletions? #31239
Labels: team/destinations (Destinations team's backlog)
Hello Airbyte Community!
Now that we've launched Destinations V2 (#26028), we are able to more easily make changes to how data is represented in your data warehouse's final tables based on your feedback. The first area of investigation is how we represent deletions from CDC sources (e.g. from source-postgres).
Today, if a row is deleted in your source, we remove it from the Destination's final table as part of the deduplication process, so the row will not be present in the final table. However, there is an alternative: we leave the row in the final table but add a "Tombstone" or "Soft Delete" column that is either null (the row exists in the source) or non-null (the row has been deleted from the source), e.g.:
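To make the two options concrete, here is a minimal Python sketch of a deduplication step under each strategy. It is an illustration, not Airbyte's actual Typing and Deduping implementation: the `Record` type, the `cursor` ordering field, the function names, and the deletion timestamp are all hypothetical. Only the `_ab_cdc_deleted_at` column name comes from this proposal.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical raw CDC record; only _ab_cdc_deleted_at is an Airbyte column name.
@dataclass(frozen=True)
class Record:
    id: int
    name: str
    cursor: int  # assumed ordering field (e.g. a log position); higher means newer
    _ab_cdc_deleted_at: Optional[str] = None  # non-null once the row is deleted


def latest_per_id(records):
    """Keep only the newest version of each primary key, ordered by id."""
    latest = {}
    for r in records:
        if r.id not in latest or r.cursor > latest[r.id].cursor:
            latest[r.id] = r
    return [latest[k] for k in sorted(latest)]


def dedupe_hard_delete(records):
    """Current behavior: rows whose newest version is a deletion are dropped."""
    return [r for r in latest_per_id(records) if r._ab_cdc_deleted_at is None]


def dedupe_soft_delete(records):
    """Proposed behavior: every row is kept; the tombstone column marks deletions."""
    return latest_per_id(records)


# Illustrative data: user 2 is inserted, then deleted (timestamp is made up).
raw = [
    Record(1, "Evan", cursor=1),
    Record(2, "Edward", cursor=2),
    Record(2, "Edward", cursor=3, _ab_cdc_deleted_at="2023-10-01T00:00:00Z"),
]

print([r.name for r in dedupe_hard_delete(raw)])  # ['Evan']
print([(r.name, r._ab_cdc_deleted_at) for r in dedupe_soft_delete(raw)])
# [('Evan', None), ('Edward', '2023-10-01T00:00:00Z')]
```

Both strategies share the same "latest version per key" step; they differ only in whether the final filter on the tombstone column is applied during deduplication.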
In the example above, "Evan" (user 1) exists in the source, and "Edward" (user 2) has been deleted. In most cases (depending on the CDC source), the tombstone column will be `_ab_cdc_deleted_at`, a timestamp, so you would also gain information about when the row was deleted. Should you want a view of your data in the destination that more closely resembles the source (i.e. the current behavior), you can filter out deleted rows with `WHERE _ab_cdc_deleted_at IS NULL` and either make a new table or a view for your downstream analysis.

Of note, we currently only remove deleted rows for CDC database sources. Many API sources also provide a deletion/tombstone column (e.g. source-salesforce), and their records remain in your final table. Switching CDC deletes to soft deletes would homogenize how Airbyte works for all sources. It would also have the benefit of speeding up the Typing and Deduping process.
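The `WHERE _ab_cdc_deleted_at IS NULL` filter mentioned above can be sketched with Python's built-in `sqlite3` module standing in for a data warehouse. The `users` table, the `users_current` view, and the rows (including the timestamp) are hypothetical; only `_ab_cdc_deleted_at` comes from the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical soft-deleted final table: deleted rows stay, with a non-null tombstone.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, _ab_cdc_deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Evan", None), (2, "Edward", "2023-10-01T00:00:00Z")],
)

# A view that reproduces today's hard-delete behavior for downstream queries.
conn.execute(
    "CREATE VIEW users_current AS "
    "SELECT id, name FROM users WHERE _ab_cdc_deleted_at IS NULL"
)

current = conn.execute("SELECT id, name FROM users_current ORDER BY id").fetchall()
deleted = conn.execute(
    "SELECT id, name, _ab_cdc_deleted_at FROM users "
    "WHERE _ab_cdc_deleted_at IS NOT NULL"
).fetchall()

print(current)  # [(1, 'Evan')]
print(deleted)  # [(2, 'Edward', '2023-10-01T00:00:00Z')]
```

A view keeps a single copy of the data and always reflects the latest sync; materializing the filtered rows into a separate table instead trades that freshness for cheaper reads.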
So... which do you prefer? Respond with a 👍 to switch to soft-deleting (leaving the row in the final table with a tombstone column) or a 👎 to keep the existing behavior and remove deleted rows from the final table.