Add table function for generating Iceberg CDC records #15677

Merged 2 commits on Sep 13, 2023

alexjo2144
Member

@alexjo2144 alexjo2144 commented Jan 11, 2023

Description

Adds a table function which, given a range of snapshot ids, will produce a table of the rows inserted and deleted between those two snapshots.

Currently only supports metadata deletes, not merge-on-read positional or equality deletes.
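
As a rough illustration of the semantics described above (not the connector code in this PR), the sketch below models snapshots as lists of added and removed data files and emits one change row per affected row. The `Snapshot` class, the `changes_between` function, and the choice of an exclusive start and inclusive end are all assumptions made for this example. Only metadata deletes, meaning whole files dropped, are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical, simplified stand-in for an Iceberg snapshot."""
    snapshot_id: int
    # Data files added by this snapshot: file name -> rows stored in the file.
    added_files: dict = field(default_factory=dict)
    # Data files dropped whole by this snapshot (metadata deletes only).
    removed_files: list = field(default_factory=list)


def changes_between(snapshots, start_id, end_id):
    """Return (change_type, snapshot_id, row) tuples for the given range.

    Assumptions for this sketch: `snapshots` is ordered oldest to newest,
    `start_id` is exclusive and `end_id` is inclusive.
    """
    live_files = {}   # files visible as of the snapshot processed so far
    in_range = False  # becomes True once the start snapshot has been passed
    changes = []
    for snap in snapshots:
        if in_range:
            # Every row of a dropped file counts as a delete.
            for name in snap.removed_files:
                for row in live_files.get(name, []):
                    changes.append(("delete", snap.snapshot_id, row))
            # Every row of a newly added file counts as an insert.
            for name, rows in snap.added_files.items():
                for row in rows:
                    changes.append(("insert", snap.snapshot_id, row))
        # Keep the live-file view current, including before the range starts.
        for name in snap.removed_files:
            live_files.pop(name, None)
        live_files.update(snap.added_files)
        if snap.snapshot_id == start_id:
            in_range = True
        if snap.snapshot_id == end_id:
            break
    return changes


history = [
    Snapshot(1, added_files={"a.parquet": [("k1", 1), ("k2", 2)]}),
    Snapshot(2, added_files={"b.parquet": [("k3", 3)]}),
    Snapshot(3, removed_files=["a.parquet"]),
]
changes = changes_between(history, start_id=1, end_id=3)
for change in changes:
    print(change)
```

Rows written by snapshot 1 are not reported as inserts because the start of the range is treated as exclusive here; they only show up as deletes once snapshot 3 drops their file.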

A document comparing the Iceberg and Delta Lake CDC implementations and how I think we should reconcile the differences is here.

Additional context and related issues

Support for positional and equality deletes depends on: apache/iceberg#6182

Release notes

This is still a draft

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 11, 2023
@alexjo2144 alexjo2144 requested review from findepi and homar January 11, 2023 22:22
@alexjo2144
Member Author

Just a rebase to resolve conflicts.

@alexjo2144 alexjo2144 marked this pull request as ready for review June 21, 2023 20:42
@alexjo2144
Member Author

Now that the PTF SPI has solidified a bit, I've reworked this PR to use it. It should look a lot like the Delta Lake one now.

@homar
Member

homar commented Jun 22, 2023

I think it would be nice to add information about this to the documentation.

@alexjo2144
Member Author

I'll work on some documentation for this tomorrow, but I think the code should be pretty close to ready.

@findepi
Member

findepi commented Aug 10, 2023

Per #17928 (review), using TableFunctionSplitProcessor may be a dead end, and it should be reserved for really cheap functions only.
You may need to rework the approach here to use synthesized table handles. Sorry for bringing that news, please don't shoot the messenger.
It's fine by me to have this as a follow-up.
cc @homar @martint @kasiafi

@findepi
Member

findepi commented Aug 10, 2023

BTW the build didn't run due to conflicts.

@findepi
Member

findepi commented Aug 11, 2023

@homar please review

@alexjo2144 please check the build

Member

@homar homar left a comment

LGTM % wouldn't it make sense to rewrite this away from PTF already?

@findepi findepi merged commit a0b962c into trinodb:master Sep 13, 2023
@alexjo2144 alexjo2144 deleted the iceberg/cdc branch September 13, 2023 14:20
@github-actions github-actions bot added this to the 427 milestone Sep 13, 2023
Labels
cla-signed iceberg Iceberg connector