-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial CDC support for DHC #1819
Conversation
jcferretti
commented
Jan 12, 2022
•
edited
Loading
edited
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've basically taken away a stream table as a valid output. You can:
(1) ignore the keys and get a lastBy
(2) just make it append only, so you have a huge table
(3) what about if I know what I'm doing and want the raw stream table instead? Like maybe I don't actually care about these individual things, but I want to aggregate them with counts and sums. Or in an advanced use case maybe I can instead do something like:
purchases = streamTable.where(op = add).view("Dollars").sumBy()
returns = streamTable.where(op = del).view("Dollars").sumBy()
netRevenue = purchases.naturalJoin(returns, "", "ReturnDollars=Dollars").update("NetRevenue=Dollars-ReturnDollars")
Ref #1819 (review) I've reworked some other parts of the change too; there was a bug in that the after__X columns are not present on a delete, so it is important to take the primary key columns always from the Key component (those are present in a delete). Something else I learned from debezium documentation, when an update in the underlying database table modifies a primary key, debezium generates two events, one to delete the old row and one to add the new row. Is very important for our algorithm that they do this, otherwise the trick for
wouldn't work; I thought about this case, went into panic, went to read the debezium docs, and was relieved. |