
Initial CDC support for DHC #1819

Merged
merged 22 commits into from
Jan 13, 2022
Conversation

jcferretti (Member) commented Jan 12, 2022


docker-compose-common.yml: review thread (outdated, resolved)
cpwright (Contributor) left a comment:

I think you've basically taken away the stream table as a valid output. You can:
(1) ignore the keys and get a lastBy;
(2) just make it append-only, so you have a huge table;
(3) but what if I know what I'm doing and want the raw stream table instead? Maybe I don't actually care about these individual rows, but I want to aggregate them with counts and sums. Or, in an advanced use case, I might do something like:

purchases = streamTable.where("op = `add`").view("Dollars").sumBy()
returns = streamTable.where("op = `del`").view("Dollars").sumBy()
netRevenue = purchases.naturalJoin(returns, "", "ReturnDollars=Dollars").update("NetRevenue=Dollars-ReturnDollars")
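The aggregation above can be illustrated with a plain-Python sketch. This is not Deephaven code; the `events` list, the `"add"`/`"del"` op names, and the dollar amounts are hypothetical stand-ins for the rows of the raw stream table, just to show the arithmetic the query would perform:

```python
# Toy model of the net-revenue aggregation sketched in the comment above.
# Each event is an (op, dollars) pair: "add" events are purchases,
# "del" events are returns. The real version would use where/view/sumBy
# on the raw stream table instead of summing in Python.

def net_revenue(events):
    purchases = sum(d for op, d in events if op == "add")
    returns = sum(d for op, d in events if op == "del")
    return purchases - returns

events = [("add", 100.0), ("add", 50.0), ("del", 30.0)]
print(net_revenue(events))  # 120.0
```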

debezium-demo/loadgen/Dockerfile: review thread (outdated, resolved)
Integrations/python/deephaven/ConsumeCdc.py: review thread (outdated, resolved)
Integrations/python/deephaven/ConsumeCdc.py: review thread (outdated, resolved)
jcferretti (Member, Author) commented:

Ref #1819 (review)
(I can't reply directly under that comment, sadly; very poor GitHub UI design.)
Makes sense, and it becomes obvious when looking at the implementation code: just returning the stream table before wrapping it in a call to streamingToAppendTable is more general, and users can do the wrapping themselves if they so desire.

I've reworked some other parts of the change too. There was a bug: the after__X columns are not present on a delete event, so it is important to always take the primary key columns from the Key component (those are present on a delete).
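The shape of that bug can be shown with a small sketch. The event dictionaries below are simplified stand-ins for Debezium's Kafka records, not the actual wire format; the point is only that the "after" image is absent on deletes, so the primary key must be read from the key component:

```python
# Simplified Debezium-style events: an insert carries both the key and
# the "after" image of the row; a delete carries only the key.
insert_event = {"key": {"id": 7}, "after": {"id": 7, "name": "ada"}, "op": "c"}
delete_event = {"key": {"id": 7}, "after": None, "op": "d"}

def primary_key(event):
    # Reading the PK from the "after" image would crash on deletes,
    # where "after" is None -- so always read it from the key component.
    return event["key"]["id"]

# Both event kinds yield the same primary key this way.
assert primary_key(insert_event) == primary_key(delete_event) == 7
```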

Something else I learned from the Debezium documentation: when an update in the underlying database table modifies a primary key, Debezium generates two events, one to delete the old row and one to add the new row. It is very important for our algorithm that it does this; otherwise the trick of

.where("op = `d`")

wouldn't work. I thought about this case, went into a panic, read the Debezium docs, and was relieved.
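The delete-then-insert behavior is what keeps a lastBy-style materialization consistent across a primary-key change. A toy model (the event dictionaries are simplified stand-ins, not Debezium's actual record format):

```python
def materialize(events):
    # Maintain a dict keyed by primary key, mimicking a lastBy over the
    # CDC stream: deletes drop the row, creates/updates upsert it.
    table = {}
    for ev in events:
        if ev["op"] == "d":
            table.pop(ev["key"], None)    # delete removes the old row
        else:
            table[ev["key"]] = ev["row"]  # create/update upserts the new row
    return table

# An update that changes the PK from 1 to 2 arrives as delete(1) + create(2),
# so no stale row is left behind under the old key:
events = [
    {"op": "c", "key": 1, "row": {"id": 1, "name": "ada"}},
    {"op": "d", "key": 1},
    {"op": "c", "key": 2, "row": {"id": 2, "name": "ada"}},
]
print(materialize(events))  # {2: {'id': 2, 'name': 'ada'}}
```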

@jcferretti jcferretti marked this pull request as ready for review January 13, 2022 05:04
cpwright
cpwright previously approved these changes Jan 13, 2022
@jcferretti jcferretti merged commit 647d4b8 into deephaven:main Jan 13, 2022
@jcferretti jcferretti deleted the cfs-debezium-demo-0 branch January 13, 2022 22:04
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2022