[1105] Change Data Feed - MERGE command #1155
Missed this last time. Definitely add param docs. The triple sequence is hella confusing. Honestly, I should have added param docs when I originally implemented this.
Added param docs. An overview of what `JoinedRowProcessor` is doing may also be helpful; what do you think? I can add it tomorrow.
We don't actually need this. It can be done the way it is in https://github.com/allisonport-db/delta/blob/02a238e6666e31cc74ea1dbda12842ce929de4d6/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L860, where we simply do not create an output row. Not sure which is clearer to readers.
I don't get what you are referring to here. Did the thread get lost?
Sorry, this might be hard to explain in writing. But basically, since `processRow` now returns an `Iterator[InternalRow]` instead of just an `InternalRow`, instead of using an expression to create our "deletedRowOutput" that we later delete, we could simply omit that `inputRow` from the returned iterator.
It's implemented that way in the above linked commit, before I added back `ROW_DROPPED_COL`.
This is more a question of readability, I think... not sure either way is preferred over the other.
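For readers of this thread, here is a minimal sketch of the two options being compared, written with plain Scala collections rather than Spark's `InternalRow`. The `Row` case class, the `dropped` flag (standing in for `ROW_DROPPED_COL`), and both function names are hypothetical and are not the code in this PR.

```scala
// Hypothetical stand-in for InternalRow; `dropped` plays the role of ROW_DROPPED_COL.
final case class Row(id: Int, dropped: Boolean = false)

object DropSketch {
  // Option A: always emit a row, mark deleted ones, and filter them out afterwards.
  def processWithMarker(input: Row, isDelete: Boolean): Iterator[Row] =
    Iterator.single(input.copy(dropped = isDelete))

  // Option B: because the processor returns an iterator, emit nothing for deleted rows.
  def processWithOmission(input: Row, isDelete: Boolean): Iterator[Row] =
    if (isDelete) Iterator.empty else Iterator.single(input)

  // Toy input: the row with id 2 is the one a DELETE clause matches.
  val inputs: List[Row] = List(Row(1), Row(2), Row(3))
  val isDelete: Row => Boolean = _.id == 2

  // Option A needs a second pass, but the marked rows can still be counted.
  val marked: List[Row] =
    inputs.iterator.flatMap(r => processWithMarker(r, isDelete(r))).toList
  val viaMarker: List[Row] = marked.filterNot(_.dropped)
  val droppedCount: Int = marked.count(_.dropped)

  // Option B produces the final rows directly.
  val viaOmission: List[Row] =
    inputs.iterator.flatMap(r => processWithOmission(r, isDelete(r))).toList
}
```

Both paths yield the same surviving rows in this toy setup. The difference is that the marker variant keeps dropped rows visible to a later step, while the omission variant removes them at the source.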
Note, this is simply mimicking the prior implementation when CDC is disabled. Another solution is to have `outputRowEncoder` include `ROW_DROPPED_COL` when CDC is disabled. It will be dropped on line 684 regardless. Not sure of the tradeoff with respect to decoding a column we don't need.
This is a moot discussion now, right? You have to use ROW_DROPPED_COL to get the metrics right... right?
This is just about how we get the index of `ROW_DROPPED_COL`. This function could be simplified if we always include `ROW_DROPPED_COL` in `outputRowEncoder`.
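To illustrate the general point about the index lookup, here is a rough sketch over a plain list of column names. It is not the function under review: the `RowDroppedCol` value, both helper names, and the assumption that the marker column is appended last are all hypothetical.

```scala
object DroppedColIndexSketch {
  // Placeholder column name, assumed only for this sketch.
  val RowDroppedCol: String = "row_dropped"

  // Marker column conditionally present: callers must handle the missing case.
  def indexWhenOptional(outputCols: Seq[String]): Option[Int] =
    Some(outputCols.indexOf(RowDroppedCol)).filter(_ >= 0)

  // Marker column always appended last (an assumption of this sketch):
  // the index follows directly from the number of output columns.
  def indexWhenAlwaysLast(outputCols: Seq[String]): Int = {
    require(outputCols.lastOption.contains(RowDroppedCol), "marker column must be last")
    outputCols.length - 1
  }
}
```

With the column always present, the lookup needs no branch on whether CDC is enabled, at the cost of carrying one extra column through the encoder.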