-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark][Sharing] Add CDF support for "delta format sharing" #2457
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@linzhou-db Can you rebase on top of the master?
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
PR is large but the code is going to be in a separate module which shouldn't make any difference to regular delta-spark users.
sharing/src/main/scala/io/delta/sharing/spark/DeltaSharingCDFUtils.scala
Outdated
Show resolved
Hide resolved
sharing/src/main/scala/io/delta/sharing/spark/DeltaSharingCDFUtils.scala
Outdated
Show resolved
Hide resolved
sharing/src/main/scala/io/delta/sharing/spark/DeltaSharingCDFUtils.scala
Show resolved
Hide resolved
sharing/src/main/scala/io/delta/sharing/spark/DeltaSharingCDFUtils.scala
Show resolved
Hide resolved
sharing/src/main/scala/io/delta/sharing/spark/DeltaSharingLogFileSystem.scala
Outdated
Show resolved
Hide resolved
startingMetadataLineOpt = Some(metadata.deltaMetadata.json + "\n") | ||
} | ||
case protocol: model.DeltaSharingProtocol => | ||
startingProtocolLineOpt = Some(protocol.deltaProtocol.json + "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we expect to get only one protocol action from the server? what if we have multiple protocol actions (due to alter etc) in the log files between starting and end version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same qn for metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question, the answer is two fold:
- The server performs basic checks that the recipient/client is able to handle the readerFeatures, and throw error if not.
- The rest of the job is left to the client side delta library, and fallback to the same delta behavior. if delta cannot handle it, it throws the same error as reading a delta table, such as error when reading cdf on a table with column name changed.
Which Delta project/connector is this regarding?
Description
Third PR of #2291: Adds CDF support for "delta format sharing":
How was this patch tested?
Unit Tests
Does this PR introduce any user-facing changes?
No