Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark][Sharing] Add CDF support for "delta format sharing" #2457

Closed
wants to merge 5 commits into from

Conversation

linzhou-db
Copy link
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (delta sharing)

Description

Third PR of #2291: Adds CDF support for "delta format sharing":

  • DeltaSharingDataSource with cdf query support
  • DeltaSharingCDFUtils/DeltaSharingCDFUtilsSuite

How was this patch tested?

Unit Tests

Does this PR introduce any user-facing changes?

No

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linzhou-db Can you rebase on top of the master?

@linzhou-db
Copy link
Collaborator Author

@linzhou-db Can you rebase on top of the master?

done

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

PR is large but the code is going to be in a separate module which shouldn't make any difference to regular delta-spark users.

startingMetadataLineOpt = Some(metadata.deltaMetadata.json + "\n")
}
case protocol: model.DeltaSharingProtocol =>
startingProtocolLineOpt = Some(protocol.deltaProtocol.json + "\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we expect to get only one protocol action from the server? what if we have multiple protocol actions (due to alter etc) in the log files between starting and end version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same qn for metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, the answer is two fold:

  1. The server performs basic checks that the recipient/client is able to handle the readerFeatures, and throw error if not.
  2. The rest of the job is left to the client side delta library, and fallback to the same delta behavior. if delta cannot handle it, it throws the same error as reading a delta table, such as error when reading cdf on a table with column name changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants