-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] DynamoDBCommitOwner: add logging, get dynamic confs from sparkSession #3130
[Spark] DynamoDBCommitOwner: add logging, get dynamic confs from sparkSession #3130
Conversation
@@ -655,16 +709,27 @@ private void tryEnsureTableExists() throws IOException { | |||
} | |||
} | |||
if (status.equals("ACTIVE")) { | |||
if (created) { | |||
LOG.info("Successfully created DynamoDB table `{}`", managedCommitsTableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add couple of log statements in Commit method as well.
- Attempting to commit <> via DynamoDB
- Commit <> done successfully. Backfilling commit files.
AWSCredentialsProvider awsCredentialsProvider = | ||
(AWSCredentialsProvider) awsCredentialsProviderClass.getConstructor().newInstance(); | ||
ReflectionUtils.createAwsCredentialsProvider(credentialProviderName, hadoopConf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this similar to how DynamoDBLogStore do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this has been directly copied from there.
expectedVersion = 0 | ||
} | ||
assert(tableCommitOwnerClient.getCommits() == GetCommitsResponse(Seq.empty, expectedVersion)) | ||
assert(tableCommitOwnerClient.getCommits() == GetCommitsResponse(Seq.empty, -1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add test to validate that the spark config overrides are correctly used by DynamoDBCommitOwnerClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c3bd28b
to
eef4a83
Compare
@dhruvarya-db any plans to add support to Azure (CosmosDB?) commit owner? |
@felipepessoto There is no specific plan right now. Currently the API surface is still evolving. Once it stablizes (probably by Delta 4.0), anyone should be able to implement the API and define their own commit owner. We chose DynamoDB for reference implementation as S3 doesn't have putIfAbsent which causes lost writes issue as explained here. |
…kSession (delta-io#3130) ## Description Updates DynamoDBCommitOwner: - Added logging around table creation flow - Get wcu, rcu, and awsCredentialsProvider from SparkSession - Return -1 as the table version if registerTable has already been called but no actual commits have gone through the owner. This is done by tracking an extra flag in DynamoDB. ## How was this patch tested? Existing tests ## Does this PR introduce _any_ user-facing changes? Yes, introduces new configs (see DeltaSQLConf changes) which can be used to configure the DynamoDBCommitOwner.
Which Delta project/connector is this regarding?
Description
Updates DynamoDBCommitOwner:
How was this patch tested?
Existing tests
Does this PR introduce any user-facing changes?
Yes, introduces new configs (see DeltaSQLConf changes) which can be used to configure the DynamoDBCommitOwner.