[Spark] DynamoDBCommitOwner: add logging, get dynamic confs from sparkSession #3130

Merged

dhruvarya-db
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Updates DynamoDBCommitOwner:

  • Added logging around table creation flow
  • Get wcu, rcu, and awsCredentialsProvider from SparkSession
  • Return -1 as the table version if registerTable has already been called but no actual commits have gone through the owner. This is done by tracking an extra flag in DynamoDB.
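
A minimal in-memory sketch of the last point, for illustration only. It is not the code in this PR: the class name, the `Entry` type, and the `hasAcceptedCommits` field are hypothetical stand-ins for the DynamoDB item and the extra flag, and a `HashMap` takes the place of the DynamoDB table.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; not the DynamoDBCommitOwner implementation.
final class RegisteredTableSketch {

    // Hypothetical stand-in for the per-table item stored in DynamoDB.
    private static final class Entry {
        long latestVersion;          // last version known to the owner
        boolean hasAcceptedCommits;  // hypothetical name for the "extra flag"

        Entry(long latestVersion) {
            this.latestVersion = latestVersion;
            this.hasAcceptedCommits = false;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Records the table at its current version; no commit has gone through the owner yet.
    void registerTable(String tableId, long currentVersion) {
        entries.put(tableId, new Entry(currentVersion));
    }

    // Accepts a commit only if it is the next version, then sets the flag.
    void commit(String tableId, long version) {
        Entry e = lookup(tableId);
        if (version != e.latestVersion + 1) {
            throw new IllegalStateException(
                "expected version " + (e.latestVersion + 1) + " but got " + version);
        }
        e.latestVersion = version;
        e.hasAcceptedCommits = true;
    }

    // Returns -1 until at least one commit has gone through the owner.
    long getLatestTableVersion(String tableId) {
        Entry e = lookup(tableId);
        return e.hasAcceptedCommits ? e.latestVersion : -1L;
    }

    private Entry lookup(String tableId) {
        Entry e = entries.get(tableId);
        if (e == null) {
            throw new IllegalStateException("table not registered: " + tableId);
        }
        return e;
    }
}
```

In this sketch, a table registered at version 5 reports -1 until version 6 is committed through the owner, after which it reports 6.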

How was this patch tested?

Existing tests

Does this PR introduce any user-facing changes?

Yes, introduces new configs (see DeltaSQLConf changes) which can be used to configure the DynamoDBCommitOwner.

@@ -655,16 +709,27 @@ private void tryEnsureTableExists() throws IOException {
}
}
if (status.equals("ACTIVE")) {
if (created) {
LOG.info("Successfully created DynamoDB table `{}`", managedCommitsTableName);
Collaborator

Could we add a couple of log statements in the commit method as well?

  • Attempting to commit <> via DynamoDB
  • Commit <> done successfully. Backfilling commit files.

AWSCredentialsProvider awsCredentialsProvider =
(AWSCredentialsProvider) awsCredentialsProviderClass.getConstructor().newInstance();
ReflectionUtils.createAwsCredentialsProvider(credentialProviderName, hadoopConf);
Collaborator

Is this similar to how DynamoDBLogStore does it?

Collaborator Author

Yes, this has been directly copied from there.

expectedVersion = 0
}
assert(tableCommitOwnerClient.getCommits() == GetCommitsResponse(Seq.empty, expectedVersion))
assert(tableCommitOwnerClient.getCommits() == GetCommitsResponse(Seq.empty, -1))
Collaborator

Could we add a test to validate that the Spark config overrides are correctly used by DynamoDBCommitOwnerClient?

@dhruvarya-db dhruvarya-db force-pushed the dynamodb-commitowner-sparksession branch from c3bd28b to eef4a83 Compare May 22, 2024 02:10
@felipepessoto
Contributor

@dhruvarya-db any plans to add support to Azure (CosmosDB?) commit owner?

@prakharjain09
Collaborator

> @dhruvarya-db any plans to add support to Azure (CosmosDB?) commit owner?

@felipepessoto There is no specific plan right now. The API surface is still evolving. Once it stabilizes (probably by Delta 4.0), anyone should be able to implement the API and define their own commit owner.

We chose DynamoDB for the reference implementation because S3 doesn't have putIfAbsent, which causes the lost-writes issue explained here.
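
A rough sketch of why put-if-absent matters, offered as a general illustration rather than a restatement of the linked explanation. It models a commit log as a map from version to writer, with `ConcurrentHashMap` standing in for the storage system; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only: a map from commit version to the writer that owns it.
final class PutIfAbsentSketch {
    private final Map<Long, String> commits = new ConcurrentHashMap<>();

    // Unconditional put: always "succeeds", silently replacing any earlier commit.
    boolean blindPut(long version, String writer) {
        commits.put(version, writer);
        return true;
    }

    // Conditional put: succeeds only if no commit exists for this version yet.
    boolean putIfAbsent(long version, String writer) {
        return commits.putIfAbsent(version, writer) == null;
    }

    // Returns the writer whose commit is stored for the version, or null if none.
    String winner(long version) {
        return commits.get(version);
    }
}
```

With `blindPut`, two writers racing on the same version both believe they committed, and the first write is lost. With `putIfAbsent`, the second writer is told it lost the race and can retry at the next version.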

@vkorukanti vkorukanti merged commit d5e9a26 into delta-io:master May 22, 2024
9 of 10 checks passed
longvu-db pushed a commit to longvu-db/delta that referenced this pull request May 28, 2024
…kSession (delta-io#3130)
