forked from delta-io/delta
S3 Multi-cluster writes support using DynamoDB
Resolves delta-io#41

This PR addresses issue delta-io#41 - Support for AWS S3 (multiple clusters/drivers/JVMs).

It implements a few ideas from the delta-io#41 discussion:
- provides a generic base class BaseExternalLogStore for storing the listing of commit files in an external DB. This class may be easily extended for a specific DB backend
- stores the contents of a commit in a temporary file and links to it in the DB's row, so that an uncompleted write operation can be finished while reading
- provides a concrete DynamoDBLogStore implementation extending BaseExternalLogStore
- implementations for other DB backends should be simple to write (a ZooKeeper implementation is almost ready, I can create a separate PR if anyone is interested)
- unit tests in `ExternalLogStoreSuite`, which uses `InMemoryLogStore` to mock `DynamoDBLogStore`
- python integration test inside of `storage-dynamodb/integration_test/dynamodb_logstore.py`, which tests concurrent readers and writers
- that integration test can also run using `FailingDynamoDBLogStore`, which injects errors into the runtime execution to test error edge cases
- this solution has also been stress-tested (by SambaTV) on Amazon's EMR cluster (multiple test jobs writing thousands of parallel transactions to a single delta table) and no data loss has been observed so far

To enable DynamoDBLogStore, set the following Spark property:
`spark.delta.logStore.class=io.delta.storage.DynamoDBLogStore`

The following configuration properties are recognized:
io.delta.storage.DynamoDBLogStore.tableName - table name (defaults to 'delta_log')
io.delta.storage.DynamoDBLogStore.region - AWS region (defaults to 'us-east-1')

Closes delta-io#1044

Co-authored-by: Scott Sandre <scott.sandre@databricks.com>
Co-authored-by: Allison Portis <allison.portis@databricks.com>
Signed-off-by: Scott Sandre <scott.sandre@databricks.com>
GitOrigin-RevId: 7c276f95be92a0ebf1eaa9038d118112d25ebc21
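The write path described in the commit message (temporary file, then a DB row linking to it, then completion of unfinished writes on read) can be illustrated with a minimal in-memory sketch. This is not the code from this PR: the class names `InMemoryExternalStore` and `SketchLogStore`, the `put_if_absent` helper, the row fields, and the `crash_before_copy` flag are all hypothetical, and a plain dict stands in for both S3 and DynamoDB.

```python
import threading
import uuid


class InMemoryExternalStore:
    """Hypothetical stand-in for the external DB (not the real DynamoDB API)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, row):
        """Atomically insert a row; return False if the key already exists."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = dict(row)
            return True

    def get(self, key):
        """Return a copy of the row for key, or None if there is none."""
        with self._lock:
            row = self._rows.get(key)
            return dict(row) if row is not None else None

    def mark_complete(self, key):
        with self._lock:
            self._rows[key]["complete"] = True


class SketchLogStore:
    """Illustrative only: a dict stands in for the object store."""

    def __init__(self, db):
        self.db = db
        self.files = {}  # path -> contents

    def write(self, commit_path, contents, crash_before_copy=False):
        # 1. Store the commit contents in a uniquely named temporary file.
        temp_path = f"{commit_path}.{uuid.uuid4().hex}.tmp"
        self.files[temp_path] = contents
        # 2. Claim the commit in the DB; only one writer can succeed.
        row = {"temp_path": temp_path, "complete": False}
        if not self.db.put_if_absent(commit_path, row):
            raise FileExistsError(commit_path)
        # Assumption for the demo: simulate a writer that dies at this point.
        if crash_before_copy:
            return
        # 3. Copy the temporary file to its final path and mark the row done.
        self._finish(commit_path)

    def _finish(self, commit_path):
        row = self.db.get(commit_path)
        if row is not None and not row["complete"]:
            self.files[commit_path] = self.files[row["temp_path"]]
            self.db.mark_complete(commit_path)

    def read(self, commit_path):
        # A reader first completes any write that was left unfinished.
        self._finish(commit_path)
        return self.files[commit_path]
```

In this sketch a second `write` to the same commit path raises `FileExistsError`, and a `read` after a simulated crash still returns the contents, because the reader copies the temporary file into place before reading.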
Showing 11 changed files with 1,258 additions and 65 deletions.