Support writes to Delta tables concurrently with Spark #14953

Open
tvamsikalyan opened this issue Nov 8, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@tvamsikalyan

Delta Lake supports a multi-cluster setup with Spark that allows concurrent writes to S3 from multiple clusters. Please see the multi-cluster setup for more information.

This ticket requests a similar feature in Trino.

cc: @dennyglee

@tvamsikalyan changed the title from "[Feature Request] Support for concurrent writes to S3 from multiple clusters" to "[Feature Request] Delta lake connector support for concurrent writes to S3 from multiple clusters" on Nov 8, 2022
@dennyglee

Thanks @tvamsikalyan - I'll be glad to work with folks on enabling this!

@ebyhr changed the title from "[Feature Request] Delta lake connector support for concurrent writes to S3 from multiple clusters" to "Delta lake connector support for concurrent writes to S3 from multiple clusters" on Nov 10, 2022
@ebyhr added the "enhancement" (New feature or request) label on Nov 10, 2022
@tdas

tdas commented Nov 10, 2022

I am happy to help out as well. It would be great if the Trino connector started using Delta Standalone for reading the Delta log. The DynamoDB multi-cluster write solution would then be available for free, as it works with any connector that uses Delta Standalone.

@alexjo2144
Member

Switching the connector to use the DSR would be a pretty big overhaul. Is the protocol for how the engine should interact with DynamoDB to check/set table locks documented somewhere? If it's not too complicated, implementing a TransactionLogSynchronizer would probably be easier.

@alexjo2144
Member

Lock compatibility with the open source Delta Spark implementation would be really great to have, though.

@dennyglee

Here's the design document for the S3DynamoDBLogstore and related issues 41, 1044. Hope this helps and please do not hesitate to ping me or @tdas for this, eh?!
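For readers following the thread, the core idea behind a DynamoDB-backed log store can be sketched roughly as follows: because the writers cannot rely on S3 alone for mutual exclusion, each one first claims the next log version in an external store with an atomic put-if-absent, and a writer that loses the race moves on to a later version. This is only a minimal sketch under stated assumptions, not the actual S3DynamoDBLogStore protocol. The `InMemoryCommitTable` class, the `try_commit` function, and the in-memory dictionary standing in for DynamoDB are all hypothetical names introduced for illustration.

```python
import threading


class InMemoryCommitTable:
    """Hypothetical stand-in for the external store (DynamoDB in the real design)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # (table_path, version) -> writer_id

    def put_if_absent(self, table_path, version, writer_id):
        """Atomically claim a log version; return True only for the first caller."""
        key = (table_path, version)
        with self._lock:
            if key in self._entries:
                return False
            self._entries[key] = writer_id
            return True

    def owner(self, table_path, version):
        """Return the writer that claimed this version, or None if unclaimed."""
        with self._lock:
            return self._entries.get((table_path, version))


def try_commit(commits, table_path, latest_version, writer_id, max_retries=5):
    """Try to claim the next version, moving to a later one after each lost race."""
    version = latest_version + 1
    for _ in range(max_retries):
        if commits.put_if_absent(table_path, version, writer_id):
            return version
        # Assumption: a real writer would re-read the log and check for
        # logical conflicts here before retrying with the next version.
        version += 1
    raise RuntimeError("gave up after repeated commit conflicts")
```

With two writers that both believe the latest version is 0, calling `try_commit(commits, path, 0, "spark")` and then `try_commit(commits, path, 0, "trino")` on the same table gives each writer its own version instead of letting one overwrite the other's log entry.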

@findepi changed the title from "Delta lake connector support for concurrent writes to S3 from multiple clusters" to "Support writes to Delta tables concurrently with Spark" on Nov 16, 2022
@soumilshah1995

Any updates? delta-io/delta#1498


6 participants