Improve write throughput through a finer-grained lock. #1716

Closed
syedashrafulla opened this issue Oct 10, 2023 · 2 comments
Labels: binding/rust, enhancement, storage/aws

Comments

@syedashrafulla

Description

delta-rs uses a Rust-based AWS DynamoDB lock client whose table has a partition key but no sort key. As a result, a commit that starts while another commit is in progress must sleep for a refresh period before retrying.

delta-standalone uses its own Java-based AWS DynamoDB lock, which has the same partition key but also uses the commit file name as the sort key (the fileName attribute is set to the commit file name). As a result, a commit that starts while another commit is in progress can immediately reserve the next commit and proceed with writing it. delta-standalone keeps commits in order by checking that the previous commit exists in the delta log.

The consequence of adding the commit file name as a sort key is that concurrent writers do not have to wait for the refresh_period to determine their next commit. A writer that arrives while another commit is in progress reads the log and then reserves the next commit for itself. The cost is the additional sort key; the benefit is that commits can be scheduled more frequently than once per refresh period.
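
To make the difference concrete, here is a minimal sketch of reserving commits under a composite (partition key, sort key). It uses an in-memory `HashSet` as a stand-in for the DynamoDB table, and `CommitTable`, `try_reserve`, and `reserve_next` are hypothetical names for illustration, not the delta-rs or delta-standalone API. The only detail taken from Delta itself is the zero-padded commit file name.

```rust
use std::collections::HashSet;

/// In-memory stand-in for a DynamoDB table keyed by
/// (partition key = table path, sort key = commit file name).
/// Hypothetical type for illustration only.
#[derive(Default)]
struct CommitTable {
    reserved: HashSet<(String, String)>,
}

impl CommitTable {
    /// Mimics a conditional put: succeeds only if no item with this
    /// composite key exists yet.
    fn try_reserve(&mut self, table_path: &str, version: u64) -> bool {
        // HashSet::insert returns false when the key is already present.
        self.reserved
            .insert((table_path.to_string(), commit_file_name(version)))
    }
}

/// Delta commit files are named with a 20-digit zero-padded version.
fn commit_file_name(version: u64) -> String {
    format!("{:020}.json", version)
}

/// Reserve the first free commit at or after `first_candidate`.
/// A writer that loses the race moves on to the next version right away
/// instead of sleeping for a refresh period.
fn reserve_next(table: &mut CommitTable, table_path: &str, first_candidate: u64) -> u64 {
    let mut version = first_candidate;
    while !table.try_reserve(table_path, version) {
        version += 1;
    }
    version
}
```

With two writers that both read the log and see version 0 as the next candidate, the first reserves version 0 and the second immediately falls through to version 1. The sketch leaves out the in-order check against the delta log described above, and a real table would use a conditional write rather than a local set.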

I think this is a worthwhile tradeoff and a feature to introduce to delta-rs.

Use Case

Concurrent writes from multiple writers using delta-rs.

Related Issue(s)

#1674 for documenting how the lock works
#1333 for a separation-of-concerns refactor of the lock
#1251 for documentation that the current lock is incompatible with the Apache Spark or Databricks locks, which I believe becomes less necessary once the commit file name is introduced as the sort key.

syedashrafulla added the enhancement label Oct 10, 2023
rtyler added the binding/rust and storage/aws labels Oct 10, 2023
@rtyler
Member

rtyler commented Dec 22, 2023

@dispanser has been doing some work to move us to an implementation consistent with the Delta/Spark library's S3DynamoDbLogStore.

I will close this issue once we remove the older dynamodb lock from delta-rs.

@rtyler rtyler self-assigned this Dec 22, 2023
@rtyler
Member

rtyler commented Feb 6, 2024

0.17 has been released, which removes the older lock and moves us to the S3DynamoDbLogStore. See #2173 for more details in the changelog.

@rtyler rtyler closed this as completed Feb 6, 2024