Improve write throughput through a finer-grained lock. #1716
Labels
binding/rust: Issues for the Rust crate
enhancement: New feature or request
storage/aws: AWS S3 storage related
Description
delta-rs uses a Rust-based AWS DynamoDB lock client that has a partition key but no sort key. As a result, the lock forces a commit happening shortly after an in-progress commit to sleep for a refresh period before retrying.
delta-standalone uses its own Java-based AWS DynamoDB lock that has the same partition key but uses the commit file name as the sort key, with `fileName` set to the commit file name. As a result, a commit happening shortly after an in-progress commit can immediately reserve the next commit and proceed with writing it. delta-standalone keeps commits in order by checking the delta log for the existence of the previous commit.
The consequence of adding the commit file name as a sort key is that concurrent writers do not have to wait for the `refresh_period` to determine their next commit. The later writer reads the log and then claims ownership of the next commit. In return for the cost of adding the commit file name as a sort key, commits can be scheduled more often than once per refresh period.
I think this is a worthwhile tradeoff and a feature to introduce to delta-rs.
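To make the proposal concrete, here is a minimal sketch of the reservation idea, assuming an in-memory map as a stand-in for the DynamoDB table. The names `LockTable`, `try_reserve`, `reserve_next_commit`, and `commit_file_name` are hypothetical and are not part of delta-rs or delta-standalone; `try_reserve` only imitates a conditional put that fails when an item with the same partition key and sort key already exists.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a DynamoDB table keyed by
/// (partition key = table path, sort key = commit file name).
/// Hypothetical type, for illustration only.
#[derive(Default)]
struct LockTable {
    // (table_path, file_name) -> owner of the reservation
    entries: HashMap<(String, String), String>,
}

impl LockTable {
    /// Imitates a conditional put: succeeds only if no item with this
    /// (table_path, file_name) key exists yet.
    fn try_reserve(&mut self, table_path: &str, file_name: &str, owner: &str) -> bool {
        let key = (table_path.to_string(), file_name.to_string());
        if self.entries.contains_key(&key) {
            return false;
        }
        self.entries.insert(key, owner.to_string());
        true
    }
}

/// Delta log commit files are named with a zero-padded 20-digit version.
fn commit_file_name(version: u64) -> String {
    format!("{:020}.json", version)
}

/// Starting from the version the writer believes is next, reserve the
/// first commit that nobody else has claimed. No sleeping is involved.
fn reserve_next_commit(
    table: &mut LockTable,
    table_path: &str,
    owner: &str,
    mut version: u64,
) -> u64 {
    while !table.try_reserve(table_path, &commit_file_name(version), owner) {
        version += 1;
    }
    version
}

fn main() {
    let mut table = LockTable::default();
    let path = "s3://bucket/table"; // hypothetical table path

    // Both writers read the log, see version 4 as the latest, and aim for 5.
    let a = reserve_next_commit(&mut table, path, "writer-a", 5);
    let b = reserve_next_commit(&mut table, path, "writer-b", 5);

    println!("writer-a reserved {}", commit_file_name(a));
    println!("writer-b reserved {}", commit_file_name(b));
}
```

In this sketch the second writer moves straight to the next free commit instead of sleeping for a refresh period. A real implementation would still need to confirm that the previous commit exists in the delta log before writing its own, as described above for delta-standalone.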
Use Case
Concurrent writes from multiple writers using delta-rs.
Related Issue(s)
#1674 for documenting how the lock works
#1333 for a separation-of-concerns refactor of the lock
#1251 for documentation that the current lock is incompatible with the Apache Spark or Databricks locks, which I believe becomes less necessary once the commit file name is used as the sort key.