Support s3 object store without dynamodb lock #974
Comments
Yes, we recently added an option in the S3 backend called …
@mpetri - just had a quick scan of our code, and you should be able to pass in a custom object store using the DeltaTableBuilder option. That said, we should probably look into providing a feature that allows compiling for single-reader scenarios.
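A minimal sketch of what that might look like. The exact builder calls (`with_storage_backend` in particular) are my assumption based on the delta-rs and object_store APIs and may differ across crate versions:

```rust
use std::sync::Arc;

use deltalake::DeltaTableBuilder;
use object_store::aws::AmazonS3Builder;
use url::Url;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Construct the S3 object store yourself, so no DynamoDB lock client
    // is involved. Only safe if you can guarantee a single writer.
    let store = AmazonS3Builder::new()
        .with_bucket_name("my-bucket")
        .with_region("us-east-1")
        .build()?;

    let location = Url::parse("s3://my-bucket/my-table")?;
    let table = DeltaTableBuilder::from_uri("s3://my-bucket/my-table")
        .with_storage_backend(Arc::new(store), location)
        .load()
        .await?;

    println!("loaded table at version {:?}", table.version());
    Ok(())
}
```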
I'm currently blocked on the other bug I reported (can't compile the crate with S3 support), so I might give this a try, thanks. Should I keep this issue open? It seems like a valid request.
Yes, please keep it open.
Giving a bump to this feature request, as I am using an S3-compatible object store (Cloudflare R2) and would like some way to support concurrent writes across processes - currently this is managed via a single process and a Mutex. Perhaps we could replace the locking implementation with a trait, similar to … (a rough sketch of what I mean follows below).
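A hypothetical sketch of the kind of trait I'm describing - all names here are invented for illustration and are not actual delta-rs APIs:

```rust
use async_trait::async_trait;

// Hypothetical error type for illustration only.
#[derive(Debug)]
pub struct LockError(pub String);

/// A pluggable distributed lock, so backends other than DynamoDB
/// (e.g. something R2-specific) could guard commits to the log.
#[async_trait]
pub trait LockClient: Send + Sync {
    /// Try to acquire the lock for `key`; Ok(true) means we now hold it.
    async fn try_acquire(&self, key: &str) -> Result<bool, LockError>;
    /// Release a previously acquired lock.
    async fn release(&self, key: &str) -> Result<(), LockError>;
}
```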
Do you care about something that works across S3-compatible APIs, or just about R2? If you care specifically about R2, I think the more optimal solution is to support it through the object store rather than have some separate locking mechanism. Unlike S3, R2 has support for conditional … (Though also note that R2 doesn't work well right now because their multi-part upload doesn't seem to be compatible with S3's.)

If S3 ever comes out with support for atomic rename_if_not_exists or copy_if_not_exists, then the whole lock client thing will be moot. GCS and Azure Blob storage don't need any locking client because they support these operations out of the box.
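To illustrate why those primitives make a lock client unnecessary, here is a rough sketch of an atomic commit built on `object_store`'s `copy_if_not_exists`. The paths and retry policy are assumptions for illustration, not delta-rs's actual commit code:

```rust
use object_store::{path::Path, Error, ObjectStore};

/// Sketch: commit a Delta log entry atomically on a store that supports
/// copy_if_not_exists (e.g. GCS, Azure Blob). An AlreadyExists error
/// means another writer won the race for this table version.
async fn commit(
    store: &dyn ObjectStore,
    tmp: &Path,    // staged commit, e.g. _delta_log/_commit_<uuid>.json
    target: &Path, // final name, e.g. _delta_log/00000000000000000001.json
) -> Result<(), Error> {
    match store.copy_if_not_exists(tmp, target).await {
        Ok(()) => {
            // We won the race; remove the staged file.
            store.delete(tmp).await?;
            Ok(())
        }
        // Lost the race: the caller should rebase and retry at the next version.
        Err(e @ Error::AlreadyExists { .. }) => Err(e),
        Err(e) => Err(e),
    }
}
```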
I am mostly just interested in R2 - let me check with the R2 team to see if … I figured switching to a trait would "plug in" better to the existing locking that uses DynamoDB, but I am fine with either approach. S3-compatible providers all have their own quirks, so that seemed like the most straightforward approach, letting the user deal with those.
Kind of an aside, but can you send me details on the issue you are referencing there? I am on the Slack and would be interested in hearing about it so I can pass the feedback along to the R2 team.
@cmackenzie1 I need confirmation from the R2 team, but the implementation in object-store-rs is based on the one in Arrow C++, and I think there's an issue where they don't support non-equal part sizes: apache/arrow#34363 (comment)
I followed up with the R2 team, and they confirmed that it is still the case that S3 multipart uploads require all parts to be the same size (except the last). For the …
Closing this, as you can now bypass the requirement (while still being safe) by specifying the respective option on the object store. Most recently this was re-added here.
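For reference, a sketch of passing such an option through the builder's storage options. The exact key is not named in this thread; `AWS_S3_ALLOW_UNSAFE_RENAME` below is my assumption of one relevant single-writer option and may not be the option this comment refers to - check the delta-rs docs for your version:

```rust
use std::collections::HashMap;

use deltalake::DeltaTableBuilder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed option key: verify the exact name and semantics in the
    // delta-rs docs before relying on this in production.
    let mut options = HashMap::new();
    options.insert(
        "AWS_S3_ALLOW_UNSAFE_RENAME".to_string(),
        "true".to_string(),
    );

    let table = DeltaTableBuilder::from_uri("s3://my-bucket/my-table")
        .with_storage_options(options)
        .load()
        .await?;

    println!("version: {:?}", table.version());
    Ok(())
}
```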
Description
In the Rust crate, is it possible to support S3-based Delta Lakes without the need to pull in and use the DynamoDB lock client? I understand the need for the lock client (after reading the paper), but if I know I will only ever have one writer to the Delta Lake, I don't really need the locking mechanism.
Could I achieve this by manually creating an object store (with the S3 backend) and passing it to deltalake?