I am enjoying working with the delta format and I believe it is the right table format for my use case.
I have a question about the transactional guarantees with concurrent Spark streaming writes to MinIO with HMS.
The pipeline is like this:
1- Multiple Spark streaming jobs upsert into a single Delta table stored in MinIO (an S3-compatible object store)
2- The Delta tables are queried using Trino with HMS
I am worried about this note in the Delta docs (https://docs.delta.io/latest/delta-storage.html#-delta-storage-s3):

“This multi-cluster writing solution is only safe when all writers use this LogStore implementation as well as the same DynamoDB table and region. If some drivers use out-of-the-box Delta Lake while others use this experimental LogStore, then data loss can occur.”
How can I implement the multi-cluster setup in my environment without DynamoDB and still have transactional guarantees?
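For context, the DynamoDB-backed setup that note refers to is enabled with Spark configuration along these lines (a sketch based on the Delta storage docs; the table name, region, and `s3a` path scheme are placeholders for a specific environment):

```
# spark-defaults.conf (sketch; values are placeholders)
spark.delta.logStore.s3a.impl                              io.delta.storage.S3DynamoDBLogStore
spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName    delta_log
spark.io.delta.storage.S3DynamoDBLogStore.ddb.region       us-east-1
```

Replacing DynamoDB means replacing whatever sits behind that LogStore implementation with another service that can play the same coordination role.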
Hi @KhASQ - there is only the S3+DynamoDB support today, but other methods for providing mutual exclusion are a great ask.
This will likely require an additional implementation of LogStore as mentioned on the storage configuration page that integrates with whichever external system is responsible for providing the mutual exclusion.
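To make concrete what such a LogStore has to guarantee: the dangerous step on S3-compatible stores is writing the next `_delta_log/<version>.json` file, which must succeed for exactly one concurrent writer. Here is a toy Python sketch of that put-if-absent contract, with an in-memory dict standing in for the external coordination service (all names here are hypothetical illustrations, not Delta APIs):

```python
import threading

class CommitCoordinator:
    """Toy stand-in for an external service (e.g. DynamoDB) that
    provides the atomic put-if-absent guarantee Delta needs on S3."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed = {}

    def put_if_absent(self, key, value):
        # Atomically claim `key`; exactly one concurrent writer wins.
        with self._lock:
            if key in self._committed:
                return False
            self._committed[key] = value
            return True

def commit(coordinator, table, version, actions):
    """Attempt to commit `actions` as log version `version` of `table`.
    Returns True on success, False if another writer won the race
    (the loser must then re-read the log, rebase, and retry)."""
    key = f"{table}/_delta_log/{version:020d}.json"
    return coordinator.put_if_absent(key, actions)
```

In a real LogStore the dict would be replaced by a conditional write against the external system; the only essential property is that claiming a given version file is atomic across all clusters.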
Could you please share more details about your environment?
Where is this deployed: self-hosted/on-prem, a cloud provider, etc.?
What additional services/databases exist (if not DynamoDB)? Metastores of any kind, etc.
Would you be open to helping/contributing to the solution?
I'm also experimenting with Delta + MinIO and realized that I need DynamoDB to cover all scenarios in the wild.
I found ScyllaDB with Alternator as a replacement for DynamoDB:
-> https://www.scylladb.com/alternator/
but I don't see a way to specify endpoints in the Delta configs for this LogStore.