In this proposal, I want to share my vision of the Databend Data Access Layer (DAL). It is not an official announcement or commitment, and the vision may change during development. Any comments or advice are welcome!
Background
For Rust
A growing number of storage services provide support for the Rust language (S3, Azure, GCS...), and more and more cloud-native applications are being developed in Rust (databend, datenlord, engula...).
However, to support different storage services, each application has to implement support for each service one by one, which makes the overall complexity O(m*n) for m applications and n services. If we can build a lib that provides a general, orthogonal API, we can reduce the global complexity to O(m+n).
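To make the O(m+n) argument concrete, here is a minimal sketch, not the proposed API. The trait name `Storage`, the two in-memory backends, and the `copy_object` function are all hypothetical and exist only for illustration. Each backend is implemented once against the trait, and each piece of application logic is written once against the trait, so neither side needs to know about the other.

```rust
use std::collections::HashMap;

/// Hypothetical unified interface (an assumption, not the real DAL API).
trait Storage {
    fn read(&self, path: &str) -> Option<Vec<u8>>;
    fn write(&mut self, path: &str, data: &[u8]);
}

/// Stand-in for one of the n storage services.
#[derive(Default)]
struct MemoryBackend {
    objects: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryBackend {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
    fn write(&mut self, path: &str, data: &[u8]) {
        self.objects.insert(path.to_string(), data.to_vec());
    }
}

/// A second stand-in service that stores every key under a fixed prefix.
struct PrefixedBackend {
    prefix: String,
    inner: MemoryBackend,
}

impl Storage for PrefixedBackend {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.inner.read(&format!("{}{}", self.prefix, path))
    }
    fn write(&mut self, path: &str, data: &[u8]) {
        self.inner.write(&format!("{}{}", self.prefix, path), data);
    }
}

/// Stand-in for one of the m applications: written once, works with any backend.
fn copy_object(store: &mut dyn Storage, from: &str, to: &str) -> bool {
    match store.read(from) {
        Some(data) => {
            store.write(to, &data);
            true
        }
        None => false,
    }
}
```

Adding a third backend means one more `impl Storage`, and adding a second application means one more function over `dyn Storage`; no existing code has to change.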
For Databend
The current design of the DAL is heavily influenced by parquet. We have to support AsyncSeek on storage services that don't have native seek support, which makes the API hard to both use and implement. The current DAL also hides so many details that the caller cannot tell whether a given call will send a new HTTP request.
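One way to picture the alternative is an interface where every call maps to exactly one request, so the cost is visible to the caller. The sketch below is only an illustration under that assumption: `CountingStore`, `read_range`, `size`, and `read_tail` are hypothetical names, and the "requests" are simulated with a counter instead of real HTTP calls.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a single object in an HTTP object store.
struct CountingStore {
    data: Vec<u8>,
    /// Number of simulated requests issued so far.
    requests: Cell<usize>,
}

impl CountingStore {
    fn new(data: Vec<u8>) -> Self {
        Self { data, requests: Cell::new(0) }
    }

    /// One call models exactly one ranged read request.
    fn read_range(&self, offset: usize, len: usize) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        // Clamp the range to the object size instead of panicking.
        let start = offset.min(self.data.len());
        let end = start.saturating_add(len).min(self.data.len());
        self.data[start..end].to_vec()
    }

    /// One call models exactly one metadata request.
    fn size(&self) -> usize {
        self.requests.set(self.requests.get() + 1);
        self.data.len()
    }
}

/// Read the last `n` bytes of the object.
/// The body makes the cost explicit: one metadata request plus one ranged read.
fn read_tail(store: &CountingStore, n: usize) -> Vec<u8> {
    let size = store.size();
    let offset = size.saturating_sub(n);
    store.read_range(offset, n)
}
```

With a seek-style reader, the same tail read would be a `seek` followed by a `read`, and the caller could not tell from the signatures how many requests were sent. Here the two method calls correspond to two requests.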
Proposal
So I propose to build a new DAL lib that:
General: designed for any workload, not only for Databend.
Zero-Overhead: Using this lib is just like using the native SDK.
Easy to understand: both to use and to implement.
In the future, the DAL lib will be extracted from Databend and adopted by different projects.
In this lib, we will provide the following features:
Unified initialization logic: connection string or pre-defined config struct.
Storage features like read, write, stat, delete, list.
Ability to detect storage-side features
Support for multiple storage services
Deterministic behavior
Behavioral Testing
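As a rough sketch of how the features listed above could fit together, the following trait covers `read`, `write`, `stat`, `delete`, and `list`, plus a capability query for storage-side features. Every name here (`Accessor`, `Capabilities`, `Metadata`, `Error`, `Memory`) is hypothetical, and the choice that deleting a missing object succeeds is an assumption made only to illustrate deterministic behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical object metadata returned by `stat`.
#[derive(Debug, PartialEq)]
struct Metadata {
    size: u64,
}

/// Hypothetical description of what a backend supports.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Capabilities {
    range_read: bool,
    list: bool,
}

#[derive(Debug, PartialEq)]
enum Error {
    NotFound(String),
}

/// Hypothetical unified interface (an assumption, not the final API).
trait Accessor {
    fn capabilities(&self) -> Capabilities;
    fn read(&self, path: &str) -> Result<Vec<u8>, Error>;
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), Error>;
    fn stat(&self, path: &str) -> Result<Metadata, Error>;
    fn delete(&mut self, path: &str) -> Result<(), Error>;
    fn list(&self, prefix: &str) -> Result<Vec<String>, Error>;
}

/// In-memory backend; a BTreeMap keeps `list` output in sorted order.
#[derive(Default)]
struct Memory {
    objects: BTreeMap<String, Vec<u8>>,
}

impl Accessor for Memory {
    fn capabilities(&self) -> Capabilities {
        Capabilities { range_read: false, list: true }
    }
    fn read(&self, path: &str) -> Result<Vec<u8>, Error> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| Error::NotFound(path.to_string()))
    }
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), Error> {
        self.objects.insert(path.to_string(), data.to_vec());
        Ok(())
    }
    fn stat(&self, path: &str) -> Result<Metadata, Error> {
        self.objects
            .get(path)
            .map(|d| Metadata { size: d.len() as u64 })
            .ok_or_else(|| Error::NotFound(path.to_string()))
    }
    /// Assumption: deleting a missing object is not an error (idempotent).
    fn delete(&mut self, path: &str) -> Result<(), Error> {
        self.objects.remove(path);
        Ok(())
    }
    fn list(&self, prefix: &str) -> Result<Vec<String>, Error> {
        Ok(self
            .objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect())
    }
}
```

The same set of assertions could then be run against every backend that implements the trait, which is one possible reading of "behavioral testing": the tests describe the expected behavior once, and each service has to pass them.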
Next Steps
The first step of my vision is to refactor the DAL design in the Databend codebase, so that I can validate my ideas in a real project.
Once I think my idea is mature enough, I will create a tracking issue to let the community catch up on the progress. Stay tuned, and thanks for reading!
Tracking issues: #3677