Hello everyone, I'm jumping here from [Discussion] Object Store Composition.
Background
DataFusion currently uses ObjectStore as its public storage interface. We have public APIs such as register_object_store:
use std::sync::Arc;
use datafusion::execution::object_store::ObjectStoreUrl;
use datafusion::prelude::SessionContext;

let object_store_url = ObjectStoreUrl::parse("file://").unwrap();
let object_store = object_store::local::LocalFileSystem::new();
let ctx = SessionContext::new();
// All files with the file:// url prefix will be read from the local file system
ctx.register_object_store(object_store_url.as_ref(), Arc::new(object_store));
With the growth of DF, we have to continuously add more features to object_store, making it increasingly difficult to compose, as described in [Discussion] Object Store Composition.
The latest example is adding Extensions to object store GetOptions to allow passing tracing spans within the object store, as requested in Improve use of tracing spans in query path.
It's easy to predict that ObjectStore will move further and further away from its initial position:
Initially the ObjectStore API was relatively simple, consisting of a few methods to interact with object stores. As such many systems took this abstraction and used it as a generic IO abstraction, this is good and what the crate was designed for.
Proposal
I therefore propose building datafusion-storage, focused primarily on DataFusion's own needs, while maintaining datafusion-storage-object-store and datafusion-storage-opendal separately. The benefit is that users can implement innovative backends such as datafusion-storage-cudf or datafusion-storage-io_uring without being constrained by the current I/O abstraction of object_store or OpenDAL.
If this becomes a reality, DataFusion can design the abstraction based on its own requirements without having to push everything upstream to object_store. This would allow DataFusion to maintain useful features such as context management and add further requirements to the trait, while letting datafusion-storage-object-store and datafusion-storage-opendal handle the extra work.
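To make the shape of the proposal more concrete, here is a minimal sketch of what a DataFusion-owned storage trait could look like. Everything in it is an assumption for illustration: the names Storage, StorageContext, InMemoryStorage and StorageRegistry are hypothetical, the methods are synchronous for brevity (a real design would most likely be async), and the in-memory backend merely stands in for adapter crates such as datafusion-storage-object-store or datafusion-storage-opendal.

```rust
use std::collections::HashMap;
use std::io;
use std::sync::{Arc, Mutex};

/// Hypothetical per-request context owned by DataFusion (e.g. a tracing
/// span id), passed explicitly instead of being pushed upstream.
#[derive(Debug, Default, Clone)]
pub struct StorageContext {
    pub trace_span: Option<String>,
}

/// Hypothetical trait that a `datafusion-storage` crate could expose.
pub trait Storage: Send + Sync {
    fn read(&self, ctx: &StorageContext, path: &str) -> io::Result<Vec<u8>>;
    fn write(&self, ctx: &StorageContext, path: &str, data: Vec<u8>) -> io::Result<()>;
}

/// Stand-in backend; an adapter crate would wrap a real store here.
#[derive(Default)]
pub struct InMemoryStorage {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl Storage for InMemoryStorage {
    fn read(&self, _ctx: &StorageContext, path: &str) -> io::Result<Vec<u8>> {
        let objects = self.objects.lock().unwrap();
        objects
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }

    fn write(&self, _ctx: &StorageContext, path: &str, data: Vec<u8>) -> io::Result<()> {
        self.objects.lock().unwrap().insert(path.to_string(), data);
        Ok(())
    }
}

/// Registry keyed by URL scheme, loosely mirroring `register_object_store`.
#[derive(Default)]
pub struct StorageRegistry {
    stores: HashMap<String, Arc<dyn Storage>>,
}

impl StorageRegistry {
    /// Registers a backend for a scheme, returning the one it replaced, if any.
    pub fn register(&mut self, scheme: &str, store: Arc<dyn Storage>) -> Option<Arc<dyn Storage>> {
        self.stores.insert(scheme.to_string(), store)
    }

    /// Looks up the backend registered for a scheme.
    pub fn get(&self, scheme: &str) -> Option<Arc<dyn Storage>> {
        self.stores.get(scheme).cloned()
    }
}
```

The point of the sketch is that the context type and the trait both live on the DataFusion side, so adding a field to StorageContext would not require a change in object_store or OpenDAL; only the adapters would need to decide how to map it.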
Implementation
We can start by aliasing the ObjectStore trait within datafusion-storage first. Given sufficient migration time, we can then fine-tune the trait to better align with DF's specific needs.
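The aliasing step could be as small as a re-export. The sketch below is self-contained, so the object_store module in it is only a stand-in for the real crate, and the alias name Storage, the datafusion_storage module and the describe helper are hypothetical. In an actual crate the alias would presumably be a single line along the lines of `pub use object_store::ObjectStore as Storage;`.

```rust
/// Stand-in for the upstream `object_store` crate (only what is needed here).
mod object_store {
    pub trait ObjectStore: Send + Sync {
        fn name(&self) -> String;
    }

    pub struct LocalFileSystem;

    impl ObjectStore for LocalFileSystem {
        fn name(&self) -> String {
            "LocalFileSystem".to_string()
        }
    }
}

/// Hypothetical `datafusion-storage` crate, step one: a pure re-export.
/// Existing `ObjectStore` implementations satisfy `Storage` unchanged.
mod datafusion_storage {
    pub use super::object_store::ObjectStore as Storage;
}

use datafusion_storage::Storage;

/// DataFusion-side code names only the new path, so the trait behind it
/// can later be replaced without touching call sites like this one.
fn describe(store: &dyn Storage) -> String {
    format!("storage backend: {}", store.name())
}
```

Because the alias and the original trait are the same item, calling describe(&object_store::LocalFileSystem) compiles as-is; callers can migrate their imports first, and the trait definition can diverge afterwards.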