Chore: refactor datasources #2744
Labels: chore (DX, infra etc that's not build or CI related)
Comments
I like this!
Hope these help. I'm pretty excited for this to get simpler!
I love this idea. It opens up the possibility for guides that make it easy for contributors to add new data sources, too.
Unsolicited opinion, but I generally agree with the basic premise. Subscribed and excited to see where this goes!
This was referenced Mar 6, 2024
universalmind303 added a commit that referenced this issue Mar 6, 2024
some more prereq work for making the tableoptions more flexible as part of #2744. The `Arbitrary` trait has a cascading effect, so it's kind of all or nothing, and since we can't have `trait TableOptions: Arbitrary` in an object-safe way, it's gotta go.
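To illustrate the object-safety point in that commit message, here is a minimal sketch. It assumes the `Arbitrary` trait in question has a `Sized` supertrait and a constructor returning `Self`; the `Arbitrary`, `TableOptions`, and `ExcelOptions` definitions below are simplified, hypothetical stand-ins, not the project's actual code.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for an `Arbitrary`-style trait: it has a `Sized`
// supertrait and a constructor that returns `Self`.
pub trait Arbitrary: Sized {
    fn arbitrary() -> Self;
}

// With `Arbitrary` as a supertrait, the trait below would inherit `Sized`
// and could not be used as `dyn TableOptions`:
// pub trait TableOptions: Arbitrary + Debug {}

// Dropping the bound keeps the trait usable behind `Box<dyn TableOptions>`.
pub trait TableOptions: Debug {
    fn kind(&self) -> &'static str;
}

// Hypothetical options type for illustration only.
#[derive(Debug)]
pub struct ExcelOptions {
    pub path: String,
}

impl TableOptions for ExcelOptions {
    fn kind(&self) -> &'static str {
        "excel"
    }
}

// A concrete type can still implement `Arbitrary` on its own.
impl Arbitrary for ExcelOptions {
    fn arbitrary() -> Self {
        ExcelOptions { path: "test.xlsx".to_string() }
    }
}

// Works only because `TableOptions` is object-safe.
pub fn boxed_kinds(opts: &[Box<dyn TableOptions>]) -> Vec<&'static str> {
    opts.iter().map(|o| o.kind()).collect()
}
```

The trade-off is that the `Arbitrary` bound can no longer be required of every options type through the trait itself, which matches the "all or nothing" remark above.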
Description
Proposal to do some internal refactoring on table functions and datasources to make adding new ones easier, while also making better use of incremental compilation.
Some background context:
It is a pretty big pain point that table syntax is completely separate from table function syntax. We usually don't implement these at the same time due to the added complexity of the table syntax.
The code also lives in very different parts of the application and is quite difficult to follow.
Ideally we should be able to create a new datasource in isolation and register it all in one shot. Currently we have to change all of these:
- `TableProvider` impl, ex: `crates/datasources/src/excel/mod.rs`
- `TableFunc`: `crates/sqlbuiltins/src/functions/table/excel.rs:21`
- `TableOptions` proto: `crates/protogen/proto/metastore/options.proto:208`
- `TableOptions` (`options.rs`): `crates/protogen/src/metastore/types/options.rs:1053`
- `crates/sqlexec/src/planner/session_planner.rs:903`
That's a lot of different parts of the code that need to be changed. Datasources should instead just program to a trait or traits, we register them in the registry, and then all of the above should just work. That reduces the number of places to change to just two: the datasource implementation and the registry.
Proposal
The `sqlbuiltins::functions` module currently provides
- `TableFunc` (used to convert a table function into a table provider)
- `BuiltinFunction`, which should be renamed to `FunctionCatalogEntry` or similar (used to register a function within the catalog)

and DataFusion provides us with the `TableProvider` trait that is used for creating record batches.

Currently we have no traits for the table options syntax, so it is manually implemented each time, but we probably need one: `DynTableOptions` (bikeshedding).

I suggest that we create a new crate that provides all of this functionality wrapped up in a single trait, `Datasource: TableFunc + FunctionCatalogEntry + TableProvider + TryFrom<DynTableOptions>` (bikeshedding). Then that can be used to populate the catalog, function registry, and planner all at once.

So when adding a new datasource, you just implement this set of traits, plug it in as `Arc<dyn Datasource>` somewhere, and we're good to go.
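A rough sketch of what this could look like, under several assumptions. All trait and type definitions below are simplified, hypothetical stand-ins: the real `TableProvider` and `BuiltinFunction` have richer signatures, `TableFunc` is left out for brevity, and `Registry`, `Excel`, and the `path` option are invented for illustration. One wrinkle worth noting: `TryFrom` requires `Self: Sized`, so a trait that lists it as a supertrait cannot be used as `dyn Datasource`. The sketch therefore keeps `TryFrom<DynTableOptions>` as a bound at registration time rather than on the trait itself.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified, hypothetical stand-ins for the traits named in the proposal.
pub trait FunctionCatalogEntry {
    fn name(&self) -> &'static str;
}

pub trait TableProvider {
    // Stand-in for producing record batches.
    fn scan(&self) -> Vec<String>;
}

/// Hypothetical untyped options bag (`DynTableOptions` in the proposal).
#[derive(Debug, Clone, Default)]
pub struct DynTableOptions(pub HashMap<String, String>);

/// Object-safe umbrella trait, blanket-implemented for any type that
/// provides all of the parts.
pub trait Datasource: FunctionCatalogEntry + TableProvider + Send + Sync {}
impl<T: FunctionCatalogEntry + TableProvider + Send + Sync> Datasource for T {}

type Constructor = fn(DynTableOptions) -> Result<Arc<dyn Datasource>, String>;

// Builds a concrete datasource from options and erases its type.
fn construct<D>(opts: DynTableOptions) -> Result<Arc<dyn Datasource>, String>
where
    D: Datasource + TryFrom<DynTableOptions, Error = String> + 'static,
{
    let datasource: Arc<dyn Datasource> = Arc::new(D::try_from(opts)?);
    Ok(datasource)
}

/// Hypothetical registry: the single place a new datasource is plugged in.
#[derive(Default)]
pub struct Registry {
    constructors: HashMap<&'static str, Constructor>,
}

impl Registry {
    pub fn register<D>(&mut self, name: &'static str)
    where
        D: Datasource + TryFrom<DynTableOptions, Error = String> + 'static,
    {
        self.constructors.insert(name, construct::<D>);
    }

    pub fn create(&self, name: &str, opts: DynTableOptions) -> Result<Arc<dyn Datasource>, String> {
        let ctor = self
            .constructors
            .get(name)
            .ok_or_else(|| format!("unknown datasource: {}", name))?;
        ctor(opts)
    }
}

// Example datasource, written in isolation from the registry.
pub struct Excel {
    path: String,
}

impl FunctionCatalogEntry for Excel {
    fn name(&self) -> &'static str {
        "excel"
    }
}

impl TableProvider for Excel {
    fn scan(&self) -> Vec<String> {
        vec![format!("row from {}", self.path)]
    }
}

impl TryFrom<DynTableOptions> for Excel {
    type Error = String;

    fn try_from(opts: DynTableOptions) -> Result<Self, Self::Error> {
        let path = opts
            .0
            .get("path")
            .cloned()
            .ok_or_else(|| "missing `path` option".to_string())?;
        Ok(Excel { path })
    }
}
```

With this shape, adding a datasource would mean writing the three `impl` blocks and one `registry.register::<Excel>("excel")` call; `registry.create("excel", opts)` then hands back an `Arc<dyn Datasource>`.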