I'm currently working on figuring out the catalog API and implementing a catalog for my own project. I would be happy to adapt some of my code into an example.
* catalog example
* add license and example description at top of file
* ddl example
* comment
* cleanup extra code
* clippy
* remove clippy ignore stmt
* better comment
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I think it is currently a bit confusing how to use DataFusion with a custom catalog.
Background
DataFusion is primarily a query engine, rather than a complete database system that also must handle persistence, catalog management, ingest, data lifecycle management, and other things.
Systems like Ballista or GreptimeDB are examples of complete systems that use DataFusion for query but have their own catalog implementations.
However, in order to function, the query engine needs to read catalog information, and DataFusion provides a rich set of APIs for this, such as the following
The query engine also knows how to plan catalog manipulations, which often need planner support (e.g. for type checking or coercion).
Making things even more confusing, DataFusion also has a basic, ephemeral in-memory catalog implementation, MemoryCatalogList (https://docs.rs/datafusion/18.0.0/datafusion/catalog/catalog/struct.MemoryCatalogList.html), and the methods on SessionContext know how to modify that in-memory catalog.
Challenges
The boundary between the built-in catalog support and the way to plug in an external catalog is not very clear. See, for example, PR #5277.
Also, as projects like #5130 get under way, it becomes even more important to distinguish between catalog manipulation and read-only catalog access.
Another example is the fact that SessionContext::sql by default modifies the in-memory catalog: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.sql
Describe the solution you'd like
I would like a clearer interface (or maybe just documentation) that spells out which catalog manipulations are allowed and which are not, as well as an example that other people could follow to implement an external catalog. The interface should make clear what a given catalog supports and what it does not (e.g. does it allow creating new tables or views?).
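To make the request more concrete, here is a rough sketch of the kind of split I have in mind, using only the standard library. Every name in it (`TableInfo`, `CatalogReader`, `CatalogWriter`, `MemoryCatalog`, `ExternalCatalog`, `CatalogError`) is hypothetical and invented for illustration; none of them is an existing DataFusion type or trait. The idea is simply that read-only lookup lives in one trait, while mutation lives in a second trait whose default methods reject DDL, so a catalog has to opt in to each manipulation it supports.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table definition (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
pub struct TableInfo {
    pub name: String,
    pub columns: Vec<String>,
}

/// Hypothetical error type for rejected catalog manipulations.
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    NotSupported(&'static str),
    AlreadyExists(String),
}

/// Read-only access: what a planner needs in order to resolve table names.
pub trait CatalogReader {
    fn table_names(&self) -> Vec<String>;
    fn table(&self, name: &str) -> Option<TableInfo>;
}

/// Optional manipulation. The defaults reject DDL, so a catalog must
/// explicitly opt in to every operation it supports.
pub trait CatalogWriter: CatalogReader {
    fn supports_create_table(&self) -> bool {
        false
    }
    fn create_table(&mut self, _table: TableInfo) -> Result<(), CatalogError> {
        Err(CatalogError::NotSupported("CREATE TABLE"))
    }
}

/// Ephemeral in-memory catalog that accepts CREATE TABLE.
#[derive(Default)]
pub struct MemoryCatalog {
    tables: HashMap<String, TableInfo>,
}

impl CatalogReader for MemoryCatalog {
    fn table_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.tables.keys().cloned().collect();
        names.sort(); // HashMap iteration order is unspecified
        names
    }
    fn table(&self, name: &str) -> Option<TableInfo> {
        self.tables.get(name).cloned()
    }
}

impl CatalogWriter for MemoryCatalog {
    fn supports_create_table(&self) -> bool {
        true
    }
    fn create_table(&mut self, table: TableInfo) -> Result<(), CatalogError> {
        if self.tables.contains_key(&table.name) {
            return Err(CatalogError::AlreadyExists(table.name));
        }
        self.tables.insert(table.name.clone(), table);
        Ok(())
    }
}

/// Catalog owned by another system: readable, but DDL is rejected.
pub struct ExternalCatalog {
    tables: Vec<TableInfo>,
}

impl ExternalCatalog {
    pub fn new(tables: Vec<TableInfo>) -> Self {
        Self { tables }
    }
}

impl CatalogReader for ExternalCatalog {
    fn table_names(&self) -> Vec<String> {
        self.tables.iter().map(|t| t.name.clone()).collect()
    }
    fn table(&self, name: &str) -> Option<TableInfo> {
        self.tables.iter().find(|t| t.name == name).cloned()
    }
}

// Inherits the rejecting defaults: no table creation allowed.
impl CatalogWriter for ExternalCatalog {}
```

With something shaped like this, a caller could check `supports_create_table()` before planning DDL, and an external catalog that never overrides the defaults would return `CatalogError::NotSupported` instead of silently modifying an in-memory copy. Whether this should be a second trait, capability flags on the existing traits, or just documentation is an open question.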
To do this, I suggest:
This project might also help
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
N/A