Add insert/update/delete/(ctas?) to DataFusion planner #4901

Closed
avantgardnerio opened this issue Jan 13, 2023 · 1 comment · Fixed by #4902
Labels
enhancement New feature or request

Comments

@avantgardnerio
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

DataFusion is a library, and parts of it can be very useful even if they aren't plumbed all the way through to something like the CLI. At my company, we are attempting to build an HTAP database on top of DataFusion and are running into roadblocks: at present DataFusion cannot convert insert/update/delete/CTAS queries into LogicalPlans, even though the sqlparser crate can parse them.

Describe the solution you'd like

Parse and plan, but don't execute, insert/update/delete queries. (It might be a fun follow-on PR to support creating tables with CTAS from the CLI?)
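To make "plan, but don't execute" concrete, here is a minimal sketch in plain Rust. It assumes nothing about DataFusion's or sqlparser's real types: `Statement`, `DmlPlan`, `Catalog`, and `plan` are hypothetical names invented for illustration. The planner resolves the table and column references of an already-parsed write statement against a catalog and returns a plan node. It never touches data.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified AST for write statements (NOT sqlparser's real types).
#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Insert { table: String, columns: Vec<String> },
    Delete { table: String, filter_column: Option<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum WriteOp {
    Insert,
    Delete,
}

/// Hypothetical logical plan node for a write (NOT DataFusion's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct DmlPlan {
    pub op: WriteOp,
    pub table: String,
    /// Positions of the referenced columns within the table schema.
    pub column_indices: Vec<usize>,
}

/// Table name -> ordered column names (assumed shape of a catalog).
pub type Catalog = HashMap<String, Vec<String>>;

/// Semantic analysis only: resolve names, build a plan, execute nothing.
pub fn plan(stmt: &Statement, catalog: &Catalog) -> Result<DmlPlan, String> {
    let (op, table, cols): (WriteOp, &String, Vec<&String>) = match stmt {
        Statement::Insert { table, columns } => {
            (WriteOp::Insert, table, columns.iter().collect())
        }
        Statement::Delete { table, filter_column } => {
            (WriteOp::Delete, table, filter_column.iter().collect())
        }
    };

    let schema = catalog
        .get(table)
        .ok_or_else(|| format!("unknown table: {table}"))?;

    // Each referenced column must exist in the target table's schema.
    let column_indices = cols
        .iter()
        .map(|c| {
            schema
                .iter()
                .position(|s| s == *c)
                .ok_or_else(|| format!("unknown column: {table}.{c}"))
        })
        .collect::<Result<Vec<_>, _>>()?;

    Ok(DmlPlan { op, table: table.clone(), column_indices })
}
```

An application embedding such a planner would take the returned `DmlPlan` and decide for itself how, or whether, to run it.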

Describe alternatives you've considered

  • Intercept these statements at the AST level in our app and plumb a separate path for write plans. This would lose functionality such as named parameter replacement, which we would then have to re-implement.
@alamb
Contributor

alamb commented Jan 15, 2023

I think this proposal (to have DDL / DML support in the engine) is very much in the spirit of DataFusion as a library to build other databases on.

Specifically, the semantic analysis and basic plan support for these nodes is non-trivial (e.g. resolving references to columns) and is not database-specific. How those commands are actually executed, I think, is almost certainly going to be system-specific.

Ideally, I hope that DataFusion can have a clear separation between

  1. "built in logical plan nodes that have good implementations" (e.g. the query ones like TableSource, Filter, etc.) and
  2. "built in logical plan nodes that have either no or basic implementations" (e.g. create table, write rel, etc.)
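One way to picture that separation is the sketch below. It is written under loud assumptions: `Plan`, `WriteHandler`, `NoWrites`, `Output`, and `execute` are hypothetical names, not DataFusion APIs, and rows are reduced to plain `i64` values to keep it short. Query nodes get a real built-in implementation, while DDL/DML nodes are planned by the engine but handed to a trait whose default methods only report "not implemented", leaving execution to the embedding system.

```rust
use std::collections::HashMap;

/// Hypothetical plan nodes (NOT DataFusion's real LogicalPlan).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    // Category 1: query nodes with a built-in implementation.
    Scan { table: String },
    Filter { input: Box<Plan>, min_value: i64 },
    // Category 2: DDL/DML nodes with no built-in implementation.
    CreateTable { name: String },
    Write { table: String, input: Box<Plan> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Output {
    Rows(Vec<i64>),
    Count(u64),
}

pub type Tables = HashMap<String, Vec<i64>>;

/// Extension point: each system overrides what it supports.
pub trait WriteHandler {
    fn create_table(&mut self, name: &str) -> Result<u64, String> {
        Err(format!("CREATE TABLE {name}: not implemented by this system"))
    }
    fn write(&mut self, table: &str, rows: Vec<i64>) -> Result<u64, String> {
        Err(format!("write of {} rows to {table}: not implemented", rows.len()))
    }
}

/// A handler that keeps every default, i.e. a read-only system.
pub struct NoWrites;
impl WriteHandler for NoWrites {}

/// Built-in evaluation of the query nodes.
fn query(plan: &Plan, tables: &Tables) -> Result<Vec<i64>, String> {
    match plan {
        Plan::Scan { table } => tables
            .get(table)
            .cloned()
            .ok_or_else(|| format!("unknown table: {table}")),
        Plan::Filter { input, min_value } => Ok(query(input, tables)?
            .into_iter()
            .filter(|v| v >= min_value)
            .collect()),
        other => Err(format!("not a query node: {other:?}")),
    }
}

/// Dispatch: queries run in the engine, writes go to the handler.
pub fn execute(
    plan: &Plan,
    tables: &Tables,
    handler: &mut dyn WriteHandler,
) -> Result<Output, String> {
    match plan {
        Plan::CreateTable { name } => handler.create_table(name).map(Output::Count),
        Plan::Write { table, input } => {
            let rows = query(input, tables)?;
            handler.write(table, rows).map(Output::Count)
        }
        query_node => query(query_node, tables).map(Output::Rows),
    }
}
```

With `NoWrites`, a `Filter` over a `Scan` returns rows, while a `Write` plan is still accepted as a valid plan but fails at execution time until a system supplies its own `WriteHandler`.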

2 participants