Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Support LogicalPlan --> SQL String #8661

Closed
5 tasks done
tempbottle opened this issue Dec 27, 2023 · 14 comments
Closed
5 tasks done

[EPIC] Support LogicalPlan --> SQL String #8661

tempbottle opened this issue Dec 27, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@tempbottle
Copy link

tempbottle commented Dec 27, 2023

Part of #9494

Is your feature request related to a problem or challenge?

No response

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Sub tasks

@tempbottle tempbottle added the enhancement New feature or request label Dec 27, 2023
@alamb
Copy link
Contributor

alamb commented Dec 28, 2023

Hi @tempbottle -- no sadly there is no way that I know if to go from logical plan --> SQL

I think it would be possible in theory, though probably a non trivial amount of code

A required step would be to go from Expr --> SQL expression, which might be a good place to start

@devinjdangelo
Copy link
Contributor

I looked into this myself recently and also concluded that DataFusion has no way currently to recover the SQL strings that generated a LogicalPlan or parts of it (Exprs). Likewise, I don't think sqlparser-rs has any mechanism to go from AST back to raw SQL. Either case would be non trivial to implement, and raises the question of what SQL dialect you want to support.

I'm not sure what your use case is @tempbottle, but in my case I am looking at implementing TableProviders over top of other engines which accept SQL. For this use case, it may be more fruitful to look at https://substrait.io/. DataFusion does have the capability to convert a LogicalPlan to a Substrait plan. If the engine of interest does not consume Substrait, you are probably stuck writing custom glue code to map a LogicalPlan or Exprs into whatever SQL dialect or DSL that the target engine supports.

I'll also point out that GlareDb has some examples of mapping Exprs to SQL strings for the purposes of implementing a TableProvider. See e.g. https://github.com/GlareDB/glaredb/blob/ee4b440d9a594150926ce0a5fd96db400185c881/crates/datasources/src/postgres/mod.rs#L1242

@alamb
Copy link
Contributor

alamb commented Dec 30, 2023

Likewise, I don't think sqlparser-rs has any mechanism to go from AST back to raw SQL.

I am pretty sure the Display impl in sqlparser-rs recovers SQL from the parsed AST (as we have had many PRs related to keeping it working)

I agree substrait.io is a good idea for connecting other databases.

@matthewmturner
Copy link
Contributor

@tempbottle @devinjdangelo FYI Ibis has a to_sql method that can be used to generate SQL and DataFusion is a backend for Ibis however I'm not sure if it has all the features required for to_sql to work (the latest i saw was that DataFusion still had a lot of gaps to fill with the Ibis wrapper). It also may not be the API you were looking for, but thought it could help as you could see how they implement it.

https://ibis-project.org/tutorials/ibis-for-sql-users

@devinjdangelo
Copy link
Contributor

I am pretty sure the Display impl in sqlparser-rs recovers SQL from the parsed AST (as we have had many PRs related to keeping it working)

That's very interesting i'll have to look more closely at that. Thanks @alamb !

@alamb alamb changed the title Is there any method to generate sql from a LogicalPlan? Support LogicalPlan --> SQL String Mar 7, 2024
@alamb
Copy link
Contributor

alamb commented Mar 7, 2024

FYI I filed #9495 for the Expr --> String feature. That would likely be a necesary first step for a full LogicalPlan --> SQL conversion

@backkem
Copy link
Contributor

backkem commented Mar 8, 2024

I split out datafusion-federation's SQL Writer code into its own package and added an example for expressions and plans. I intend to mature it over time. It's open to contributions and/or moving to a more canonical place.

@alamb
Copy link
Contributor

alamb commented Mar 8, 2024

For anyone else following along, there is substantial discussion on #9495 (comment) worth reading

@backkem
Copy link
Contributor

backkem commented Mar 12, 2024

Does it make sense for me to lay the same 'groundwork' for the LogicalPlan serialization as I did for the Expr in #9517?
The suggestion would be similar: Upstream the datafusion-federation sql-write code as follows:

  • Upstream the AST builders to sqlparser-rs.
  • Upstream the plan unparser to datafusion/sql/src/unparser.
    • Expose a plan_to_sql method similar to expr_to_sql.

Any early thoughts or feedback on the datafusion-federation code sql-write code is appreciated, so I can address it in the process.

@devinjdangelo
Copy link
Contributor

Does it make sense for me to lay the same 'groundwork' for the LogicalPlan serialization as I did for the Expr in #9517?

I am in support of this. We should be cautious about what we expose via public interfaces at first, since I think it is likely we will require many breaking changes to support all possible logical plans. The top level plan_to_sql function should be stable, but we will need freedom to rework methods below that.

Upstream the AST builders to sqlparser-rs.

I am a bit surprised there isn't an ergonomic way to build an AST already in sqlparser. I can see that the DFParser constructs Statements directly without a builder struct.

If the AST builder functionality is not particularly useful in sqlparser, it is likely best to include that in datafusion only rather than upstream to sqlparser. Do you think the AST builders would be useful upstream @alamb?

@alamb
Copy link
Contributor

alamb commented Mar 13, 2024

Does it make sense for me to lay the same 'groundwork' for the LogicalPlan serialization as I did for the Expr in #9517?

I agree with @devinjdangelo that I think it does.

If the AST builder functionality is not particularly useful in sqlparser, it is likely best to include that in datafusion only rather than upstream to sqlparser. Do you think the AST builders would be useful upstream @alamb?

I am not sure, to be honest. I think if we upstreamed a AST builder style thing in sqlparser, we would also want to migrate the existing parser to use it (and therefore take advantage of the existing tests). I think as long as it were backwards compatible it would make a nice contribution.

However, I would also say there is no compelling need to upstream it either. Having it start life in DataFusion would also be fine and then maybe upstream it in sqlparser

backkem added a commit to backkem/arrow-datafusion that referenced this issue Mar 13, 2024
backkem added a commit to backkem/arrow-datafusion that referenced this issue Mar 13, 2024
alamb pushed a commit that referenced this issue Mar 13, 2024
@backkem
Copy link
Contributor

backkem commented Mar 13, 2024

The initial PR (#9596) has landed. For anyone interested in helping to flesh out the LogicalPlan serialization to SQL, a good place to start are the remaining not_impl_err! statements in plan.rs. Add an integration test in roundtrip_statement in sql::tests and try to make it pass!

@seddonm1
Copy link
Contributor

Independently I have spent a fair bit of time on this problem. I think this current design of going to the sqlparser AST is a better approach than trying to go to SQL directly.

One thing that really helped was that I went and copied ~150ish queries from https://www.w3resource.com/sql-exercises/adventureworks/adventureworks-exercises.php to do a roundtrip test against. These are licensed https://creativecommons.org/licenses/by-nc-sa/3.0/deed.en so could be added to the test suite.

As you start to add more queries you start to see a lot of edge cases (particularly relating to how aggregations work) that need to be dealt with.

@alamb
Copy link
Contributor

alamb commented Nov 22, 2024

I think we can claim this is complete with all the great work from @devinjdangelo @backkem @goldmedal @phillipleblanc and others. Thank you

@alamb alamb closed this as completed Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants