-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Introduce convert Expr to SQL string API and basic feature #9517
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is entirely new functionality, I think it is a good idea to merge this as a start so we can move further development to the DataFusion repo, where it will be easiest to collaborate with more interested parties.
@@ -0,0 +1,52 @@ | |||
pub trait Dialect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is too bad we can't simply use the Dialect trait already in sqlparser... but it does appear that trait can't tell you the quote style directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make a change upstream to sqlparser? I would be happy to review such a PR (or maybe just make it?) We could leave a comment in the PR to remove this trait when the new sqlparser cate was released
let r = self.expr_to_sql(right.as_ref())?; | ||
let op = self.op_to_sql(op)?; | ||
|
||
Ok(self.binary_op_to_sql(l, r, op)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have found that we lose order of operations information here, but that wrapping the output in SQLExpr::Nested fixes it.
E.g.
x+1 / 2
is not the same as
(x+1)/2
I can open a follow up PR with some tests to show this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does DF represent this nesting explicitly as well? Or does it use the Expr hierarchy to capture the order?
If it's the latter, do we want to inject brackets for every operations or have a heuristic to only add them when mathematically needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on playing around with round trips (AST->Logical Plan->AST) I believe that Logical Plans encode operation ordering in the tree itself while AST represents the literal ordering in the SQL string. The easiest way to guarantee mathematical equivalence would be to always add SQLExpr::Nested. A heuristic could work but would be quite a bit more complex I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DF represents the nesting via Expr
children
In the SQL parser, there are precident rules to resolve how a + b + c
into a tree and then DataFusion simply gets the same tree structure
So the difference between (a + b) + c
ad a + (b + c)
is a function of what the left
and right
of the BinaryExpr
are
I agree with @devinjdangelo that we should file a ticket to track this issue and work on in a subsequent PR
} | ||
} | ||
|
||
fn scalar_to_sql(&self, v: &ScalarValue) -> Result<ast::Value> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that some DataFusion Scalars will require mapping to a SQLExpr. For example for date scalars, I think the only way is to convert to something like CAST(quoted_string_val to DATE)
.
This is a private function though, so does not need to be resolved prior to merging imo.
#[test] | ||
fn expr_to_sql_ok() -> Result<()> { | ||
let tests: Vec<(Expr, &str)> = vec![ | ||
(col("a").gt(lit(4)), r#"a > 4"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests look good to start. It might also be a good idea to have some tests that do a full round trip using SqltoRel, e.g. String -> AST -> LogicalPlan -> AST -> String and making sure the Strings and AST match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the round trip test would be good (and will make it quite mechanical to add coverage for the missing expressions)
Something like
round_trip("a"); // parse "a" to Expr and then turn back to string
round_trip("a + b");
...
I think we could do it as a follow on PR as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all thank you @backkem and @devinjdangelo -- this looks great.
Open question: Do we need this to be feature complete before landing?
No
Since this is entirely new functionality, I think it is a good idea to merge this as a start so we can move further development to the DataFusion repo, where it will be easiest to collaborate with more interested parties.
I agree 100%
Typically what we do in DataFusion is incrementally develop features on main. The requirements are typically:
- The new code has test coverage
- It doesn't regress any existing functionality (aka make it impossible to do something that used to be possible)
- Any unimplemented features return
DataFusionError::NotYetImplemented
rather than, for example, panics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @backkem and @devinjdangelo (again)!
I started the CI -- once that is clean I think this PR would be good to merge
Here are some follow on items that occur to me. How would you like to track them (perhaps I can file a ticket or we can put a list on the existing ticket)?
- Upstream changes in
datafusion/sql/src/unparser/dialect.rs
to sqlparser-rs - Additional Expr variant support
- Sorting out binary expr nesting
- Add an example of how to use this API in
expr_api.rs
example - Round trip tests
@@ -0,0 +1,52 @@ | |||
pub trait Dialect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make a change upstream to sqlparser? I would be happy to review such a PR (or maybe just make it?) We could leave a comment in the PR to remove this trait when the new sqlparser cate was released
let r = self.expr_to_sql(right.as_ref())?; | ||
let op = self.op_to_sql(op)?; | ||
|
||
Ok(self.binary_op_to_sql(l, r, op)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DF represents the nesting via Expr
children
In the SQL parser, there are precident rules to resolve how a + b + c
into a tree and then DataFusion simply gets the same tree structure
So the difference between (a + b) + c
ad a + (b + c)
is a function of what the left
and right
of the BinaryExpr
are
I agree with @devinjdangelo that we should file a ticket to track this issue and work on in a subsequent PR
#[test] | ||
fn expr_to_sql_ok() -> Result<()> { | ||
let tests: Vec<(Expr, &str)> = vec![ | ||
(col("a").gt(lit(4)), r#"a > 4"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the round trip test would be good (and will make it quite mechanical to add coverage for the missing expressions)
Something like
round_trip("a"); // parse "a" to Expr and then turn back to string
round_trip("a + b");
...
I think we could do it as a follow on PR as well
I believe I addressed the points that we didn't decide to ticket out. The CI indicates that some workflow still requires approval from a maintainer |
I see the CI fails on some dead code. I'll look into it. There is another CI failure but it seems related to a network timeout. |
Onwards and upwards (and thank you for your first contribution @backkem ) |
Relates to #9495.
This is an initial attempt at porting over the SQL Unparser as described in the tracking ticket.
Open question: Do we need this to be feature complete before landing?