Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SQL strings generated from Exprs "prettier" #10557

Closed
Tracked by #9494
alamb opened this issue May 17, 2024 · 11 comments · Fixed by #10573
Closed
Tracked by #9494

Make SQL strings generated from Exprs "prettier" #10557

alamb opened this issue May 17, 2024 · 11 comments · Fixed by #10573
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented May 17, 2024

Is your feature request related to a problem or challenge?

Part of #9494

As @backkem says #10528 (comment) on #10528

Currently, expressions from the DataFusion SQL unparser (aka expr --> String) are somewhat ugly

For example the expression col("a").eq(lit(5)) would be rendered as a = 5 by most poeple if they did it manaully, but DataFusion's unparser currently renders it like "a" = 5 (with extra quotes).

DataFusion also puts in quotes to make the order of operations explicit -- so instead of a < 5 AND b < 10 it would render ("a" < 5) AND ("b" < 10)

The current unparser is conservative and likely works well for when generating SQL for consumptions by other database systems. However, the SQL is not as nice for human consumption

Here is another instance from the example

let expr = col("a").lt(lit(5)).or(col("a").eq(lit(8)));
let ast = expr_to_sql(&expr)?;
let sql = format!("{}", ast);
assert_eq!(sql, r#"(("a" < 5) OR ("a" = 8))"#);

Describe the solution you'd like

If we want to make the generated SQL easier to read by humans / more succint, these steps will have to be made "smarter".

Describe alternatives you've considered

Potential ideas:

  1. We'll have to add in the math rules to avoid unneeded parentheses
  2. (likely dialect specific) rules for determining of quoting is needed.

Note that the latter likely involves listing out the reserved keywords for each dialect.

Additional context

No response

@goldmedal
Copy link
Contributor

Provide something I surveyed.

I think we can follow how Calcite handles the quoted issue. The SqlDialect of Calcite has a check rule identifierNeedsQuote.

https://github.com/apache/calcite/blob/aba64f0b217093b500629fe07a0befdc68293fbc/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L413-L415

It can be overridden according to the specific data source, such as BigQuery:

https://github.com/apache/calcite/blob/50a20824c4536450dcae963b5e757cf4bbc7e406/core/src/main/java/org/apache/calcite/sql/dialect/BigQuerySqlDialect.java#L106-L109

They implement some rules, such as regex patterns or reserved word lists. I think a dialect-specific rule is a nice choice.

@backkem
Copy link
Contributor

backkem commented May 18, 2024

Indeed, there is already a function on the sqlparser::dialect trait that takes this into account:

https://github.com/sqlparser-rs/sqlparser-rs/blob/54184460b5d873a67c2801e8b7c6e4f145bc65df/src/dialect/mod.rs#L113-L116

The dialect specific implementations just need to be expanded on. For now they just always return a conservative quote character.

@goldmedal
Copy link
Contributor

https://github.com/sqlparser-rs/sqlparser-rs/blob/54184460b5d873a67c2801e8b7c6e4f145bc65df/src/dialect/mod.rs#L113-L116

The dialect specific implementations just need to be expanded on. For now they just always return a conservative quote character.

I think it's same as the dialect in datafusion::unparser::dialect

fn identifier_quote_style(&self) -> Option<char> {

However, what we need is a checker to check if the identifier needs to be quoted.
I think I can make a PR for DefaultDialect first.

@goldmedal
Copy link
Contributor

As the mentioned in dialect.rs

/// Note: this trait will eventually be replaced by the Dialect in the SQLparser package

I think we need to use the Dialect in sqlparser-rs instead and extract identifier_needs_quote in #10573 to sqlparser-rs. Just like apache/datafusion-sqlparser-rs#1170

@backkem
Copy link
Contributor

backkem commented May 19, 2024

Yes, these are basically the same object. The one in DataFusion was put there temporarily until the trait extension in the sqlparser repo is landed and pushed to crates.io. This may have happened in the meantime.

@alamb
Copy link
Contributor Author

alamb commented May 20, 2024

#10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade.

@tisonkun
Copy link
Member

tisonkun commented May 20, 2024

#10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade.

We may need a 0.46.1 for resolving the regressions:

I've locally confirmed that array.slt is last failure for cargo test --test sqllogictests.

@goldmedal
Copy link
Contributor

I'm not sure but I think we can merge #10573 first because it also fix many unpasring tests. Then, I'll create PR for sqlparser to add the check rule in dialect.

@alamb
Copy link
Contributor Author

alamb commented May 22, 2024

FWI #10573 is merged!

@backkem
Copy link
Contributor

backkem commented May 23, 2024

Do we split off a ticket reduce the nr of brackets emitted?

@alamb
Copy link
Contributor Author

alamb commented May 23, 2024

Do we split off a ticket reduce the nr of brackets emitted?

Excellent call -- I filed #10633

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants