[SPARK-16621][SQL] Generate stable SQLs in SQLBuilder #14257
|
Test build #62517 has finished for PR 14257 at commit
|
|
Test build #62761 has finished for PR 14257 at commit
|
|
Test build #62772 has finished for PR 14257 at commit
|
|
Test build #62796 has finished for PR 14257 at commit
|
|
Hi, @rxin . |
|
I'm less sure about the new format -- it leads to a lot of whitespace in very long SQL queries. |
|
No problem. We can simply remove |
|
It's just to help the review process. |
|
I removed the new format stuff. I'll update this PR soon. |
|
Now, the PR is updated with new golden SQL files. |
|
cc @liancheng |
|
Test build #62829 has finished for PR 14257 at commit
|
| select * from t1 b where exists (select * from t1 a) | ||
| -------------------------------------------------------------------------------- | ||
| SELECT `gen_attr` AS `a` FROM (SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr` AS `a` FROM ((SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0) AS gen_subquery_1) AS gen_subquery_1)) AS b | ||
| SELECT `gen_attr_0` AS `a` FROM (SELECT `gen_attr_0` FROM (SELECT `a` AS `gen_attr_0` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr_1` AS `a` FROM ((SELECT `gen_attr_1` FROM (SELECT `a` AS `gen_attr_1` FROM `default`.`t1`) AS gen_subquery_2) AS gen_subquery_1) AS gen_subquery_1)) AS b |
For this query, I reformatted the generated SQL as shown here. The gen_subquery_xxx names are generated uniquely, but gen_subquery_1 appears twice because of the added nested subquery alias. Note that this is not caused by a duplicated ID; it comes from direct double nesting. I think that's okay.
SELECT `gen_attr_0` AS `a`
FROM (SELECT `gen_attr_0`
      FROM (SELECT `a` AS `gen_attr_0`
            FROM `default`.`t1`
           ) AS gen_subquery_0
      WHERE EXISTS(SELECT `gen_attr_1` AS `a`
                   FROM ( (SELECT `gen_attr_1`
                           FROM (SELECT `a` AS `gen_attr_1`
                                 FROM `default`.`t1`
                                ) AS gen_subquery_2
                          ) AS gen_subquery_1
                        ) AS gen_subquery_1
                  )
     ) AS b |
Test build #62862 has finished for PR 14257 at commit
|
| * supported by this builder (yet). | ||
| */ | ||
| class SQLBuilder(logicalPlan: LogicalPlan) extends Logging { | ||
| class SQLBuilder( |
Let's mark this constructor as private and add a new public constructor without nextSubqueryId, nextGenAttrId, or exprIdMap since these arguments are only used within SQLBuilder.
Sure, thank you, @liancheng !
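For readers following the review comment above, the suggested constructor layout could look roughly like the sketch below. It is not the code from this PR: `Plan` and `Builder` are hypothetical stand-ins for `LogicalPlan` and `SQLBuilder`, and the parameter types (`AtomicLong`, a mutable map keyed by `Long`) are assumptions, since the diff line shown here is truncated.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical stand-in for a logical plan; not Spark's LogicalPlan.
final case class Plan(description: String)

// The primary constructor is private: the counters and the ID map are
// internal state that callers should not need to supply.
// Parameter types are assumptions made for this sketch.
class Builder private (
    plan: Plan,
    nextSubqueryId: AtomicLong,
    nextGenAttrId: AtomicLong,
    exprIdMap: mutable.Map[Long, Long]) {

  // Public auxiliary constructor: callers pass only the plan, and every
  // new builder starts counting from zero.
  def this(plan: Plan) =
    this(plan, new AtomicLong(0), new AtomicLong(0), mutable.HashMap.empty[Long, Long])

  // Returns a fresh subquery alias and advances the counter.
  def newSubqueryName(): String =
    s"gen_subquery_${nextSubqueryId.getAndIncrement()}"
}
```

With this shape, `new Builder(Plan("select 1"))` is the only way to construct a builder from outside the class, so the counters cannot be seeded with arbitrary values by callers.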
|
LGTM except for one minor comment. Thanks! |
|
Test build #62893 has finished for PR 14257 at commit
|
|
It was a Java internal error. |
|
Retest this please. |
|
Test build #62896 has finished for PR 14257 at commit
|
|
Hi, @liancheng and @rxin . |
|
LGTM, merging to master. Thanks! |
|
I'm also going to merge this into branch-2.0 to avoid conflicts when fixing bugs in this area. |
Currently, the generated SQL has unstable IDs for generated attributes.
Stable generated SQL makes the queries easier to understand and test.
This PR provides stable SQL generation as follows.
- Provide unique ids for generated subqueries, `gen_subquery_xxx`.
- Provide unique and stable ids for generated attributes, `gen_attr_xxx`.
**Before**
```scala
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res0: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res1: String = SELECT `gen_attr_4` AS `1` FROM (SELECT 1 AS `gen_attr_4`) AS gen_subquery_0
```
**After**
```scala
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res1: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res2: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
```
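One way to get the behavior shown in the Before/After output is to renumber IDs per builder in order of first appearance. The sketch below illustrates that idea only; it assumes the instability comes from reusing IDs that keep growing across queries, and `StableIds` and `StableIdsDemo` are hypothetical names that do not come from Spark.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical helper, not Spark's SQLBuilder: maps arbitrary original
// expression IDs to small sequential IDs in order of first appearance.
final class StableIds {
  private val nextGenAttrId = new AtomicLong(0)
  private val exprIdMap = mutable.HashMap.empty[Long, Long]

  // The same original ID always yields the same generated name within one
  // instance; the counter advances only when a new ID is seen.
  def genAttrName(exprId: Long): String = {
    val stable = exprIdMap.getOrElseUpdate(exprId, nextGenAttrId.getAndIncrement())
    s"gen_attr_$stable"
  }
}

object StableIdsDemo {
  // Two independent instances fed different original IDs (17 and 42 are
  // arbitrary) both start numbering at zero.
  def run(): Seq[String] = {
    val first = new StableIds
    val second = new StableIds
    Seq(first.genAttrName(17L), first.genAttrName(17L), second.genAttrName(42L))
  }
}
```

Because each instance owns its counter and map, the generated names depend only on the order in which attributes are encountered, not on how many queries ran earlier in the session.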
Pass the existing Jenkins tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #14257 from dongjoon-hyun/SPARK-16621.
(cherry picked from commit 5b8e848)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
Thank you, @liancheng and @rxin ! |