[SPARK-16621][SQL] Generate stable SQLs in SQLBuilder #14257

dongjoon-hyun · 2016-07-19T07:38:11Z

What changes were proposed in this pull request?

Currently, the generated SQL contains unstable IDs for generated attributes.
Stable generated SQL makes queries easier to understand and to test.
This PR makes SQL generation stable as follows:

- Provide unique IDs for generated subqueries, `gen_subquery_xxx`.
- Provide unique and stable IDs for generated attributes, `gen_attr_xxx`.

Before

scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res0: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res1: String = SELECT `gen_attr_4` AS `1` FROM (SELECT 1 AS `gen_attr_4`) AS gen_subquery_0

After

scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res1: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res2: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
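One plausible reading of the Before/After outputs is that the alias number used to be derived from an ever-growing, JVM-wide expression ID, so rerunning the same query produced `gen_attr_4` instead of `gen_attr_0`. The Scala sketch below illustrates the fix under that assumption: each builder keeps its own counters plus a map from original expression IDs to builder-local numbers. The names `nextSubqueryId`, `nextGenAttrId`, and `exprIdMap` are borrowed from the review discussion; `GlobalIds`, `StableNamer`, and `SelectOne` are hypothetical, and this is not the actual SQLBuilder code.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical stand-in for a JVM-wide expression-ID counter: its values keep
// growing across queries, so they depend on what ran earlier.
object GlobalIds {
  private val counter = new AtomicLong(0)
  def next(): Long = counter.getAndIncrement()
}

// Hypothetical per-builder naming state. Each instance starts its own
// counters at 0 and remembers which global ID got which local number.
final class StableNamer {
  private val nextSubqueryId = new AtomicLong(0)
  private val nextGenAttrId = new AtomicLong(0)
  private val exprIdMap = mutable.Map.empty[Long, Long]

  def subqueryName(): String = s"gen_subquery_${nextSubqueryId.getAndIncrement()}"

  // The same global ID always maps to the same local number in one builder;
  // the counter only advances when an ID is seen for the first time.
  def attrName(exprId: Long): String =
    s"gen_attr_${exprIdMap.getOrElseUpdate(exprId, nextGenAttrId.getAndIncrement())}"
}

object SelectOne {
  // Produces SQL shaped like the REPL output above for `select 1`.
  private def render(attr: String, subquery: String): String =
    s"SELECT `$attr` AS `1` FROM (SELECT 1 AS `$attr`) AS $subquery"

  // Before (sketch): the alias embeds the global ID directly.
  def unstableSql(): String = {
    val exprId = GlobalIds.next() // fresh global ID for the literal's alias
    render(s"gen_attr_$exprId", "gen_subquery_0")
  }

  // After (sketch): a fresh namer per call renumbers from 0.
  def stableSql(): String = {
    val exprId = GlobalIds.next()
    val namer = new StableNamer
    render(namer.attrName(exprId), namer.subqueryName())
  }
}
```

Calling `SelectOne.stableSql()` twice yields the same string as the After output, while `SelectOne.unstableSql()` yields a different `gen_attr_N` on each call.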

How was this patch tested?

Pass the existing Jenkins tests.

SparkQA · 2016-07-19T09:10:10Z

Test build #62517 has finished for PR 14257 at commit 2699f9e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-24T05:15:01Z

Test build #62761 has finished for PR 14257 at commit 12180db.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-24T17:05:48Z

Test build #62772 has finished for PR 14257 at commit a1776ef.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-25T07:26:55Z

Test build #62796 has finished for PR 14257 at commit 06fb99c.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SQLBuilder(

dongjoon-hyun · 2016-07-25T07:29:34Z

Hi, @rxin.
Could you review this stable SQL generation PR?

rxin · 2016-07-25T16:43:53Z

I'm less sure about the new format; it leads to a lot of whitespace in very long SQL queries.

dongjoon-hyun · 2016-07-25T16:49:05Z

No problem. We can simply remove object SQLBuilder and modify two lines.
May I remove those?

dongjoon-hyun · 2016-07-25T16:50:10Z

It was only meant to help the review process.

dongjoon-hyun · 2016-07-25T17:31:20Z

I removed the new format stuff. I'll update this PR soon.

dongjoon-hyun · 2016-07-25T17:39:10Z

The PR is now updated with new golden SQL files.
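Stable names are what make golden files practical: the expected SQL can be stored once and compared as an exact string. A minimal sketch of such a check follows, assuming a file layout of the original query, a separator line of dashes, and then the generated SQL (the layout visible in the `predicate_subquery.sql` diff later in this thread). `GoldenFile` and its methods are hypothetical, and the 80-dash separator is an assumption; this is not Spark's actual test harness.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical golden-file helper. Assumed layout: the original query, a
// separator line of 80 dashes, then the expected generated SQL.
object GoldenFile {
  val Separator: String = "-" * 80

  // Records the current output as the new expected SQL.
  def write(path: Path, query: String, generatedSql: String): Unit = {
    val content = s"$query\n$Separator\n$generatedSql\n"
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
    ()
  }

  // Reads back everything after the first separator line.
  def expected(path: Path): String = {
    val text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
    val idx = text.indexOf(Separator)
    require(idx >= 0, s"no separator line in $path")
    text.substring(idx + Separator.length).trim
  }

  // An exact string comparison: it can only pass reliably if the generated
  // gen_attr_N / gen_subquery_N names are the same on every run.
  def matches(path: Path, generatedSql: String): Boolean =
    expected(path) == generatedSql.trim
}
```

With the Before behavior, a file recorded with `gen_attr_0` would stop matching as soon as a later run produced `gen_attr_4`, even though the query is unchanged.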

rxin · 2016-07-25T19:14:01Z

cc @liancheng

SparkQA · 2016-07-25T19:36:40Z

Test build #62829 has finished for PR 14257 at commit 6b3adbf.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SQLBuilder(


dongjoon-hyun · 2016-07-26T03:19:13Z

sql/hive/src/test/resources/sqlgen/predicate_subquery.sql

 select * from t1 b where exists (select * from t1 a)
 --------------------------------------------------------------------------------
-SELECT `gen_attr` AS `a` FROM (SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr` AS `a` FROM ((SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0) AS gen_subquery_1) AS gen_subquery_1)) AS b
+SELECT `gen_attr_0` AS `a` FROM (SELECT `gen_attr_0` FROM (SELECT `a` AS `gen_attr_0` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr_1` AS `a` FROM ((SELECT `gen_attr_1` FROM (SELECT `a` AS `gen_attr_1` FROM `default`.`t1`) AS gen_subquery_2) AS gen_subquery_1) AS gen_subquery_1)) AS b


For this query, I reformatted the output as follows. Here, the `gen_subquery_xxx` aliases are generated uniquely. However, `gen_subquery_1` appears twice because of the added nested subquery alias. Note that this is not a duplicated ID; it comes from a direct double nesting. I think it's okay.

SELECT `gen_attr_0` AS `a`
FROM (SELECT `gen_attr_0`
      FROM (SELECT `a` AS `gen_attr_0`
            FROM `default`.`t1`
           ) AS gen_subquery_0
      WHERE EXISTS(SELECT `gen_attr_1` AS `a`
                   FROM ((SELECT `gen_attr_1`
                          FROM (SELECT `a` AS `gen_attr_1`
                                FROM `default`.`t1`
                               ) AS gen_subquery_2
                         ) AS gen_subquery_1
                        ) AS gen_subquery_1
                  )
     ) AS b

SparkQA · 2016-07-26T05:00:06Z

Test build #62862 has finished for PR 14257 at commit 16b0f49.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2016-07-26T07:20:49Z

sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala

 * supported by this builder (yet).
 */
-class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
+class SQLBuilder(


Let's mark this constructor as private and add a new public constructor without `nextSubqueryId`, `nextGenAttrId`, or `exprIdMap`, since these arguments are only used within `SQLBuilder`.

Sure, thank you, @liancheng!
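The constructor split suggested in this review can be sketched in Scala as a private primary constructor plus a public auxiliary constructor. The parameter names `nextSubqueryId`, `nextGenAttrId`, and `exprIdMap` come from the review comment; the `Plan` and `Builder` types, the parameter types, and the `nested` helper are assumptions for illustration. The `nested` helper shows one possible reason to keep the full constructor at all: a builder for a subquery can share its parent's counters, so IDs stay unique across the whole statement.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical stand-in for a logical plan; the real SQLBuilder takes a LogicalPlan.
final case class Plan(description: String)

// The primary constructor is private, so callers outside the class cannot
// pass in (or share) the internal naming state.
class Builder private (
    plan: Plan,
    nextSubqueryId: AtomicLong,
    nextGenAttrId: AtomicLong,
    exprIdMap: mutable.Map[Long, Long]) {

  // Public auxiliary constructor: only the plan is required, and every
  // top-level builder starts with fresh counters and an empty map.
  def this(plan: Plan) =
    this(plan, new AtomicLong(0), new AtomicLong(0), mutable.Map.empty[Long, Long])

  def newSubqueryName(): String = s"gen_subquery_${nextSubqueryId.getAndIncrement()}"

  def attrName(exprId: Long): String =
    s"gen_attr_${exprIdMap.getOrElseUpdate(exprId, nextGenAttrId.getAndIncrement())}"

  // Nested builders (e.g. for a subquery expression) reuse the same state.
  // Calling the private constructor is allowed here because we are inside the class.
  def nested(subplan: Plan): Builder =
    new Builder(subplan, nextSubqueryId, nextGenAttrId, exprIdMap)
}
```

Outside the class, `new Builder(Plan("select 1"))` compiles, while passing the counters explicitly does not, which keeps the naming state internal to the builder.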

liancheng · 2016-07-26T07:25:41Z

LGTM except for one minor comment. Thanks!

SparkQA · 2016-07-26T16:31:27Z

Test build #62893 has finished for PR 14257 at commit 29e953a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-07-26T17:05:49Z

It was a Java internal error.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=49858, tid=140477527770880

dongjoon-hyun · 2016-07-26T17:05:56Z

Retest this please.

SparkQA · 2016-07-26T18:55:22Z

Test build #62896 has finished for PR 14257 at commit 29e953a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-07-26T19:38:39Z

Hi, @liancheng and @rxin.
It's now ready for review again.
Thank you as always!

liancheng · 2016-07-27T05:12:56Z

LGTM, merging to master. Thanks!

rxin · 2016-07-27T06:51:24Z

I'm also going to merge this into branch-2.0 to avoid conflicts when fixing bugs in this area.

Currently, the generated SQLs have not-stable IDs for generated attributes.
The stable generated SQL will give more benefit for understanding or testing the queries.
This PR provides stable SQL generation by the followings.

- Provide unique ids for generated subqueries, `gen_subquery_xxx`.
- Provide unique and stable ids for generated attributes, `gen_attr_xxx`.

**Before**
```scala
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res0: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res1: String = SELECT `gen_attr_4` AS `1` FROM (SELECT 1 AS `gen_attr_4`) AS gen_subquery_0
```

**After**
```scala
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res1: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
res2: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) AS gen_subquery_0
```

Pass the existing Jenkins tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14257 from dongjoon-hyun/SPARK-16621.

(cherry picked from commit 5b8e848)
Signed-off-by: Reynold Xin <rxin@databricks.com>

dongjoon-hyun · 2016-07-27T11:56:30Z

Thank you, @liancheng and @rxin!

dongjoon-hyun mentioned this pull request Jul 20, 2016

[SPARK-16475][SQL] Broadcast Hint for SQL Queries #14132

Closed

dongjoon-hyun changed the title ~~[SPARK-16621][SQL][WIP] Use a stable ID generation method for attributes in SQLBuilder~~ [SPARK-16621][SQL] Use a stable ID generation method for attributes in SQLBuilder Jul 25, 2016

dongjoon-hyun changed the title ~~[SPARK-16621][SQL] Use a stable ID generation method for attributes in SQLBuilder~~ [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder Jul 25, 2016

dongjoon-hyun mentioned this pull request Jul 25, 2016

[SPARK-16703][SQL] Remove extra whitespace in SQL generation for window functions #14334

Closed

dongjoon-hyun mentioned this pull request Jul 26, 2016

[SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS queries #14307

Closed

dongjoon-hyun added 2 commits July 25, 2016 19:55

[SPARK-16621][SQL] Use a stable ID generation method for attributes in SQLBuilder

a199fe1

Rebase and convert predicate_subquery.sql.

16b0f49

dongjoon-hyun reviewed Jul 26, 2016

liancheng reviewed Jul 26, 2016

Make new constructor private.

29e953a

asfgit closed this in 5b8e848 Jul 27, 2016

dongjoon-hyun deleted the SPARK-16621 branch August 14, 2016 09:45
