[SPARK-13321][SQL] Add nested union test cases #11361
Conflicts: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala
|
cc @rxin I am not sure whether HiveCompatibilitySuite.union16 was hanging because of this change. I copied the same queries from union16 into HiveQuerySuite and they work locally. I will see what Jenkins reports for this change, since it should now run HiveCompatibilitySuite. |
|
Test build #51928 has finished for PR 11361 at commit
|
|
@rxin It looks like HiveCompatibilitySuite.union16 doesn't hang because of this change. But that test does take a long time to finish ( |
|
Can you take a look at why 2 queries would take 13 mins? When I ran this, it was stuck in the parser forever. |
|
@rxin OK, I see why the test takes so long to finish. The original query: will result in an analyzed plan like: In Because PR #11195 adds a Basically, the parser handles a nested union query recursively, and parsing such a deeply nested query takes a long time. That is why union16 takes so long to finish. If we remove the Then union16 finishes normally under this patch. |
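To make the nesting concrete, here is a minimal sketch that builds a union query nested to a given depth, which could be fed to a parser to time it. The exact query shape produced in this thread is not shown, so the shape below is an assumption; `NestedUnionQuery`, `nestedUnion`, and the `u_N` aliases are hypothetical names, while `t0` and `id` mirror the test fragment in this PR.

```scala
object NestedUnionQuery {
  // Assumed leaf query; table `t0` and column `id` mirror the test fragment in this PR.
  val leaf: String = "SELECT id FROM t0"

  /** Wraps the leaf in `depth` levels of parenthesized UNION ALL subqueries. */
  def nestedUnion(depth: Int): String =
    (1 to depth).foldLeft(leaf) { (inner, level) =>
      // Each level adds one more layer of nesting around the previous query.
      s"SELECT id FROM (($inner) UNION ALL ($leaf)) AS u_$level"
    }

  def main(args: Array[String]): Unit = {
    // 25 levels matches the nesting depth mentioned in this discussion.
    val sql = nestedUnion(25)
    println(s"length=${sql.length}, open parens=${sql.count(_ == '(')}")
  }
}
```

Timing a parser on `nestedUnion(n)` for increasing `n` would show whether the cost grows linearly or much faster with depth.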
|
@rxin I don't think we should convert a union plan back into a nested SQL query. I would like to remove the |
|
Is there anything else we can do about this perf problem? It's only 25 levels of nesting. It seems strange to me that the parser would take minutes to parse this ... It's hard for me to believe it's just because of some recursion. Is there some exponential complexity here? cc @hvanhovell |
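As an illustration of how nesting alone can turn into exponential work, here is a toy backtracking parser. It is not Spark's grammar, and the thread does not establish that this is the actual cause; the grammar `expr := '(' expr ')' 'a' | '(' expr ')' 'b' | 'x'` and all names (`BacktrackingDemo`, `expr`, `wrapped`, `countCalls`) are hypothetical. Both alternatives share the prefix `'(' expr ')'`, so when the first alternative fails at the last token, the inner expression is parsed a second time, and this doubling repeats at every level.

```scala
object BacktrackingDemo {
  // Counts how many times the `expr` rule is entered.
  private var calls = 0

  /** Parses `expr` at `pos`; returns the position after it, or -1 on failure. */
  def expr(s: String, pos: Int): Int = {
    calls += 1
    if (pos < s.length && s(pos) == 'x') pos + 1
    else {
      // Try the 'a' alternative first; on failure, backtrack and retry with 'b'.
      val viaA = wrapped(s, pos, 'a')
      if (viaA >= 0) viaA else wrapped(s, pos, 'b')
    }
  }

  /** Parses '(' expr ')' followed by `suffix`, with no memoization of the inner parse. */
  private def wrapped(s: String, pos: Int, suffix: Char): Int =
    if (pos >= s.length || s(pos) != '(') -1
    else {
      val end = expr(s, pos + 1)
      val ok = end >= 0 && end + 1 < s.length && s(end) == ')' && s(end + 1) == suffix
      if (ok) end + 2 else -1
    }

  /** Builds an input nested `depth` levels deep and returns the number of `expr` calls. */
  def countCalls(depth: Int): Int = {
    calls = 0
    // Every level ends in 'b', so the 'a' alternative always fails late.
    val input = "(" * depth + "x" + ")b" * depth
    require(expr(input, 0) == input.length, "input should parse fully")
    calls
  }
}
```

In this toy, the call count satisfies T(n) = 2·T(n-1) + 1 with T(0) = 1, so it roughly doubles with each extra level. Caching the result of `expr` per input position would bring it back to linear.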
|
OK. I will keep looking into whether we can improve the performance of parsing nested unions. |
|
@viirya I am currently working on an ANTLR4-based version of the parsers (see my repo for a few initial commits). It is basically a port of the Presto parser. I need another week or so to get most of the HQL functionality working. Perhaps we should hold off on this until the new parser is ready. |
|
@hvanhovell Great to see your initial work. It looks promising. I think this can wait until the new parser is ready. Also, are we going to retire the ANTLR3 parser that is used now? |
|
@viirya I really don't see any reason to keep ANTLR3 around after we migrate the parser. |
Conflicts: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala
|
cc @hvanhovell @rxin Since the new ANTLR4 parser seems to support this syntax, I updated this PR to add test cases only. Please take a look. Thanks! |
|(SELECT `t0`.`id` FROM `default`.`t0`)) AS u_1
""".stripMargin)

val expected = Project(
Minor: @viirya could you update this to use the DSL and assertEqual? That makes it a bit easier to read.
BTW this test is very similar to the following test case: https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L384-L392
Hmm, indeed. In that case, I think I can close this PR now.
@hvanhovell Does the new ANTLR4 parser solve this natively?
@viirya the new parser handles nested queries a lot better. This is mainly due to ANTLR4's better parsing algorithms.
|
Test build #54689 has finished for PR 11361 at commit
|
JIRA: https://issues.apache.org/jira/browse/SPARK-13321
What changes were proposed in this pull request?
It looks like the following SQL can now be parsed with the new ANTLR4 parser:
We just need to add test cases.
How was this patch tested?
New tests are added to PlanParserSuite and HiveQuerySuite.