[SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst #10583
Factor out Hive dependencies - 2.
Factor out hard coded UDTFs; let the Hive function registry resolve generators.
Split Ql into Catalyst/Spark/Hive part; move parser to catalyst
Style.
Should we put this in catalyst? Just since we tend to hide non-public APIs there.
Yeah (reynold just suggested the same thing), I'll add it to catalyst.parser.
Test build #48709 has finished for PR 10583 at commit
retest this please
Test build #2322 has finished for PR 10583 at commit
Test build #48747 has finished for PR 10583 at commit
Test build #48766 has finished for PR 10583 at commit
Test build #48771 has finished for PR 10583 at commit
Test build #48780 has finished for PR 10583 at commit
retest this please
Test build #48789 has finished for PR 10583 at commit
Can you update the pull request description? It still says WIP.
Done.
looks like the indentation is off here?
cc @cloud-fan can you take a look at this? Thanks.
should this be a boolean conf?
I ported this directly from Hive. It has only two options:
- none: no quoting. I'll map this to false.
- column: quoting. I'll map this to true.
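The mapping from the two Hive option values to a boolean could look roughly like the sketch below. The object and method names are hypothetical and chosen only for illustration; they are not taken from the PR, and the handling of unknown values is an assumption.

```scala
import java.util.Locale

// Hypothetical helper, not part of the PR: converts the two Hive-style
// option values discussed above into a boolean flag.
object QuotedIdentifierMapping {
  def toBoolean(hiveValue: String): Boolean =
    hiveValue.trim.toLowerCase(Locale.ROOT) match {
      case "none"   => false // no quoting
      case "column" => true  // quoting
      case other =>
        // Assumption: any other value is treated as a configuration error.
        throw new IllegalArgumentException(s"Unknown quoting mode: $other")
    }
}
```

With only two legal values, a boolean conf carries the same information and removes the need to validate a string at every use.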
BTW given the size of the pull request, I think we can also merge it provided that it has no structural problems, and then review feedback in follow-up PRs.
# Conflicts:
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
#   sql/hive/src/test/scala/org/apache/spark/sql/hive/ErrorPositionSuite.scala
Some of our outdated Hive compatibility tests use SQL11 reserved keywords as identifiers. We need to fix those tests before we can set this flag to true.
Test build #48853 has finished for PR 10583 at commit
Regarding the questions about
Due to the size of the patch, I'm going to merge this in now. @hvanhovell can address more comments as follow-up PRs.
This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to using this parser for all of our SQL parsing. The following key changes have been made:

The ANTLR Parser & supporting classes have been moved to the Catalyst project. They are now part of the org.apache.spark.sql.catalyst.parser package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the ASTNode class and to improve the error handling.

The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
- CatalystQl: This implements Query and Expression parsing functionality.
- SparkQl: This is a subclass of CatalystQl and provides SQL/Core only functionality such as Explain and Describe.
- HiveQl: This is a subclass of SparkQl and adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.

cc @rxin
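The three-level split described above can be illustrated with a toy model. Everything in this sketch is hypothetical: the Toy* class names, the Node and Plan types, the nodeToPlan method, and the token names are simplified stand-ins and do not reproduce the actual Spark classes or their signatures. It only shows the general pattern of each subclass handling its own statements and delegating the rest to its parent.

```scala
// Toy stand-ins for a logical plan; not Spark's LogicalPlan.
sealed trait Plan
case class Query(text: String) extends Plan
case class Explain(child: Plan) extends Plan
case class Describe(table: String) extends Plan
case class Analyze(table: String) extends Plan

// Toy AST node: a token name, optional text, and child nodes.
case class Node(token: String, text: String = "", children: List[Node] = Nil)

// Base layer: plain queries only.
class ToyCatalystQl {
  def nodeToPlan(node: Node): Plan = node match {
    case Node("TOK_QUERY", text, _) => Query(text)
    case other =>
      throw new UnsupportedOperationException(s"Unsupported node: ${other.token}")
  }
}

// Middle layer: adds Explain and Describe, delegates everything else.
class ToySparkQl extends ToyCatalystQl {
  override def nodeToPlan(node: Node): Plan = node match {
    case Node("TOK_EXPLAIN", _, child :: Nil) => Explain(nodeToPlan(child))
    case Node("TOK_DESCRIBE", table, _)       => Describe(table)
    case _                                    => super.nodeToPlan(node)
  }
}

// Top layer: adds a Hive-only statement, delegates everything else.
class ToyHiveQl extends ToySparkQl {
  override def nodeToPlan(node: Node): Plan = node match {
    case Node("TOK_ANALYZE", table, _) => Analyze(table)
    case _                             => super.nodeToPlan(node)
  }
}
```

Because nodeToPlan is dispatched virtually, a ToyHiveQl instance can explain a Hive-only statement, while a ToyCatalystQl instance rejects anything beyond plain queries.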