Skip to content

Conversation

@hvanhovell
Copy link
Contributor

This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:

The ANTLR Parser & Supporting classes have been moved to the Catalyst project. They are now part of the org.apache.spark.sql.catalyst.parser package. These classes contained quite a bit of code that was originally from the Hive project, I have added aknowledgements whenever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean-up the ASTNode class, and to improve the error handling.

The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:

  • CatalystQl: This implements Query and Expression parsing functionality.
  • SparkQl: This is a subclass of CatalystQL and provides SQL/Core only functionality such as Explain and Describe.
  • HiveQl: This is a subclass of SparkQl and this adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.

cc @rxin

Factor out Hive dependencies - 2.

Factor out hard coded UDFT's; let the Hive function registry deal resolve generators.

Split Ql into Catalyst/Spark/Hive part; move parser to catalyst

Style.
@hvanhovell hvanhovell changed the title [SPARK-12573][SPARK-12574][SQL] Move Core Parser from Hive to Catalyst [WIP] [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst [WIP] Jan 5, 2016
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put this in catalyst? Just since we tend to hide non-public APIs there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah (reynold just suggested the same thing), I'll add it to catalyst.parser.

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48709 has finished for PR 10583 at commit fb3b4a4.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #2322 has finished for PR 10583 at commit fb3b4a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48747 has finished for PR 10583 at commit fb3b4a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48766 has finished for PR 10583 at commit e397370.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48771 has finished for PR 10583 at commit 43c29b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell hvanhovell changed the title [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst [WIP] [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst Jan 5, 2016
@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48780 has finished for PR 10583 at commit 157d178.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jan 5, 2016

Test build #48789 has finished for PR 10583 at commit 157d178.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 5, 2016

Can you update the pull request description? It still says WIP.

@hvanhovell
Copy link
Contributor Author

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the indentation is off here?

@rxin
Copy link
Contributor

rxin commented Jan 6, 2016

cc @cloud-fan can you take a look at this? Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a boolean conf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ported this directly form Hive. It has only two options:

  • none: no quoting. I'll map this to false.
  • column: quoting. I'll map this to true.

@rxin
Copy link
Contributor

rxin commented Jan 6, 2016

BTW given the size of the pull request, I think we can also merge it provided that it has no structural problems, and then review feedback in follow-up prs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of our out-dated Hive Compatibility Tests are using SQL11 reserved keywords as identifiers. We should first fix those before we can set this flag to true.

@SparkQA
Copy link

SparkQA commented Jan 6, 2016

Test build #48853 has finished for PR 10583 at commit 3680d4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

marmbrus commented Jan 6, 2016

Regarding the questions about LATERAL VIEW. Ideally I think we will support one query language that has a super-set of the features that were previously present in HiveQL and the SparkSQLParser. Where this is not possible I'd probably defer to HiveQL or a SQL Standard.

@rxin
Copy link
Contributor

rxin commented Jan 6, 2016

Due to the size of the patch, I'm going to merge this in now. @hvanhovell can address more comments as follow-up prs.

@asfgit asfgit closed this in ea489f1 Jan 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants