-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18106][SQL] ANALYZE TABLE should raise a ParseException for invalid option #15640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #67560 has finished for PR 15640 at commit
|
srinathshankar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick PR
| if (ctx.partitionSpec == null && | ||
| ctx.identifier != null && | ||
| ctx.identifier.getText.toLowerCase == "noscan") { | ||
| if (ctx.partitionSpec == null && ctx.identifier != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if partition spec is not null ?
What happens with something like
ANALYZE TABLE mytable PARTITION (a) garbage
(Could you add a test for that ?)
Maybe
if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
throw new ParseException(s"Expected `NOSCAN` instead of `${ctx.identifier.getText}`", ctx)
}
could be moved to the top ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review and I'll handle that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test case is added.
| intercept("explain describe tables x", "Unsupported SQL statement") | ||
| } | ||
|
|
||
| test("SPARK-18106 analyze table") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also parse tests for AnalyzeTable in sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala
Let's have these in the same place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, @srinathshankar .
I understand the reason why you put this there, so I looked at the StatisticsSuite.scala in both hive module and sql module.
But, we will not compare the value for this test case. If it's a parsing only grammar testcase, I prefer to put this in core.
How do you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, I'll remove the normal cases which raises no exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe you can move those parse tests here ? All I'm suggesting is that the parse tests all be together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. But, the parse test should be rewritten. Is it okay? Those testcase uses assertAnalyzeCommand using SparkSession and looks like the following.
def assertAnalyzeCommand(analyzeCommand: String, c: Class[_]) {
val parsed = spark.sessionState.sqlParser.parsePlan(analyzeCommand)
val operators = parsed.collect {
case a: AnalyzeTableCommand => a
case o => o
}
assert(operators.size === 1)
if (operators(0).getClass() != c) {
fail(
s"""$analyzeCommand expected command: $c, but got ${operators(0)}
|parsed command:
|$parsed
""".stripMargin)
}
}
assertAnalyzeCommand(
"ANALYZE TABLE Table1 COMPUTE STATISTICS",
classOf[AnalyzeTableCommand])There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion it's fine to rewrite and simplify. If you could do that, that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I have no objection for that. Actually, I love to do that.
But, I'd like to wait for some directional advice from committer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep the parsing unit test (this file) and the analyze table integration test separate for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the guide!
|
Hi, @srinathshankar . |
|
Test build #67587 has finished for PR 15640 at commit
|
|
Test build #67588 has finished for PR 15640 at commit
|
|
Hi, @hvanhovell . |
| intercept("explain describe tables x", "Unsupported SQL statement") | ||
| } | ||
|
|
||
| test("SPARK-18106 analyze table") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe you can move those parse tests here ? All I'm suggesting is that the parse tests all be together.
| assertEqual("analyze table t compute statistics noscan", | ||
| AnalyzeTableCommand(TableIdentifier("t"), noscan = true)) | ||
| intercept("analyze table t compute statistics xxxx", "Expected `NOSCAN` instead of `xxxx`") | ||
| intercept("analyze table t partition (a) compute statistics xxxx") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: , "ExpectedNOSCANinstead ofxxxx" here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, It's done.
|
Test build #67598 has finished for PR 15640 at commit
|
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok. I left a few comments.
| if (ctx.partitionSpec == null && | ||
| ctx.identifier != null && | ||
| ctx.identifier.getText.toLowerCase == "noscan") { | ||
| if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this check into the first if statement. There is no need to check this twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that the treatment of the partionSpec is quite funky. We scan the table as soon as a user defines a spec. Could you remove the null check; maybe it is better to just log a warning message and do what the user specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review, @hvanhovell . I'll update the PR.
| intercept("explain describe tables x", "Unsupported SQL statement") | ||
| } | ||
|
|
||
| test("SPARK-18106 analyze table") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep the parsing unit test (this file) and the analyze table integration test separate for now.
|
Test build #67792 has finished for PR 15640 at commit
|
|
The only test failure seems to be irrelevant. Let's see the final test result which is still running. |
|
Test build #67793 has finished for PR 15640 at commit
|
|
LGTM - merging to master. Thanks! |
|
Thank you, @hvanhovell ! Also, Thank you, @srinathshankar . |
…valid option
## What changes were proposed in this pull request?
Currently, `ANALYZE TABLE` command accepts `identifier` for option `NOSCAN`. This PR raises a ParseException for unknown option.
**Before**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
res1: org.apache.spark.sql.DataFrame = []
```
**After**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected `NOSCAN` instead of `blah`(line 1, pos 0)
```
## How was this patch tested?
Pass the Jenkins test with a new test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes apache#15640 from dongjoon-hyun/SPARK-18106.
…valid option
## What changes were proposed in this pull request?
Currently, `ANALYZE TABLE` command accepts `identifier` for option `NOSCAN`. This PR raises a ParseException for unknown option.
**Before**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
res1: org.apache.spark.sql.DataFrame = []
```
**After**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected `NOSCAN` instead of `blah`(line 1, pos 0)
```
## How was this patch tested?
Pass the Jenkins test with a new test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes apache#15640 from dongjoon-hyun/SPARK-18106.
What changes were proposed in this pull request?
Currently,
ANALYZE TABLEcommand acceptsidentifierfor optionNOSCAN. This PR raises a ParseException for unknown option.Before
After
How was this patch tested?
Pass the Jenkins test with a new test case.