Conversation

@dongjoon-hyun
Member

What changes were proposed in this pull request?

Currently, the `ANALYZE TABLE` command accepts any `identifier` for the `NOSCAN` option. This PR raises a `ParseException` for unknown options.

**Before**

```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("analyze table test compute statistics blah")
res1: org.apache.spark.sql.DataFrame = []
```

**After**

```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("analyze table test compute statistics blah")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected `NOSCAN` instead of `blah`(line 1, pos 0)
```

How was this patch tested?

Passes the Jenkins tests with a new test case.

@SparkQA

SparkQA commented Oct 26, 2016

Test build #67560 has finished for PR 15640 at commit 4819dd1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srinathshankar (Contributor) left a comment


Thanks for the quick PR

```diff
-if (ctx.partitionSpec == null &&
-    ctx.identifier != null &&
-    ctx.identifier.getText.toLowerCase == "noscan") {
+if (ctx.partitionSpec == null && ctx.identifier != null) {
```
Contributor

What if the partition spec is not null? What happens with something like

```sql
ANALYZE TABLE mytable PARTITION (a) garbage
```

(Could you add a test for that?)
Maybe

```scala
if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
  throw new ParseException(s"Expected `NOSCAN` instead of `${ctx.identifier.getText}`", ctx)
}
```

could be moved to the top?
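The validate-first pattern being suggested can be sketched in isolation. This is a minimal, self-contained illustration with hypothetical names, not Spark's actual `AstBuilder` code: reject any trailing option other than `NOSCAN` before branching on the rest of the clause.

```scala
// Minimal sketch of the suggested pattern (hypothetical helper, not Spark code):
// validate the trailing option first, so an unknown option fails fast
// regardless of whether a partition spec is present.
def resolveNoscan(option: Option[String]): Boolean = option match {
  case None => false // no trailing option: perform a full scan
  case Some(id) if id.equalsIgnoreCase("noscan") => true // NOSCAN: skip scanning
  case Some(id) =>
    // Unknown option: fail fast, mirroring the ParseException raised in the PR.
    throw new IllegalArgumentException(s"Expected `NOSCAN` instead of `$id`")
}
```

With the check hoisted to the top, an invalid option is rejected the same way for every form of the statement.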

Member Author

Thank you for the review; I'll handle that too.

Member Author

Test case is added.

```scala
  intercept("explain describe tables x", "Unsupported SQL statement")
}

test("SPARK-18106 analyze table") {
```
Contributor

There are also parse tests for `AnalyzeTable` in `sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala`. Let's have these in the same place.

Member Author

Sure.

Member Author

@srinathshankar,

I understand the reason why you put those there, so I looked at `StatisticsSuite.scala` in both the hive and sql modules.
But we will not compare values in this test case. If it's a parsing-only grammar test case, I prefer to put it in core.

What do you think about that?

Member Author

If you want, I'll remove the normal cases which raise no exceptions.

Contributor

Then maybe you can move those parse tests here? All I'm suggesting is that the parse tests all be together.

Member Author

Thank you. But the parse tests would need to be rewritten. Is that okay? Those test cases use `assertAnalyzeCommand` with a `SparkSession` and look like the following.

```scala
def assertAnalyzeCommand(analyzeCommand: String, c: Class[_]) {
  val parsed = spark.sessionState.sqlParser.parsePlan(analyzeCommand)
  val operators = parsed.collect {
    case a: AnalyzeTableCommand => a
    case o => o
  }

  assert(operators.size === 1)
  if (operators(0).getClass() != c) {
    fail(
      s"""$analyzeCommand expected command: $c, but got ${operators(0)}
         |parsed command:
         |$parsed
       """.stripMargin)
  }
}

assertAnalyzeCommand(
  "ANALYZE TABLE Table1 COMPUTE STATISTICS",
  classOf[AnalyzeTableCommand])
```
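The simplification under discussion, asserting the parsed result directly instead of collecting operators, might look like the self-contained sketch below. The parser here is a toy stub standing in for `spark.sessionState.sqlParser.parsePlan`, and `AnalyzeTableCommand` is redeclared locally for illustration only.

```scala
// Illustration only: a stand-in for Spark's AnalyzeTableCommand.
case class AnalyzeTableCommand(table: String, noscan: Boolean = false)

// Toy stub standing in for spark.sessionState.sqlParser.parsePlan.
def parsePlan(sql: String): AnalyzeTableCommand = {
  val noscan = sql.toLowerCase.endsWith("noscan")
  val table = sql.split("\\s+")(2) // ANALYZE TABLE <name> ...
  AnalyzeTableCommand(table, noscan)
}

// Simplified assertion: compare the parsed plan against the expected command
// directly, instead of collecting operators and checking their class.
def assertEqual(sql: String, expected: AnalyzeTableCommand): Unit =
  assert(parsePlan(sql) == expected, s"$sql parsed to ${parsePlan(sql)}")

assertEqual("ANALYZE TABLE Table1 COMPUTE STATISTICS", AnalyzeTableCommand("Table1"))
assertEqual("ANALYZE TABLE t COMPUTE STATISTICS noscan", AnalyzeTableCommand("t", noscan = true))
```

The equality-based check fails with the full parsed plan in the message, which is usually enough to debug a grammar change.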

Contributor

In my opinion it's fine to rewrite and simplify. If you could do that, that would be great.

Member Author

Yep, I have no objection to that; actually, I'd love to do that.
But I'd like to wait for some directional advice from a committer.

Contributor

Let's keep the parsing unit tests (this file) and the analyze table integration tests separate for now.

Member Author

Thank you for the guide!

@dongjoon-hyun
Member Author

Hi, @srinathshankar.
For the test suite, I'll update again if you have further opinions. I haven't updated it so far because I'm not sure.

@SparkQA

SparkQA commented Oct 26, 2016

Test build #67587 has finished for PR 15640 at commit c92b6f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 26, 2016

Test build #67588 has finished for PR 15640 at commit 2a7707e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Hi, @hvanhovell.
Could you review this PR when you have some time?


```scala
assertEqual("analyze table t compute statistics noscan",
  AnalyzeTableCommand(TableIdentifier("t"), noscan = true))
intercept("analyze table t compute statistics xxxx", "Expected `NOSCAN` instead of `xxxx`")
intercept("analyze table t partition (a) compute statistics xxxx")
```
Contributor

Nit: add the expected message, "Expected `NOSCAN` instead of `xxxx`", here as well.

@dongjoon-hyun (Member Author) commented Oct 26, 2016

Thank you, it's done.

@SparkQA

SparkQA commented Oct 26, 2016

Test build #67598 has finished for PR 15640 at commit ec0516b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor) left a comment


This looks ok. I left a few comments.

```diff
-if (ctx.partitionSpec == null &&
-    ctx.identifier != null &&
-    ctx.identifier.getText.toLowerCase == "noscan") {
+if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
```
Contributor

Move this check into the first if statement. There is no need to check this twice.

Contributor

I also think that the treatment of the `partitionSpec` is quite funky: we scan the table as soon as a user defines a spec. Could you remove the null check? Maybe it is better to just log a warning message and do what the user specified.
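The behavior being suggested can be sketched as a self-contained toy. This is a hedged illustration with hypothetical names, not Spark code: honor the user's `NOSCAN` choice and merely collect a warning when a partition spec is present, instead of silently falling back to a full scan.

```scala
// Toy model of the suggested behavior (hypothetical names, not Spark code).
case class AnalyzePlan(noscan: Boolean, warnings: Seq[String])

def planAnalyze(hasPartitionSpec: Boolean, option: Option[String]): AnalyzePlan = {
  // Validate the option first, independent of the partition spec.
  val noscan = option match {
    case Some(id) if !id.equalsIgnoreCase("noscan") =>
      throw new IllegalArgumentException(s"Expected `NOSCAN` instead of `$id`")
    case opt => opt.isDefined
  }
  // Warn rather than silently override what the user asked for.
  val warnings =
    if (hasPartitionSpec) Seq("Partition specification is ignored in ANALYZE TABLE")
    else Nil
  AnalyzePlan(noscan, warnings)
}
```

The key design point is that the option validation and the partition-spec handling are decoupled, so neither silently changes the other's outcome.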

Member Author

Thank you for the review, @hvanhovell. I'll update the PR.


@SparkQA

SparkQA commented Oct 30, 2016

Test build #67792 has finished for PR 15640 at commit 00b4f54.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

The only test failure seems to be unrelated. Let's see the final test result, which is still running.

```
[info] - SPARK-10562: partition by column with mixed case name *** FAILED *** (687 milliseconds)
```

@SparkQA

SparkQA commented Oct 30, 2016

Test build #67793 has finished for PR 15640 at commit 465f646.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in 8ae2da0 Oct 30, 2016
@dongjoon-hyun
Member Author

Thank you, @hvanhovell! Also, thank you, @srinathshankar.

robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
…valid option

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#15640 from dongjoon-hyun/SPARK-18106.
@dongjoon-hyun deleted the SPARK-18106 branch November 7, 2016 00:49
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…valid option
