@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Oct 31, 2016

What changes were proposed in this pull request?

This PR aims to support comparators, e.g. '<', '<=', '>', '>=', in the partition specs of ALTER TABLE ... DROP PARTITION again in Apache Spark 2.0 for backward compatibility.

Spark 1.6

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]

Spark 2.0

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)

After this PR, it's supported.
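
To make the intended semantics concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names `DropPartitionSketch`, `Comparison`, and `drop` are hypothetical and are not part of this patch; the sketch also assumes that partition values are compared as strings, which matches the STRING partition columns in the example above.

```scala
object DropPartitionSketch {
  // A partition is identified by its partition-column values (all strings here).
  type Partition = Map[String, String]

  // Hypothetical mini-model of one comparison inside a partition spec.
  final case class Comparison(column: String, op: String, value: String) {
    def matches(p: Partition): Boolean = p.get(column).exists { v =>
      val c = v.compareTo(value) // plain string ordering, an assumption of this sketch
      op match {
        case "="   => c == 0
        case "<"   => c < 0
        case "<="  => c <= 0
        case ">"   => c > 0
        case ">="  => c >= 0
        case other => throw new IllegalArgumentException(s"Unsupported operator: $other")
      }
    }
  }

  // Returns the partitions that survive the drop: a partition is dropped
  // only when it satisfies every comparison in the spec.
  def drop(partitions: Seq[Partition], spec: Seq[Comparison]): Seq[Partition] =
    partitions.filterNot(p => spec.forall(_.matches(p)))
}
```

With partitions for countries CA, KR, and US, a spec of `Comparison("country", "<", "KR")` removes only the CA partition in this model.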

How was this patch tested?

Pass the Jenkins test with a newly added testcase.

SparkQA commented Nov 1, 2016

Test build #67843 has finished for PR 15704 at commit 84f2315.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 1, 2016

Test build #67857 has finished for PR 15704 at commit a3061e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

As we already have listPartitionsByFilter, do we need another dropPartitionsByFilter? It looks like there is a lot of duplicated code between them.

We can combine listPartitionsByFilter and dropPartitions to do the same thing, instead of adding a new API like this.

Contributor

+1

Member Author

Thank you for the review, @viirya and @hvanhovell.
Sure, no problem. I just thought we needed to have this in ExternalCatalog before Catalog Federation (SPARK-15777).
I will remove that.

Member Author

listPartitionsByFilter is not supported in InMemoryCatalog. So, we should use this only when it is needed.
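
The composition suggested in this thread can be sketched as follows. `ToyCatalog` is a hypothetical, heavily simplified stand-in written for illustration; it is not Spark's ExternalCatalog, and its method signatures are assumptions that only borrow the names mentioned above.

```scala
object CombineListAndDrop {
  type Spec = Map[String, String]

  // Hypothetical in-memory stand-in for a catalog; not Spark's ExternalCatalog.
  final class ToyCatalog(initial: Seq[Spec]) {
    private var partitions: Seq[Spec] = initial

    def listPartitions: Seq[Spec] = partitions

    def listPartitionsByFilter(predicate: Spec => Boolean): Seq[Spec] =
      partitions.filter(predicate)

    def dropPartitions(specs: Seq[Spec]): Unit =
      partitions = partitions.filterNot(p => specs.contains(p))
  }

  // Resolve the filter to exact partition specs first, then reuse the
  // existing exact-match drop instead of adding a third catalog method.
  def dropByFilter(catalog: ToyCatalog, predicate: Spec => Boolean): Seq[Spec] = {
    val targets = catalog.listPartitionsByFilter(predicate)
    catalog.dropPartitions(targets)
    targets
  }
}
```

The caveat raised above still applies: a catalog that cannot list by filter would need the filtered path to be used only when a range comparison is actually present.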

SparkQA commented Nov 1, 2016

Test build #67919 has finished for PR 15704 at commit 05c83fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 1, 2016

Test build #67921 has finished for PR 15704 at commit 72084a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Failing to match SqlBaseParser.NSEQ might cause a runtime error.

Member Author

Yep. Validation is added.

Member

The function name looks confusing. Actually they are not more complex operators, are they?

Member Author

Maybe, hasNonEqualToComparison is better?

Member

I think we don't need to call splitConjunctivePredicates. Just iterating over each attribute in every spec expression's references and doing the following resolution check should be enough.

Member Author

Yep. That's much better.
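
A rough illustration of that suggestion, using a tiny expression model written for this example (the types `Attr`, `Lit`, `Cmp`, `And` and the function `unresolved` are hypothetical and are not Catalyst classes): collecting `references` from each spec expression makes it unnecessary to split conjunctions before checking column names. Exact, case-sensitive name matching is an assumption of the sketch.

```scala
object ResolveCheck {
  // Hypothetical mini expression tree; each node reports the column names it references.
  sealed trait Expr { def references: Set[String] }
  final case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }
  final case class Lit(value: String) extends Expr {
    def references: Set[String] = Set.empty
  }
  final case class Cmp(op: String, left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }
  final case class And(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  // Names used by the specs that are not partition columns of the table.
  def unresolved(specs: Seq[Expr], partitionColumns: Seq[String]): Set[String] = {
    val known = partitionColumns.toSet
    specs.flatMap(_.references).filterNot(known.contains).toSet
  }
}
```

An empty result means every referenced attribute resolves to a partition column; a non-empty result is what an error message would report.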

Member

This might not be clear enough. Add a short error message?

Member Author

Sure!

Member

Let's add a test for dropping multiple partition specs?

Member Author

Line 251 ?

Member

I mean something like PARTITION (quarter <= '2'), PARTITION (quarter >= '4').

Member Author

Oh, I missed that case. Sure! Thank you again.

Member

We should add a test case like PARTITION (quarter <= '4'), PARTITION (quarter <= '2') to see what happens, since after the partitions matching the first spec are removed, the second spec may fail.

Member Author

Sure!

Member Author

To add that, we need another test case, because the remaining partitions are not enough to test it.

Member Author

I added the case by updating the existing testcases.
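
The overlapping-spec case discussed in this thread can be modelled as below. This is a sketch under assumptions, not the patch's code: `UnionOfSpecs`, `dropUnion`, `dropSequentially`, and `quarterAtMost` are hypothetical names, and the sequential variant only imitates the "error when a spec matches nothing" behavior that a later code comment in this PR attributes to Hive.

```scala
object UnionOfSpecs {
  type Partition = Map[String, String]
  type Spec = Partition => Boolean

  // Helper for the example: matches partitions whose quarter is <= bound (string ordering).
  def quarterAtMost(bound: String): Spec =
    p => p.get("quarter").exists(_.compareTo(bound) <= 0)

  // Union semantics: every spec is evaluated against the original partition
  // list, so the order of the specs does not matter.
  def dropUnion(partitions: Seq[Partition], specs: Seq[Spec]): Seq[Partition] = {
    val targets = specs.flatMap(spec => partitions.filter(spec)).distinct
    partitions.filterNot(p => targets.contains(p))
  }

  // Sequential semantics: each spec only sees what the previous one left
  // behind, and a spec that matches nothing is reported as an error.
  def dropSequentially(
      partitions: Seq[Partition],
      specs: Seq[Spec]): Either[String, Seq[Partition]] =
    specs.zipWithIndex.foldLeft[Either[String, Seq[Partition]]](Right(partitions)) {
      case (Right(remaining), (spec, i)) =>
        val (matched, kept) = remaining.partition(spec)
        if (matched.isEmpty) Left(s"spec #$i matched no partition") else Right(kept)
      case (failure, _) => failure
    }
}
```

For quarters '1' to '4' with the specs `quarter <= '4'` followed by `quarter <= '2'`, the union variant drops all four partitions, while the sequential variant fails on the second spec because nothing is left for it to match.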

Member

Rephrase this error message? Looks a bit confusing.

Member Author

I revised it.

@dongjoon-hyun
Member Author

Thank you for the review, @viirya. I'll fix them tomorrow.

SparkQA commented Nov 3, 2016

Test build #68015 has finished for PR 15704 at commit d3c3ca5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Nov 3, 2016

LGTM except one comment for the test.

cc @hvanhovell for second look.

@dongjoon-hyun
Member Author

Thank you so much, @viirya. I added the test case and fixed the related bug again.

Member

Why not add it inside the following match block?

Member Author

Yep. Moved.

Member

Actually, this is wrong.

sql("ALTER TABLE sales DROP PARTITION (country < 'KR', quarter)")

The above statement also matches this case, right?

Member

Please add the above test case into the negative test cases. Thanks!

Member Author

@dongjoon-hyun dongjoon-hyun Nov 3, 2016

Yep. Right. The above code is wrong. I'll add the test case.
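
A negative case such as `(country < 'KR', quarter)` can be rejected by validating the shape of every element of the spec. The sketch below uses a hypothetical mini expression model (`Attr`, `Lit`, `Cmp`) and a hypothetical `validate` function; the set of accepted operators and the error text are assumptions, not what the patch implements.

```scala
object SpecValidation {
  // Hypothetical mini expression tree, not Catalyst's.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: String) extends Expr
  final case class Cmp(op: String, left: Expr, right: Expr) extends Expr

  // Operators accepted by this sketch (an assumption).
  private val allowedOps = Set("=", "<", "<=", ">", ">=")

  // An element is valid only if it is `attribute <op> literal`.
  private def check(e: Expr): Either[String, Cmp] = e match {
    case c @ Cmp(op, Attr(_), Lit(_)) if allowedOps(op) => Right(c)
    case other => Left(s"Invalid partition spec element: $other")
  }

  // Returns the validated comparisons, or the first error found.
  def validate(elements: Seq[Expr]): Either[String, Seq[Cmp]] = {
    val results = elements.map(check)
    results.find(_.isLeft) match {
      case Some(Left(msg)) => Left(msg)
      case _               => Right(results.collect { case Right(c) => c })
    }
  }
}
```

Matching on the whole shape in one `case`, rather than casting with `asInstanceOf`, keeps a bare attribute like `quarter` from slipping through.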

Member

Ah. I missed this. Moving the check for "<=>" into the match should avoid this...

Member

One more negative case? How about unknown <=> upper('KR')?

Member Author

The current master behavior looks like the following. I added a new case with the same behavior.

scala> sql("ALTER TABLE sales DROP PARTITION (unknown = upper('KR'))").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '(' expecting STRING(line 1, pos 49)

Member

Nit: ${spec} -> $spec

Member Author

Yep.

@dongjoon-hyun
Member Author

Thank you, @gatorsmile ! I'll fix soon.

Member

import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression, PredicateHelper}

Member Author

Yep.

Member

Sorry, I deleted my previous comment. I just realized it.

SparkQA commented Nov 3, 2016

Test build #68048 has finished for PR 15704 at commit 19faa2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

I addressed all comments. For the upcoming comments, I'll handle them tomorrow. Thank you so much always, @gatorsmile .

Member

asInstanceOf looks risky. It depends on how the specs are generated.

@gatorsmile
Member

gatorsmile commented Nov 3, 2016

Tonight, I did not finish the review, but I have a general question about this PR:

ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...]

It sounds like this PR is not handling IF EXISTS when users use non equalTo operators. Is this intentional? What is the behavior of Hive?

Updated: IF EXISTS is handled, but please add a test case for this scenario. BTW, could you check what is the behavior of Hive when the partition does not exist? Thanks!

Member

@gatorsmile Do you mean this?

Member

Uh, we are handling ifExists here. It sounds like there is no test case covering it.

Member

Please check Hive's error message; maybe we can improve this one.

@viirya
Member

viirya commented Nov 3, 2016

Described in hive manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions

In Hive 0.7.0 or later, DROP returns an error if the partition doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
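
The behavior quoted from the Hive manual can be summarized in a few lines. This is a toy model for exact-match specs only; `DropIfExists` and `dropPartition` are hypothetical names, and the `ifExists` flag here stands in for both the IF EXISTS clause and the configuration variable mentioned above.

```scala
object DropIfExists {
  type Spec = Map[String, String]

  // Dropping a missing partition is an error unless ifExists is set,
  // in which case the call is a no-op.
  def dropPartition(
      existing: Set[Spec],
      spec: Spec,
      ifExists: Boolean): Either[String, Set[Spec]] =
    if (existing.contains(spec)) Right(existing - spec)
    else if (ifExists) Right(existing)
    else Left(s"Partition not found: $spec")
}
```

How this should extend to range comparisons that match no partition is exactly the question raised in the review, and the sketch does not answer it.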

SparkQA commented Nov 12, 2016

Test build #68559 has finished for PR 15704 at commit ae1d7df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Nov 12, 2016

LGTM and let's see if @hvanhovell has more comments.

Contributor

@hvanhovell hvanhovell left a comment

@dongjoon-hyun Looks good. I left a few minor comments. We are almost there.

val expressions = deletedPartitions.map { specs =>
  specs.map { case (key, value) =>
    EqualTo(AttributeReference(key, StringType)(), Literal.create(value, StringType))
  }.reduceLeft(org.apache.spark.sql.catalyst.expressions.And)

Contributor

just And?

Member Author

Yep.
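
For readers unfamiliar with the idiom in the quoted hunk, here is a minimal sketch of folding an equality-only spec into one conjunction. The expression types are hypothetical stand-ins written for this example, not Catalyst's `EqualTo`, `AttributeReference`, `Literal`, or `And`, and `toPredicate` is an illustrative name.

```scala
object ConjunctionSketch {
  // Hypothetical mini expression tree mirroring the shapes in the hunk.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: String) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Turns (column, value) pairs into `c1 = v1 AND c2 = v2 AND ...`.
  // reduceLeft throws on an empty collection, so the spec must be non-empty.
  def toPredicate(spec: Seq[(String, String)]): Expr =
    spec
      .map { case (key, value) => EqualTo(Attr(key), Lit(value)) }
      .reduceLeft[Expr](And(_, _))
}
```

Because `And` is in scope here, the short name is enough, which is the point of the reviewer's remark about the fully qualified name.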

table.identifier, normalizedSpecs, ignoreIfNotExists = ifExists, purge = purge)
if (specs.exists(isRangeComparison)) {
val partitionSet = scala.collection.mutable.Set.empty[CatalogTablePartition]
specs.foreach { spec =>

Contributor

use flatMap and distinct?

Member Author

Sure. Done!

} else {
  val normalizedSpecs = specs.map { expr =>
    val spec = splitConjunctivePredicates(expr).map {
      case BinaryComparison(left, right) =>

Contributor

Use pattern match on left?

Member Author

Thanks.


partitionVal
: identifier (EQ constant)?
: expression

Contributor

You could also remove the partitionVal rule

Member Author

It's removed now.

Row("country=KR/quarter=3") :: Nil)

// According to the declarative partition spec definitions, this drops the union of target
// partitions without exceptions. Hive raises exceptions because it handle them sequentially.

Contributor

NIT: handle -> handles

Member Author

Sure.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Nov 14, 2016

Thank you for the review again, @hvanhovell. I'll fix them soon.

@hvanhovell
Contributor

LGTM - pending jenkins

@dongjoon-hyun
Member Author

Thank you, @hvanhovell !

SparkQA commented Nov 15, 2016

Test build #68640 has finished for PR 15704 at commit fab5682.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Hi, @hvanhovell or @gatorsmile .
Could you merge this PR?

@hvanhovell
Contributor

Merging to master. Thanks!

@asfgit asfgit closed this in 3ce057d Nov 16, 2016
@dongjoon-hyun
Member Author

Thank you so much, @hvanhovell , @gatorsmile, @viirya !

@dongjoon-hyun
Member Author

Oh, @hvanhovell.
Can we make a backport for branch-2.1, which will be released this month?

asfgit pushed a commit that referenced this pull request Nov 16, 2016
Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15704 from dongjoon-hyun/SPARK-17732-2.
@hvanhovell
Contributor

hvanhovell commented Nov 16, 2016

@dongjoon-hyun I have cherry picked it into branch 2.1. Please note that I will revert this as soon as it causes any problems, and then we will push this to 2.2.

@dongjoon-hyun
Member Author

I see. I agree. Thank you so much for cherry-picking.

@dongjoon-hyun
Member Author

I feel that we are really so close to 2.1. :)

@dongjoon-hyun dongjoon-hyun deleted the SPARK-17732-2 branch November 19, 2016 12:37
@hvanhovell
Contributor

I am reverting this from 2.1. See https://issues.apache.org/jira/browse/SPARK-18515 for more information.

ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 28, 2016
…omparators

## What changes were proposed in this pull request?

apache#15704 will fail if we use an int literal in `DROP PARTITION`, and we have reverted it in branch-2.1.

This PR reverts it in the master branch and adds a regression test for it, to make sure the master branch is healthy.

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16036 from cloud-fan/revert.
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017