[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15704

dongjoon-hyun · 2016-10-31T23:00:16Z

What changes were proposed in this pull request?

This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

Spark 1.6

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]

Spark 2.0

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)

After this PR, it's supported.
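To make the intended semantics concrete, here is a minimal plain-Scala sketch (not Spark code) of which partitions a spec like `(country < 'KR')` would select. The `DropByComparatorSketch` object, the sample partition values, and the helper name are illustrative assumptions. Values are compared as strings, matching the STRING partition columns in the example above.

```scala
// Toy model only: this does not call Spark. A partition is represented as a
// map from partition column name to its (string) value.
object DropByComparatorSketch {
  type Partition = Map[String, String]

  // Hypothetical partitions of the `sales` table from the example.
  val partitions: Seq[Partition] = Seq(
    Map("country" -> "CN", "quarter" -> "1"),
    Map("country" -> "KR", "quarter" -> "1"),
    Map("country" -> "US", "quarter" -> "1")
  )

  // Models the spec (country < 'KR') as a lexicographic string comparison.
  def countryBeforeKR(p: Partition): Boolean = p("country") < "KR"

  // Split into the partitions the spec would drop and the ones it keeps.
  val (dropped, kept) = partitions.partition(countryBeforeKR)
}
```

In this model only the `CN` partition falls under the spec; `KR` itself is kept because the comparison is strict.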

How was this patch tested?

Pass the Jenkins test with a newly added testcase.

SparkQA · 2016-11-01T00:35:12Z

Test build #67843 has finished for PR 15704 at commit 84f2315.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-11-01T03:49:24Z

Test build #67857 has finished for PR 15704 at commit a3061e2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2016-11-01T04:20:05Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala

As we already have listPartitionsByFilter, do we need another dropPartitionsByFilter? It looks like there is a lot of duplicated code between them.

We can combine listPartitionsByFilter and dropPartitions to do the same thing, instead of adding a new API like this.

Thank you for the review, @viirya and @hvanhovell.
Sure, no problem. I just thought we needed to have this in ExternalCatalog before Catalog Federation (SPARK-15777).
I will remove that code.

dongjoon-hyun · 2016-11-01T20:19:28Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

listPartitionsByFilter is not supported in InMemoryCatalog. So, we should use this only when it is needed.

SparkQA · 2016-11-01T22:46:37Z

Test build #67919 has finished for PR 15704 at commit 05c83fa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-11-01T22:58:34Z

Test build #67921 has finished for PR 15704 at commit 72084a0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2016-11-02T05:16:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Failing to match SqlBaseParser.NSEQ might cause a runtime error.

Yep. Validation is added.

viirya · 2016-11-02T05:24:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

The function name looks confusing. Actually they are not more complex operators, are they?

Maybe, hasNonEqualToComparison is better?

viirya · 2016-11-02T05:34:21Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

I think we don't need splitConjunctivePredicates. Just iterating over each attribute in every spec expression's references and doing the following resolution check should be enough.

Yep. That's much better.

viirya · 2016-11-02T05:35:51Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

This might not be clear enough. Add a short error message?

viirya · 2016-11-02T05:41:47Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

Let's add a test for dropping multiple partition specs?

I mean something like PARTITION (quarter <= '2'), PARTITION (quarter >= '4').

Oh, I missed that case. Sure! Thank you again.

We should add a test case like PARTITION (quarter <= '4'), PARTITION (quarter <= '2') to see what happens, since the second spec may fail after the partitions matching the first spec are removed.

To add that, we need another test case, because the remaining partitions are not enough to test it.

I added the case by updating the existing testcases.

viirya · 2016-11-02T05:43:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Rephrase this error message? Looks a bit confusing.

I revised it.

dongjoon-hyun · 2016-11-02T07:55:09Z

Thank you for review, @viirya . I'll fix them tomorrow~

SparkQA · 2016-11-03T00:52:34Z

Test build #68015 has finished for PR 15704 at commit d3c3ca5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2016-11-03T03:31:12Z

LGTM except one comment for the test.

cc @hvanhovell for second look.

dongjoon-hyun · 2016-11-03T04:40:37Z

Thank you so much, @viirya. I added the test case and fixed the related bug.

gatorsmile · 2016-11-03T05:50:37Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Why not add it inside the following match block?

Yep. Moved.

Actually, this is wrong.

sql("ALTER TABLE sales DROP PARTITION (country < 'KR', quarter)")

The above statement also matches this case, right?

Please add the above test case into the negative test cases. Thanks!

Yep. Right. The above code is wrong. I'll add the test case.

Ah. I missed this. Moving the check for "<=>" into the match should avoid this...
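The negative case discussed here, a spec such as `(country < 'KR', quarter)` where one element has no comparison at all, can be illustrated with a toy validator. This is a plain-Scala sketch and not the PR's `AstBuilder` logic: every type and function name below is hypothetical, and treating `<=>` as unsupported is an assumption drawn from the thread.

```scala
// Toy validator, not the PR's AstBuilder logic. Every name here is hypothetical.
object PartitionSpecValidationSketch {
  sealed trait SpecItem
  // A bare column with no comparison, e.g. `quarter`.
  final case class BareColumn(name: String) extends SpecItem
  // A comparison such as `country < 'KR'`.
  final case class Comparison(column: String, op: String, value: String) extends SpecItem

  // Operators this sketch accepts in a DROP PARTITION spec; "<=>" is deliberately absent.
  val allowedOps: Set[String] = Set("=", "<", "<=", ">", ">=")

  // Returns Left with a message for the first invalid item, or Right(items) if all are valid.
  def validate(items: Seq[SpecItem]): Either[String, Seq[SpecItem]] =
    items.collectFirst {
      case BareColumn(name) =>
        s"Partition column '$name' needs a comparison in DROP PARTITION"
      case Comparison(_, op, _) if !allowedOps(op) =>
        s"Unsupported operator '$op' in partition spec"
    }.toLeft(items)
}
```

Checking every element in one place, rather than only the first or only the comparison elements, is what keeps a mixed spec from slipping through.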

gatorsmile · 2016-11-03T05:54:06Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

One more negative case? How about unknown <=> upper('KR')?

The current master behavior looks like the following. I added a new case with the same behavior.

scala> sql("ALTER TABLE sales DROP PARTITION (unknown = upper('KR'))").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '(' expecting STRING(line 1, pos 49)

gatorsmile · 2016-11-03T05:57:18Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

Nit: ${spec} -> $spec

dongjoon-hyun · 2016-11-03T06:19:33Z

Thank you, @gatorsmile ! I'll fix soon.

gatorsmile · 2016-11-03T06:26:56Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression, PredicateHelper}

gatorsmile · 2016-11-03T06:35:57Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

Sorry, I deleted my previous comment. I just realized it.

SparkQA · 2016-11-03T06:58:24Z

Test build #68048 has finished for PR 15704 at commit 19faa2a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-11-03T07:02:21Z

I addressed all comments. For the upcoming comments, I'll handle them tomorrow. Thank you so much always, @gatorsmile .

gatorsmile · 2016-11-03T07:24:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

asInstanceOf looks risky. It depends on how the specs are generated.

gatorsmile · 2016-11-03T07:29:58Z

Tonight, I did not finish the review, but I have a general question about this PR:

ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...]

~~It sounds like this PR is not handling IF EXISTS when users use non equalTo operators. Is this intentional?~~ What is the behavior of Hive?

Updated: IF EXISTS is handled, but please add a test case for this scenario. BTW, could you check what is the behavior of Hive when the partition does not exist? Thanks!

viirya · 2016-11-03T07:33:20Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

@gatorsmile Do you mean this?

Uh, we are handling ifExists here. It sounds like there is no test case covering it.

gatorsmile · 2016-11-03T07:45:12Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

Please check Hive's error message; maybe we can improve this one.

https://github.com/apache/hive/blob/345353c0ea5d3ddda9f6d89cbf8cd0e92726fcb6/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2288

I think it should be "Partition or table doesn't exist."

viirya · 2016-11-03T07:47:06Z

This is described in the Hive manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions

In Hive 0.7.0 or later, DROP returns an error if the partition doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
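The rule quoted from the Hive manual can be written down as a small decision function. The sketch below is a plain-Scala toy model, not Hive or Spark code; the `DropIfExistsSketch` object, the `drop` signature, and the error text are assumptions made for illustration.

```scala
// Toy model of the rule quoted from the Hive manual; not Hive or Spark code.
object DropIfExistsSketch {
  type Partition = Map[String, String]

  // Drops `target` from `existing`. A missing target is an error unless
  // `ifExists` is set (`ignoreNonexistent` models hive.exec.drop.ignorenonexistent).
  def drop(
      existing: Set[Partition],
      target: Partition,
      ifExists: Boolean,
      ignoreNonexistent: Boolean = false): Either[String, Set[Partition]] =
    if (existing.contains(target)) Right(existing - target)
    else if (ifExists || ignoreNonexistent) Right(existing)
    else Left(s"Partition not found: $target")
}
```

A test for the `IF EXISTS` scenario would exercise the middle branch: the target is missing, nothing is dropped, and no error is raised.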

SparkQA · 2016-11-12T11:04:05Z

Test build #68559 has finished for PR 15704 at commit ae1d7df.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2016-11-12T12:04:29Z

LGTM and let's see if @hvanhovell has more comments.

hvanhovell

@dongjoon-hyun Looks good. I left a few minor comments. We are almost there.

hvanhovell · 2016-11-14T21:04:21Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala

+              val expressions = deletedPartitions.map { specs =>
+                specs.map { case (key, value) =>
+                  EqualTo(AttributeReference(key, StringType)(), Literal.create(value, StringType))
+                }.reduceLeft(org.apache.spark.sql.catalyst.expressions.And)


just And?

hvanhovell · 2016-11-14T21:24:31Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

-      table.identifier, normalizedSpecs, ignoreIfNotExists = ifExists, purge = purge)
+    if (specs.exists(isRangeComparison)) {
+      val partitionSet = scala.collection.mutable.Set.empty[CatalogTablePartition]
+      specs.foreach { spec =>


use flatMap and distinct?

Sure. Done!
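The `flatMap` plus `distinct` suggestion amounts to resolving every spec against the same partition list and dropping the union once. Below is a plain-Scala sketch of that idea under stated assumptions: it is not the code in ddl.scala, specs are modeled as simple predicates, and the object name and sample partitions are hypothetical. The two specs mirror the overlapping case raised earlier in the review.

```scala
// Toy model, not the code in ddl.scala: specs are plain predicates over partitions.
object UnionOfSpecsSketch {
  type Partition = Map[String, String]

  // Hypothetical partitions: quarters 1 to 4 of one country.
  val partitions: Seq[Partition] =
    Seq("1", "2", "3", "4").map(q => Map("country" -> "KR", "quarter" -> q))

  // Models PARTITION (quarter <= '4'), PARTITION (quarter <= '2'); the specs overlap.
  val specs: Seq[Partition => Boolean] = Seq(
    p => p("quarter") <= "4",
    p => p("quarter") <= "2"
  )

  // Resolve every spec against the same partition list, then de-duplicate,
  // so an overlapping partition is scheduled for dropping only once.
  val toDrop: Seq[Partition] =
    specs.flatMap(spec => partitions.filter(spec)).distinct
}
```

Because both specs are evaluated before anything is dropped, the second spec still finds its matches, which is the difference from sequential handling.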

hvanhovell · 2016-11-14T21:25:50Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+    } else {
+      val normalizedSpecs = specs.map { expr =>
+        val spec = splitConjunctivePredicates(expr).map {
+          case BinaryComparison(left, right) =>


Use pattern match on left?

hvanhovell · 2016-11-14T21:27:20Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4


 partitionVal
-    : identifier (EQ constant)?
+    : expression


You could also remove the partitionVal rule

It's removed now.

hvanhovell · 2016-11-14T21:29:11Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

+        Row("country=KR/quarter=3") :: Nil)
+
+      // According to the declarative partition spec definitions, this drops the union of target
+      // partitions without exceptions. Hive raises exceptions because it handle them sequentially.


NIT: handle -> handles

dongjoon-hyun · 2016-11-14T22:42:46Z

Thank you for review again, @hvanhovell . I'll fix them soon.

hvanhovell · 2016-11-14T23:24:55Z

LGTM - pending jenkins

dongjoon-hyun · 2016-11-14T23:26:24Z

Thank you, @hvanhovell !

SparkQA · 2016-11-15T01:49:31Z

Test build #68640 has finished for PR 15704 at commit fab5682.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-11-15T18:04:32Z

Hi, @hvanhovell or @gatorsmile .
Could you merge this PR?

hvanhovell · 2016-11-15T23:58:41Z

Merging to master. Thanks!

dongjoon-hyun · 2016-11-16T00:04:41Z

Thank you so much, @hvanhovell , @gatorsmile, @viirya !

dongjoon-hyun · 2016-11-16T00:09:07Z

Oh, @hvanhovell.
Could you make a backport for branch-2.1, which will be released this month?

## What changes were proposed in this pull request?

This PR aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

**Spark 1.6**

``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]
```

**Spark 2.0**

``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
```

After this PR, it's supported.

## How was this patch tested?

Pass the Jenkins test with a newly added testcase.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15704 from dongjoon-hyun/SPARK-17732-2.

hvanhovell · 2016-11-16T00:14:33Z

@dongjoon-hyun I have cherry picked it into branch 2.1. Please note that I will revert this as soon as it causes any problems, and then we will push this to 2.2.

dongjoon-hyun · 2016-11-16T00:17:41Z

I see. I agree. Thank you so much for cherry-picking.

dongjoon-hyun · 2016-11-16T00:18:11Z

I feel that we are really so close to 2.1. :)

hvanhovell · 2016-11-20T16:44:38Z

I am reverting this from 2.1. See https://issues.apache.org/jira/browse/SPARK-18515 for more information.

…omparators

## What changes were proposed in this pull request?

apache#15704 will fail if we use an int literal in `DROP PARTITION`, and we have reverted it in branch-2.1. This PR reverts it in the master branch and adds a regression test for it, to make sure the master branch is healthy.

## How was this patch tested?

New regression test.

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16036 from cloud-fan/revert.


dongjoon-hyun mentioned this pull request Oct 31, 2016

[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15302

Closed

Minimize the underlying calls.

ae1d7df

hvanhovell requested changes Nov 14, 2016

Remove partitionVal, use flatMap/distinct, and fixes other things.

fab5682

asfgit closed this in 3ce057d Nov 16, 2016

dongjoon-hyun deleted the SPARK-17732-2 branch November 19, 2016 12:37

cloud-fan mentioned this pull request Nov 28, 2016

[SPARK-17732][SQL] Revert ALTER TABLE DROP PARTITION should support comparators #16036

Closed

dongjoon-hyun mentioned this pull request Nov 30, 2016

[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15987

Closed

DazhuangSu mentioned this pull request Nov 8, 2017

[SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITION should support comparators #19691

Closed

weixiuli mentioned this pull request Oct 28, 2019

[SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support partition filter in ALTER TABLE DROP PARTITION and batch dropping PARTITIONS #26280

Closed
