-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #67843 has finished for PR 15704 at commit
|
|
Test build #67857 has finished for PR 15704 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we already have listPartitionsByFilter, do we need another dropPartitionsByFilter? Looks like there are many duplicate codes between them.
We can combine listPartitionsByFilter and dropPartitions do the same thing, instead of adding new API like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review, @viirya and @hvanhovell .
Sure, no problem. I just thought we need to have this in ExternalCatalog before Catalog Federation ( SPARK-15777 ).
I will remove those stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listPartitionsByFilter is not supported in InMemoryCatalog. So, we should use this only when it is needed.
|
Test build #67919 has finished for PR 15704 at commit
|
|
Test build #67921 has finished for PR 15704 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failed to match SqlBaseParser.NSEQ might cause runtime error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Validation is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name looks confusing. Actually they are not more complex operators, are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, hasNonEqualToComparison is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to do splitConjunctivePredicates. Just iterates each attribute in every spec expression's references and do the following resolving check, should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. That's much better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be not clear enough. Add a short error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a test for dropping multiple partition specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 251 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean something like PARTITON (quarter <= '2'), PARTITION (quarter >= '4').
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed that case. Sure! Thank you again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a test case like PARTITON (quarter <= '4'), PARTITION (quarter <= '2') to see what will happen since after the first partition spec is removed the second one may be failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add that, we should make another testcases because the remaining partitions are not enough to test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the case by updating the existing testcases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase this error message? Looks a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised it.
|
Thank you for review, @viirya . I'll fix them tomorrow~ |
|
Test build #68015 has finished for PR 15704 at commit
|
|
LGTM except one comment for the test. cc @hvanhovell for second look. |
|
Thank you so much, @viirya . I added the testcase and fixed related bug again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not adding it inside the following match block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Moved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is wrong.
sql("ALTER TABLE sales DROP PARTITION (country < 'KR', quarter)")The above statement also matches this case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the above test case into the negative test cases. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Right. The above code is wrong. I'll add the test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I missed this. Move the check of "<=>" to match should avoid this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more negative case? How about unknown <=> upper('KR')?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current master behavior looks the following. I added new case with the same behavior.
scala> sql("ALTER TABLE sales DROP PARTITION (unknown = upper('KR'))").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '(' expecting STRING(line 1, pos 49)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ${spec} -> $spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
|
Thank you, @gatorsmile ! I'll fix soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression, PredicateHelper}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I deleted my previous comment. I just realized it.
|
Test build #68048 has finished for PR 15704 at commit
|
|
I addressed all comments. For the upcoming comments, I'll handle them tomorrow. Thank you so much always, @gatorsmile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asInstanceOf looks risky. It depends on the generation of specs
|
Tonight, I did not finish the review, but I have a general question about this PR:
Updated: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Do you mean this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh, we are handling ifExists here. It sounds like no test case to cover this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the Hive's error message and maybe we can improve this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be "Partition or table doesn't exist."
|
Described in hive manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions In Hive 0.7.0 or later, DROP returns an error if the partition doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true. |
|
Test build #68559 has finished for PR 15704 at commit
|
|
LGTM and let's see if @hvanhovell has more comments. |
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun Looks good. I left a few minor comment. We are almost there.
| val expressions = deletedPartitions.map { specs => | ||
| specs.map { case (key, value) => | ||
| EqualTo(AttributeReference(key, StringType)(), Literal.create(value, StringType)) | ||
| }.reduceLeft(org.apache.spark.sql.catalyst.expressions.And) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just And?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
| table.identifier, normalizedSpecs, ignoreIfNotExists = ifExists, purge = purge) | ||
| if (specs.exists(isRangeComparison)) { | ||
| val partitionSet = scala.collection.mutable.Set.empty[CatalogTablePartition] | ||
| specs.foreach { spec => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use flatMap and distinct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Done!
| } else { | ||
| val normalizedSpecs = specs.map { expr => | ||
| val spec = splitConjunctivePredicates(expr).map { | ||
| case BinaryComparison(left, right) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use pattern match on left?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
|
|
||
| partitionVal | ||
| : identifier (EQ constant)? | ||
| : expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also remove the partitionVal rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's removed now.
| Row("country=KR/quarter=3") :: Nil) | ||
|
|
||
| // According to the declarative partition spec definitions, this drops the union of target | ||
| // partitions without exceptions. Hive raises exceptions because it handle them sequentially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: handle -> handles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
|
Thank you for review again, @hvanhovell . I'll fix them soon. |
|
LGTM - pending jenkins |
|
Thank you, @hvanhovell ! |
|
Test build #68640 has finished for PR 15704 at commit
|
|
Hi, @hvanhovell or @gatorsmile . |
|
Merging to master. Thanks! |
|
Thank you so much, @hvanhovell , @gatorsmile, @viirya ! |
|
Oh, @hvanhovell. |
## What changes were proposed in this pull request?
This PR aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.
**Spark 1.6**
``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]
```
**Spark 2.0**
``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
```
After this PR, it's supported.
## How was this patch tested?
Pass the Jenkins test with a newly added testcase.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #15704 from dongjoon-hyun/SPARK-17732-2.
|
@dongjoon-hyun I have cherry picked it into branch 2.1. Please note that I will revert this as soon as it causes any problems, and then we will push this to 2.2. |
|
I see. I agree. Thank you so much for cherry-picking. |
|
I feel that we are really so close to 2.1. :) |
|
I am reverting this from 2.1. See https://issues.apache.org/jira/browse/SPARK-18515 for more information. |
…omparators ## What changes were proposed in this pull request? apache#15704 will fail if we use int literal in `DROP PARTITION`, and we have reverted it in branch-2.1. This PR reverts it in master branch, and add a regression test for it, to make sure the master branch is healthy. ## How was this patch tested? new regression test Author: Wenchen Fan <wenchen@databricks.com> Closes apache#16036 from cloud-fan/revert.
…omparators ## What changes were proposed in this pull request? apache#15704 will fail if we use int literal in `DROP PARTITION`, and we have reverted it in branch-2.1. This PR reverts it in master branch, and add a regression test for it, to make sure the master branch is healthy. ## How was this patch tested? new regression test Author: Wenchen Fan <wenchen@databricks.com> Closes apache#16036 from cloud-fan/revert.
## What changes were proposed in this pull request?
This PR aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.
**Spark 1.6**
``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]
```
**Spark 2.0**
``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
```
After this PR, it's supported.
## How was this patch tested?
Pass the Jenkins test with a newly added testcase.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes apache#15704 from dongjoon-hyun/SPARK-17732-2.
…omparators ## What changes were proposed in this pull request? apache#15704 will fail if we use int literal in `DROP PARTITION`, and we have reverted it in branch-2.1. This PR reverts it in master branch, and add a regression test for it, to make sure the master branch is healthy. ## How was this patch tested? new regression test Author: Wenchen Fan <wenchen@databricks.com> Closes apache#16036 from cloud-fan/revert.
What changes were proposed in this pull request?
This PR aims to support
comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.Spark 1.6
Spark 2.0
After this PR, it's supported.
How was this patch tested?
Pass the Jenkins test with a newly added testcase.