-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-14132][SPARK-14133][SQL] Alter table partition DDLs #12220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #55158 has finished for PR 12220 at commit
|
It seems that Hive supports dropping partitions based on partial specs, where not all partitioned columns are accompanied with specified values in the spec. Ironically there's no API to achieve this using the Hive client, so we need to implement it ourselves in Spark (see HiveClientImpl.scala). Additionally two tests in HiveCompatibilitySuite use features that we explicitly do not allow, so those tests are now added to the blacklist.
|
Test build #55245 has finished for PR 12220 at commit
|
|
Test build #55274 has finished for PR 12220 at commit
|
and also push ignoreIfNotExists into HiveClient.
|
Test build #55286 has finished for PR 12220 at commit
|
| * The syntax of this command is: | ||
| * {{{ | ||
| * ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...] [PURGE]; | ||
| * ALTER VIEW view DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not support ALTER VIEW view DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...]; because the semantic of having partitions defined with a view is not defined?
A previous patch already throws an exception for us when this happens. I changed the exception to be AnalysisException so we can write the test easier. Eventually all of these exceptions will be made consistent. This is mainly just a documentation + test change.
|
Looks like #12169 already added the check in the parser so we don't have to add it ourselves. The last commit just removes the bad comment and adds two tests for the views. |
|
Actually, I have another question. If the table is an external table, we should not delete the data when we drop the partition. What do we do for this case? |
|
OK. I think metastore will not delete the data for external tables even if deleteData is set to true (https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2239-L2251). Do we have tests for external tables? If not, can we add tests? Thanks! |
|
OK, let's do that in a separate patch. |
|
Test build #55390 has finished for PR 12220 at commit
|
|
test this please |
| val dropOptions = new PartitionDropOptions | ||
| dropOptions.ifExists = ignoreIfNotExists | ||
| dropOptions.purgeData = purge | ||
| client.dropPartition(db, table, hivePartition.getValues, dropOptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just realized that PURGE was added in Hive 0.14. I think we need to either not support it or add tests to VersionsSuite to make sure it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
|
Test build #55391 has finished for PR 12220 at commit
|
|
Test build #55396 has finished for PR 12220 at commit
|
|
@andrewor14 I know you are busy. Feel free to let me know if you need me to submit test cases to verify if the data is not deleted when we drop the partition of an external table. |
|
@gatorsmile that would be great, thanks! |
|
LGTM! |
|
Sure, will do it after this is merged. Thanks! |
|
Test build #55565 has finished for PR 12220 at commit
|
|
Merging to master! |
| // The provided spec here can be a partial spec, i.e. it will match all partitions | ||
| // whose specs are supersets of this partial spec. E.g. If a table has partitions | ||
| // (b='1', c='1') and (b='1', c='2'), a partial spec of (b='1') will match both. | ||
| val matchingParts = client.getPartitions(hiveTable, s.asJava).asScala |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid there is a bug in getPartitions. This function could return a wrong result when the partitioning column names are wrong. For example, if we pass a='0', it will return all the partitions. We drop all the partitions in this case. The expected return should be an empty set, right?
Two possible solutions:
- get the whole list, and filter it out by ourselves.
- check if the specs contains any column that is not part of table partitioning at the beginning.
Will try to submit a PR today and fix the issue based on the solution 2? Please let me know if you want to do it. Thanks! @andrewor14 @yhuai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hive does not have such an issue. It detects the error and reports
hive> ALTER TABLE extTable_with_partitions DROP PARTITION (a='0');
FAILED: SemanticException Column a not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SessionCatalog should take care the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to avoid of fetching all partition specs to the client side. If there is a large table, fetching all partition spec will be very slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You meant that a was not a partitioning column, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do it soon. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://issues.apache.org/jira/browse/SPARK-14603 is the JIRA for the work of using SessionCatalog to check if a metadata operation is valid or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, will use this JIRA to handle all the cases. Thanks!
What changes were proposed in this pull request?
This implements a few alter table partition commands using the
SessionCatalog. In particular:The following operations are not supported, and an
AnalysisExceptionwith a helpful error message will be thrown if the user tries to use them:How was this patch tested?
DDLSuite,DDLCommandSuiteandHiveDDLCommandSuite