@andrewor14
Contributor

What changes were proposed in this pull request?

This PR implements several ALTER TABLE partition commands using the SessionCatalog. In particular:

ALTER TABLE ... ADD PARTITION ...
ALTER TABLE ... DROP PARTITION ...
ALTER TABLE ... RENAME PARTITION ... TO ...

The following operations are not supported, and an AnalysisException with a helpful error message will be thrown if the user tries to use them:

ALTER TABLE ... EXCHANGE PARTITION ...
ALTER TABLE ... ARCHIVE PARTITION ...
ALTER TABLE ... UNARCHIVE PARTITION ...
ALTER TABLE ... TOUCH ...
ALTER TABLE ... COMPACT ...
ALTER TABLE ... CONCATENATE
MSCK REPAIR TABLE ...

How was this patch tested?

DDLSuite, DDLCommandSuite and HiveDDLCommandSuite

@andrewor14
Contributor Author

@yhuai

@SparkQA

SparkQA commented Apr 6, 2016

Test build #55158 has finished for PR 12220 at commit eedb529.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class VectorizedParquetInputFormat extends ParquetInputFormat[InternalRow]

Andrew Or added 2 commits April 7, 2016 13:25
It seems that Hive supports dropping partitions based on partial
specs, where not every partition column is given a value in the
spec. Ironically, the Hive client has no API for this, so we
need to implement it ourselves in Spark (see HiveClientImpl.scala).

Additionally, two tests in HiveCompatibilitySuite use features
that we explicitly do not allow, so those tests are now added
to the blacklist.
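
The partial-spec matching described in the commit message above can be sketched in plain Scala. This is only an illustration under stated assumptions, not the code in HiveClientImpl.scala: the object `PartialSpecMatching`, the `PartitionSpec` alias, and both helper methods are hypothetical names, and partitions are modeled as simple column-to-value maps rather than Hive partition objects.

```scala
// Hypothetical sketch: none of these names come from Spark or Hive.
object PartialSpecMatching {
  // A partition spec maps partition column names to their values.
  type PartitionSpec = Map[String, String]

  // A full spec matches when every (column, value) pair of the partial
  // spec is also present in the full spec.
  def matchesPartialSpec(partial: PartitionSpec, full: PartitionSpec): Boolean =
    partial.forall { case (column, value) => full.get(column).contains(value) }

  // Keep only the partitions whose specs are supersets of the partial spec.
  def matchingPartitions(
      partial: PartitionSpec,
      all: Seq[PartitionSpec]): Seq[PartitionSpec] =
    all.filter(full => matchesPartialSpec(partial, full))
}
```

With partitions (b='1', c='1'), (b='1', c='2') and (b='2', c='1'), calling `matchingPartitions(Map("b" -> "1"), parts)` keeps the first two. In this sketch a spec that names a column the table is not partitioned by matches nothing.
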
@SparkQA

SparkQA commented Apr 7, 2016

Test build #55245 has finished for PR 12220 at commit 85c8b8f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55274 has finished for PR 12220 at commit 2a37c75.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55286 has finished for PR 12220 at commit 220141d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* The syntax of this command is:
* {{{
* ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...] [PURGE];
* ALTER VIEW view DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...];
Contributor

I think we do not support ALTER VIEW view DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...]; because the semantics of partitions on a view are undefined?

A previous patch already throws an exception for us when this
happens. I changed the exception to an AnalysisException so the
test is easier to write. Eventually all of these exceptions
will be made consistent.

This is mainly just a documentation + test change.
@andrewor14
Contributor Author

Looks like #12169 already added the check in the parser so we don't have to add it ourselves. The last commit just removes the bad comment and adds two tests for the views.

@yhuai
Contributor

yhuai commented Apr 8, 2016

Actually, I have another question. If the table is an external table, we should not delete the data when we drop the partition. What do we do for this case?

@yhuai
Contributor

yhuai commented Apr 8, 2016

OK. I think metastore will not delete the data for external tables even if deleteData is set to true (https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2239-L2251). Do we have tests for external tables? If not, can we add tests? Thanks!

@andrewor14
Contributor Author

OK, let's do that in a separate patch.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55390 has finished for PR 12220 at commit 3fb9a87.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Apr 8, 2016

test this please

val dropOptions = new PartitionDropOptions
dropOptions.ifExists = ignoreIfNotExists
dropOptions.purgeData = purge
client.dropPartition(db, table, hivePartition.getValues, dropOptions)
Contributor

Oh, just realized that PURGE was added in Hive 0.14. I think we need to either not support it or add tests to VersionsSuite to make sure it works.

Contributor Author

removed

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55391 has finished for PR 12220 at commit 12cf973.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RandomForestClassificationModel(TreeEnsembleModels, JavaMLWritable, JavaMLReadable):
    • class HasVarianceCol(Params):
    • class RandomForestRegressionModel(TreeEnsembleModels, JavaMLWritable, JavaMLReadable):
    • case class DeserializeToObject(
    • case class SerializeFromObject(
    • public abstract class ColumnVector implements AutoCloseable
    • case class DeserializeToObject(
    • case class SerializeFromObject(

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55396 has finished for PR 12220 at commit 12cf973.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RandomForestClassificationModel(TreeEnsembleModels, JavaMLWritable, JavaMLReadable):
    • class HasVarianceCol(Params):
    • class RandomForestRegressionModel(TreeEnsembleModels, JavaMLWritable, JavaMLReadable):
    • case class DeserializeToObject(
    • case class SerializeFromObject(
    • public abstract class ColumnVector implements AutoCloseable
    • case class DeserializeToObject(
    • case class SerializeFromObject(

@gatorsmile
Member

@andrewor14 I know you are busy. Feel free to let me know if you need me to submit test cases verifying that the data is not deleted when we drop a partition of an external table.

@andrewor14
Contributor Author

@gatorsmile that would be great, thanks!

@yhuai
Contributor

yhuai commented Apr 12, 2016

LGTM!

@gatorsmile
Member

Sure, will do it after this is merged. Thanks!

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55565 has finished for PR 12220 at commit e3f08b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Apr 12, 2016

Merging to master!

@asfgit asfgit closed this in 83fb964 Apr 12, 2016
@andrewor14 andrewor14 deleted the alter-partition-ddl branch April 12, 2016 17:44
// The provided spec here can be a partial spec, i.e. it will match all partitions
// whose specs are supersets of this partial spec. E.g. If a table has partitions
// (b='1', c='1') and (b='1', c='2'), a partial spec of (b='1') will match both.
val matchingParts = client.getPartitions(hiveTable, s.asJava).asScala
Member

I am afraid there is a bug in getPartitions. This function could return a wrong result when the partitioning column names are wrong. For example, if we pass a='0', it will return all the partitions. We drop all the partitions in this case. The expected return should be an empty set, right?

Two possible solutions:

  1. Get the whole list of partitions and filter it ourselves.
  2. Check up front whether the spec contains any column that is not one of the table's partitioning columns.

I will try to submit a PR today that fixes the issue using solution 2. Please let me know if you would rather do it yourself. Thanks! @andrewor14 @yhuai
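
The up-front check proposed as solution 2 could look roughly like the following. This is a minimal sketch under stated assumptions, not the fix that was eventually submitted: `PartitionSpecValidation`, `unknownColumns`, and `requireValidSpec` are hypothetical names, and the sketch fails with an `IllegalArgumentException` (via `require`) where Spark itself would more likely raise an AnalysisException.

```scala
// Hypothetical sketch: none of these names come from Spark or Hive.
object PartitionSpecValidation {
  // Columns named in the spec that the table is not partitioned by.
  def unknownColumns(
      spec: Map[String, String],
      partitionColumns: Seq[String]): Set[String] =
    spec.keySet -- partitionColumns

  // Reject the spec before any partition lookup, so that a spec such as
  // (a='0') on a table partitioned by (b, c) can never match every partition.
  def requireValidSpec(
      spec: Map[String, String],
      partitionColumns: Seq[String]): Unit = {
    val unknown = unknownColumns(spec, partitionColumns)
    require(
      unknown.isEmpty,
      s"Partition spec refers to non-partitioning column(s): ${unknown.mkString(", ")}")
  }
}
```

Because the check needs only the spec and the table's partitioning column names, it does not require fetching the table's partitions to the client.
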

Member

Hive does not have this issue. It detects the error and reports:

hive> ALTER TABLE extTable_with_partitions DROP PARTITION (a='0');
FAILED: SemanticException Column a not found

Contributor

I think SessionCatalog should take care of the check.

Contributor

It's better to avoid fetching all partition specs to the client side. For a large table, fetching every partition spec will be very slow.

Contributor

You meant that a was not a partitioning column, right?

Member

yeah. True

Member

Will do it soon. Thanks!

Contributor

https://issues.apache.org/jira/browse/SPARK-14603 is the JIRA for using SessionCatalog to check whether a metadata operation is valid.

Member

Yeah, will use this JIRA to handle all the cases. Thanks!
