

MaxGekk commented Oct 6, 2022

What changes were proposed in this pull request?

In this PR, I propose to modify the Hive catalog implementation of create partitions for the case when Hive reports that some of the partitions already exist. In that case, Spark checks the existence of every input partition one by one and reports only the existing partitions in PartitionsAlreadyExistException.
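A minimal Scala sketch of the fallback described above, assuming a toy in-memory metastore whose bulk add fails without saying which partitions conflict. This is not Spark's actual Hive catalog code: FakeMetastore, MetastoreAlreadyExists, PartitionsAlreadyExist and createPartitions are hypothetical stand-ins. When the bulk add fails, each input spec is probed individually and only the ones that really exist are reported:

```scala
import scala.collection.mutable

// Sketch only: every name here is a hypothetical stand-in, not a Spark API.
object CreatePartitionsSketch {
  // A partition spec as a map from partition column name to value.
  type PartitionSpec = Map[String, String]

  // Stand-in for the imprecise metastore error: it does not say WHICH specs exist.
  final class MetastoreAlreadyExists extends Exception("some partitions already exist")

  // Stand-in for PartitionsAlreadyExistException, formatted like the message in this PR.
  final case class PartitionsAlreadyExist(
      db: String,
      table: String,
      specs: Seq[PartitionSpec])
    extends Exception(
      s"The following partitions already exists in table '$table' database '$db':\n" +
        specs.mkString("\n===\n"))

  // Toy metastore: a batch add fails as a whole if any spec is already stored.
  final class FakeMetastore {
    private val stored = mutable.Set.empty[PartitionSpec]

    def contains(spec: PartitionSpec): Boolean = stored.contains(spec)

    def addAll(specs: Seq[PartitionSpec]): Unit = {
      if (specs.exists(contains)) throw new MetastoreAlreadyExists
      stored ++= specs
    }
  }

  def createPartitions(
      ms: FakeMetastore,
      db: String,
      table: String,
      specs: Seq[PartitionSpec]): Unit = {
    try ms.addAll(specs)
    catch {
      case _: MetastoreAlreadyExists =>
        // The bulk error is imprecise, so check each input spec one by one
        // and report only the specs that actually exist.
        val existing = specs.filter(ms.contains)
        throw PartitionsAlreadyExist(db, table, existing)
    }
  }
}
```

In this sketch the per-partition probing only runs on the error path, so the successful case still costs a single bulk call.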

Why are the changes needed?

  1. To avoid confusing Spark SQL users about which partitions already exist.
  2. To be consistent with the other catalogs: V2 and the V1 In-Memory catalog.

Does this PR introduce any user-facing change?

Yes, it changes the user-facing error message.

Before:

spark-sql> CREATE TABLE t (id bigint, data string) USING HIVE PARTITIONED BY (id);
spark-sql> ALTER TABLE t ADD PARTITION (id=2) LOCATION 'loc1';
spark-sql> ALTER TABLE t ADD PARTITION (id=1) LOCATION 'loc' PARTITION (id=2) LOCATION 'loc1';
The following partitions already exists in table 't' database 'default':
Map(id -> 1)
===
Map(id -> 2)

The error lists both the existing partition (id=2) and the non-existent one (id=1).

After:

...
spark-sql> ALTER TABLE t ADD PARTITION (id=1) LOCATION 'loc' PARTITION (id=2) LOCATION 'loc1';
The following partitions already exists in table 't' database 'default':
Map(id -> 2)

The error contains only the existing partition that causes the conflict.

How was this patch tested?

By running the modified test suites:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"

  sql(s"ALTER TABLE $t ADD PARTITION (id=1) LOCATION 'loc'" +
    " PARTITION (id=2) LOCATION 'loc1'")
}.getMessage
assert(errMsg === s"The following partitions already exists in table $t:2 -> id")
Member Author

Here is another mistake: it should be id -> 2, I think. I will fix it separately once this test is in.

Member Author

Here is the fix: #38152

@MaxGekk MaxGekk changed the title [WIP][SPARK-40521][SQL] Return only exists partitions in the error from ALTER TABLE .. ADD PARTITION [WIP][SPARK-40521][SQL] Return only exists partitions in PartitionsAlreadyExistException from Hive's create partition Oct 6, 2022
@github-actions github-actions bot added the SQL label Oct 7, 2022
@MaxGekk MaxGekk changed the title [WIP][SPARK-40521][SQL] Return only exists partitions in PartitionsAlreadyExistException from Hive's create partition [SPARK-40521][SQL] Return only exists partitions in PartitionsAlreadyExistException from Hive's create partition Oct 7, 2022
@MaxGekk MaxGekk marked this pull request as ready for review October 7, 2022 04:40
@MaxGekk MaxGekk requested a review from cloud-fan October 7, 2022 04:51

MaxGekk commented Oct 7, 2022

@srielau @cloud-fan Could you review this PR, please?

@MaxGekk MaxGekk requested a review from HyukjinKwon October 7, 2022 05:45

MaxGekk commented Oct 7, 2022

Merging to master. Thank you, @cloud-fan, for the review.

@MaxGekk MaxGekk closed this in 89397b5 Oct 7, 2022