@turboFei
Member

@turboFei turboFei commented Jun 9, 2020

What changes were proposed in this pull request?

This is a follow up of #25979.
When we insert overwrite into an external Hive partitioned table with an upper-case dynamic partition key, an exception is thrown, for example:

org.apache.spark.SparkException: Dynamic partition key P1 is not among written partition paths.

The root cause is that the Hive metastore is not case preserving and keeps partition columns with lower-cased names. See details in:

// Hive metastore is not case preserving and keeps partition columns with lower cased names,
// and Hive will validate the column names in partition spec to make sure they are partition
// columns. Here we Lowercase the column names before passing the partition spec to Hive
// client, to satisfy Hive.
// scalastyle:off caselocale
orderedPartitionSpec.put(colName.toLowerCase, partition(colName))
// scalastyle:on caselocale

val updatedPartitionSpec = partition.map {
  case (key, Some(value)) => key -> value
  case (key, None) if dpMap.contains(key) => key -> dpMap(key)
  case (key, _) =>
    throw new SparkException(s"Dynamic partition key $key is not among " +
      "written partition paths.")
}
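
The mismatch can be reproduced outside Spark with an ordinary, case-sensitive `Map`. The following is only a minimal sketch: `CaseMismatchSketch`, `metastoreName`, and `lookup` are hypothetical names made up for illustration, and an `Either` stands in for the thrown `SparkException`.

```scala
import java.util.Locale

object CaseMismatchSketch {
  // Mimics the behaviour described above: partition column names are kept lower-cased.
  def metastoreName(colName: String): String = colName.toLowerCase(Locale.ROOT)

  // Hypothetical lookup against a plain Map, whose keys are compared case-sensitively.
  // Left carries the error message instead of throwing.
  def lookup(key: String, dpMap: Map[String, String]): Either[String, String] =
    dpMap.get(key).toRight(
      s"Dynamic partition key $key is not among written partition paths.")
}
```

With a map built from lower-cased names, looking up `P1` misses while `p1` succeeds, which is the situation the exception message reports.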

In this PR, we convert the dynamic partition map, which is parsed from the written partition path, to a case-insensitive map.
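
As a rough illustration of the idea, and not the code in this patch, the sketch below parses a partition path into a map and wraps it in a case-insensitive lookup. `CaseInsensitiveLookup`, `DynamicPartitionSketch`, `parsePartitionPath`, and `resolve` are hypothetical names; the wrapper is a stand-in rather than Spark's own case-insensitive map utility, `IllegalArgumentException` replaces `SparkException` so the example has no Spark dependency, and unescaping of partition values is ignored.

```scala
import java.util.Locale

// Hypothetical stand-in for a case-insensitive map: keys are normalized to lower case.
final class CaseInsensitiveLookup(entries: Map[String, String]) {
  private val normalized: Map[String, String] =
    entries.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def contains(key: String): Boolean =
    normalized.contains(key.toLowerCase(Locale.ROOT))

  def apply(key: String): String =
    normalized(key.toLowerCase(Locale.ROOT))
}

object DynamicPartitionSketch {
  // Parse a path such as "p1=a/p2=b" into key/value pairs (assumed path layout).
  def parsePartitionPath(path: String): Map[String, String] =
    path.split("/").map { part =>
      part.split("=", 2) match {
        case Array(k, v) => k -> v
        case _ => throw new IllegalArgumentException(s"Malformed partition segment: $part")
      }
    }.toMap

  // Same shape as the match shown above, but the lookup ignores key case.
  def resolve(
      partition: Map[String, Option[String]],
      dpMap: CaseInsensitiveLookup): Map[String, String] =
    partition.map {
      case (key, Some(value)) => key -> value
      case (key, None) if dpMap.contains(key) => key -> dpMap(key)
      case (key, _) =>
        throw new IllegalArgumentException(
          s"Dynamic partition key $key is not among written partition paths.")
    }
}
```

Here an upper-case key such as `P1` resolves against a path segment `p1=a`, while the returned map keeps the key's original spelling.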

Why are the changes needed?

To fix the failure when inserting overwrite into an external Hive partitioned table with an upper-case dynamic partition key.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test (UT).

@turboFei turboFei changed the title [SPARK-29295][FOLLOWUP] Dynamic partition map should be case insensitive. [SPARK-29295][SQL][FOLLOWUP] Dynamic partition map should be case insensitive. Jun 9, 2020
@turboFei
Member Author

turboFei commented Jun 9, 2020

cc @viirya @cloud-fan

@turboFei turboFei force-pushed the SPARK-29295-follow-up branch from 418038b to e076f6a Compare June 9, 2020 07:54
@turboFei turboFei changed the title [SPARK-29295][SQL][FOLLOWUP] Dynamic partition map should be case insensitive. [SPARK-29295][SQL][FOLLOWUP] Dynamic partition map parsed from partition path should be case insensitive Jun 9, 2020
@turboFei
Member Author

turboFei commented Jun 9, 2020

thanks, have added a blank line

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Jun 9, 2020

Test build #123685 has finished for PR 28765 at commit d949065.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 717ec5e Jun 9, 2020
cloud-fan pushed a commit that referenced this pull request Jun 9, 2020
…ion path should be case insensitive

### What changes were proposed in this pull request?

This is a follow up of #25979.
When we insert overwrite into an external Hive partitioned table with an upper-case dynamic partition key, an exception is thrown, for example:
```
org.apache.spark.SparkException: Dynamic partition key P1 is not among written partition paths.
```
The root cause is that the Hive metastore is not case preserving and keeps partition columns with lower-cased names. See details in:

https://github.com/apache/spark/blob/ddd8d5f5a0b6db17babc201ba4b73f7df91df1a3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L895-L901
https://github.com/apache/spark/blob/e28914095aa1fa7a4680b5e4fcf69e3ef64b3dbc/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L228-L234

In this PR, we convert the dynamic partition map, which is parsed from the written partition path, to a case-insensitive map.

### Why are the changes needed?

To fix the failure when inserting overwrite into an external Hive partitioned table with an upper-case dynamic partition key.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test (UT).

Closes #28765 from turboFei/SPARK-29295-follow-up.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 717ec5e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>