[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map parsed from partition path should be case insensitive #28765

turboFei · 2020-06-09T07:30:25Z

What changes were proposed in this pull request?

This is a follow up of #25979.
When we run INSERT OVERWRITE on an external Hive partitioned table with an upper-case dynamic partition key, an exception is thrown, for example:

org.apache.spark.SparkException: Dynamic partition key P1 is not among written partition paths.

The root cause is that the Hive metastore is not case preserving and keeps partition columns with lower-cased names; see details in:

spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala

Lines 895 to 901 in ddd8d5f

    
    // Hive metastore is not case preserving and keeps partition columns with lower cased names,
    // and Hive will validate the column names in partition spec to make sure they are partition
    // columns. Here we Lowercase the column names before passing the partition spec to Hive
    // client, to satisfy Hive.
    // scalastyle:off caselocale
    orderedPartitionSpec.put(colName.toLowerCase, partition(colName))
    // scalastyle:on caselocale

spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala

Lines 228 to 234 in e289140

    
    val updatedPartitionSpec = partition.map {
      case (key, Some(value)) => key -> value
      case (key, None) if dpMap.contains(key) => key -> dpMap(key)
      case (key, _) =>
        throw new SparkException(s"Dynamic partition key $key is not among " +
          "written partition paths.")
    }

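In this PR, we convert the dynamic partition map to a case-insensitive map, so that a partition key such as P1 matches the lower-cased name parsed from the written partition path.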
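As a rough illustration of why the lookup fails and how a case-insensitive map avoids it, here is a minimal, Spark-free sketch in plain Scala. `CaseInsensitiveLookup`, `resolve`, and the sample values are hypothetical stand-ins introduced for this example, and `IllegalStateException` replaces `SparkException`; this is not the code changed by the PR.

```scala
import java.util.Locale

object DynamicPartitionLookupSketch {

  // Hypothetical helper (not Spark's API): wraps a map so that key lookups ignore case.
  final class CaseInsensitiveLookup(underlying: Map[String, String]) {
    private val lowered: Map[String, String] =
      underlying.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

    def contains(key: String): Boolean = lowered.contains(key.toLowerCase(Locale.ROOT))
    def apply(key: String): String = lowered(key.toLowerCase(Locale.ROOT))
  }

  // Mirrors the shape of the match quoted above, with the lookup passed in as functions
  // so that both variants can be compared. IllegalStateException stands in for SparkException.
  def resolve(
      partition: Map[String, Option[String]],
      contains: String => Boolean,
      lookup: String => String): Map[String, String] =
    partition.map {
      case (key, Some(value)) => key -> value
      case (key, None) if contains(key) => key -> lookup(key)
      case (key, _) =>
        throw new IllegalStateException(
          s"Dynamic partition key $key is not among written partition paths.")
    }

  def main(args: Array[String]): Unit = {
    // Assumed sample data: the user wrote the key as "P1", the path carries "p1".
    val partition: Map[String, Option[String]] = Map("P1" -> None)
    val dpMap: Map[String, String] = Map("p1" -> "a")

    // Case-sensitive lookup: "P1" is not a key of dpMap, so the last case throws.
    val failed = scala.util.Try(resolve(partition, dpMap.contains, dpMap.apply))
    println(failed.isFailure)

    // Case-insensitive lookup: "P1" resolves to the value stored under "p1".
    val ci = new CaseInsensitiveLookup(dpMap)
    println(resolve(partition, ci.contains, ci.apply))
  }
}
```

Running `main` should show the case-sensitive lookup failing for `P1`, while the case-insensitive one resolves it to the value stored under `p1`.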

Why are the changes needed?

To fix the failure when running INSERT OVERWRITE on an external Hive partitioned table with an upper-case dynamic partition key.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

turboFei · 2020-06-09T07:35:38Z

cc @viirya @cloud-fan

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

turboFei · 2020-06-09T09:09:02Z

thanks, have added a blank line

cloud-fan · 2020-06-09T09:26:29Z

ok to test

SparkQA · 2020-06-09T12:16:21Z

Test build #123685 has finished for PR 28765 at commit d949065.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-06-09T15:57:15Z

thanks, merging to master/3.0!

Closes #28765 from turboFei/SPARK-29295-follow-up.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 717ec5e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

probot-autolabeler bot added the SQL label Jun 9, 2020

turboFei changed the title ~~[SPARK-29295][FOLLOWUP] Dynamic partition map should be case insensitive.~~ [SPARK-29295][SQL][FOLLOWUP] Dynamic partition map should be case insensitive. Jun 9, 2020

[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map parsed from partition path should be case insensitive

e076f6a

turboFei force-pushed the SPARK-29295-follow-up branch from 418038b to e076f6a Compare June 9, 2020 07:54

turboFei changed the title ~~[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map should be case insensitive.~~ [SPARK-29295][SQL][FOLLOWUP] Dynamic partition map parsed from partition path should be case insensitive Jun 9, 2020

cloud-fan reviewed Jun 9, 2020

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

cloud-fan approved these changes Jun 9, 2020

add blank line

d949065

viirya approved these changes Jun 9, 2020

cloud-fan closed this in 717ec5e Jun 9, 2020
