[SPARK-19359][SQL]clear useless path after rename a partition with upper-case by HiveExternalCatalog #16700
Changes from all commits: 6a8efdd, 878d45e, 5403595, 40efce2, 12acdc6, 7aba059, 0136388, de4c409
```scala
@@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
   }

+  /**
+   * The partition path created by Hive is in lowercase, while Spark SQL will
+   * rename it with the partition name in partitionColumnNames, and this function
+   * returns the extra lowercase path created by Hive, and then we can delete it.
+   * e.g. /path/A=1/B=2/C=3 is changed to /path/A=4/B=5/C=6, this function returns
+   * /path/a=4
+   */
```

Review comment (Member): The same issue here.
```scala
+  def getExtraPartPathCreatedByHive(
+      spec: TablePartitionSpec,
+      partitionColumnNames: Seq[String],
+      tablePath: Path): Path = {
+    val partColumnNames = partitionColumnNames
+      .take(partitionColumnNames.indexWhere(col => col.toLowerCase != col) + 1)
+      .map(_.toLowerCase)
+
+    ExternalCatalogUtils.generatePartitionPath(lowerCasePartitionSpec(spec),
+      partColumnNames, tablePath)
+  }
+
   override def createPartitions(
       db: String,
       table: String,
```
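The helper above can be sketched with plain `String`s and `Map`s standing in for Hadoop's `Path` and Spark's `TablePartitionSpec`. The object name `ExtraHivePathSketch` and the inlined stand-in for `ExternalCatalogUtils.generatePartitionPath` are illustrative assumptions, not Spark API:

```scala
// Minimal sketch of getExtraPartPathCreatedByHive, assuming plain Strings and
// Maps in place of Hadoop's Path and Spark's TablePartitionSpec. The names
// below are illustrative, not Spark API.
object ExtraHivePathSketch {
  def getExtraPartPathCreatedByHive(
      spec: Map[String, String],          // e.g. Map("A" -> "4", "B" -> "5", "C" -> "6")
      partitionColumnNames: Seq[String],  // e.g. Seq("A", "B", "C")
      tablePath: String): String = {
    // Keep the columns up to and including the first one that contains an
    // upper-case letter, then lowercase them: that prefix names the shallowest
    // directory that Hive created in lowercase.
    val partColumnNames = partitionColumnNames
      .take(partitionColumnNames.indexWhere(col => col.toLowerCase != col) + 1)
      .map(_.toLowerCase)
    // Stand-in for ExternalCatalogUtils.generatePartitionPath.
    val lowerSpec = spec.map { case (k, v) => k.toLowerCase -> v }
    partColumnNames.foldLeft(tablePath)((p, col) => s"$p/$col=${lowerSpec(col)}")
  }

  def main(args: Array[String]): Unit = {
    // /path/A=1/B=2/C=3 renamed to /path/A=4/B=5/C=6: the extra Hive path is /path/a=4.
    println(getExtraPartPathCreatedByHive(
      Map("A" -> "4", "B" -> "5", "C" -> "6"), Seq("A", "B", "C"), "/path"))
  }
}
```

Note that if every partition column is already lowercase, `indexWhere` returns -1, `take(0)` yields an empty prefix, and the function returns `tablePath` itself, so the caller's inequality check against `wrongPath` matters.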
```scala
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         spec, partitionColumnNames, tablePath)
       try {
         tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
```
Review comment (Member): Found an issue here... When we call …
```scala
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
+      // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
+      // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
```
Review comment (Member): You need to use a …
```scala
+      // is still exists, which we also need to delete
+      val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
```
Review comment (Member): Hmmm, could it possibly have multiple specs sharing the same parent directory, e.g., 'A=1/B=2', 'A=1/B=3', ...? If so, when you delete the path 'a=1' here, in processing the next spec 'A=1/B=3', I think the rename will fail.

Review comment (Member): The path …

Review comment (Member): When you iterate over the specs and rename the directories with FileSystem.rename, in the first iteration, …

Review comment (Member): So far, the partition rename DDL we support is for a single pair of partition specs. That is, … However, your concern looks reasonable. I think we should not support renaming multiple partitions in a single DDL in the SessionCatalog and ExternalCatalog. It just makes the code more complex for error handling. Let me remove it.

Review comment (Contributor): This can be worse. If we already have a partition …
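The reviewers' concern can be reproduced locally. The following sketch uses `java.nio` directories on a case-sensitive filesystem in place of Hadoop's `FileSystem` (the object and method names are illustrative, not Spark code): two specs 'A=1/B=2' and 'A=1/B=3' share the lowercase parent 'a=1', so recursively deleting 'a=1' after fixing the first spec destroys the still-unrenamed 'a=1/b=3'.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

// Illustrative sketch, not Spark code: models FileSystem.rename plus a
// recursive delete of the extra parent, using java.nio on the local FS.
object SharedParentHazard {
  // Returns whether the second spec's directory 'a=1/b=3' survives the
  // cleanup performed for the first spec 'A=1/B=2'.
  def run(): Boolean = {
    val root = Files.createTempDirectory("tbl")
    Files.createDirectories(root.resolve("a=1/b=2")) // Hive-created, lowercase
    Files.createDirectories(root.resolve("a=1/b=3")) // second spec, same parent
    // First iteration: rename only the leaf of 'a=1/b=2' to 'A=1/B=2'.
    Files.createDirectories(root.resolve("A=1"))
    Files.move(root.resolve("a=1/b=2"), root.resolve("A=1/B=2"))
    // Model FileSystem.delete('a=1', recursive = true): remove the whole tree,
    // deepest entries first.
    Files.walk(root.resolve("a=1"))
      .sorted(Comparator.reverseOrder[Path]())
      .forEach(p => Files.delete(p))
    Files.exists(root.resolve("a=1/b=3"))
  }

  def main(args: Array[String]): Unit =
    println(run()) // false: the second spec's data was deleted along with 'a=1'
}
```

This is the failure mode the Contributor calls "worse": the recursive delete does not merely break the next rename, it silently removes the sibling partition's data.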
```scala
+        spec,
+        partitionColumnNames,
+        tablePath)
+
+      if (delHivePartPathAfterRename != wrongPath) {
+        tablePath.getFileSystem(hadoopConf).delete(delHivePartPathAfterRename, true)
+      }
     } catch {
       case e: IOException => throw new SparkException(
         s"Unable to rename partition path from $wrongPath to $rightPath", e)
```
Review comment: Nit: all of them are commas. You need to use periods. :)
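The cleanup in the hunk above exists because a filesystem rename moves only the leaf directory. A minimal local sketch of that behavior, using `java.nio` on a case-sensitive filesystem in place of Hadoop's `FileSystem` (the names are illustrative, not Spark code):

```scala
import java.nio.file.Files

// Illustrative sketch: renaming the leaf 'a=1/b=2' to 'A=1/B=2' leaves the
// now-empty lowercase parent 'a=1' behind, which is the extra path the patch
// deletes afterwards.
object LeftoverParentSketch {
  // Returns whether the lowercase parent 'a=1' still exists after the rename.
  def run(): Boolean = {
    val root = Files.createTempDirectory("tbl")
    Files.createDirectories(root.resolve("a=1/b=2")) // path created by Hive
    Files.createDirectories(root.resolve("A=1"))     // case-preserving parent
    Files.move(root.resolve("a=1/b=2"), root.resolve("A=1/B=2")) // leaf only
    Files.exists(root.resolve("a=1"))
  }

  def main(args: Array[String]): Unit =
    println(run()) // true: 'a=1' is the leftover directory to clean up
}
```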