[SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table #28511
Closed
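The 'WRONG FS' failure named in the title concerns a partition whose location is on a different filesystem from the table's. As a rough illustration only, and not the change made in this PR, two locations can be compared by URI scheme and authority. The object name, the helper name, and the example URIs below are all hypothetical.

```scala
import java.net.URI
import java.util.Locale

// Hypothetical helper for illustration; not part of Spark or of this PR.
object SameFileSystemCheck {
  // Lower-case a possibly-null URI component so the comparison is case-insensitive.
  private def norm(s: String): Option[String] =
    Option(s).map(_.toLowerCase(Locale.ROOT))

  // True when both locations share a scheme and an authority
  // (for example, the same HDFS namenode host and port).
  def sameFileSystem(a: URI, b: URI): Boolean =
    norm(a.getScheme) == norm(b.getScheme) &&
      norm(a.getAuthority) == norm(b.getAuthority)

  def main(args: Array[String]): Unit = {
    // Made-up locations: a table, a partition beside it, and a partition elsewhere.
    val table = new URI("hdfs://nn1:8020/warehouse/t")
    val partitionOnSameFs = new URI("hdfs://nn1:8020/warehouse/t/b=1")
    val partitionElsewhere = new URI("file:///tmp/t/b=2")
    println(sameFileSystem(table, partitionOnSameFs))
    println(sameFileSystem(table, partitionElsewhere))
  }
}
```

A check of this kind only detects the mismatch. How the write path should handle such a partition is a separate decision that this sketch does not address.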
11 commits
ff71732 [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when th… (yaooqinn)
e4bfc7b add benchmark & check only version (yaooqinn)
85b27d6 nit (yaooqinn)
bef4259 Merge branch 'master' into SPARK-31684 (yaooqinn)
7a33929 Merge branch 'master' into SPARK-31684 (yaooqinn)
f7c6b51 suffix (yaooqinn)
1145632 make hiveversion comparable (yaooqinn)
af4bfa9 address comments (yaooqinn)
ddeb3e2 address comments (yaooqinn)
eea99f5 naming (yaooqinn)
78e0972 rm distribute by (yaooqinn)
11 changes: 11 additions & 0 deletions
sql/hive/benchmarks/InsertIntoHiveTableBenchmark-hive1.2-results.txt
@@ -0,0 +1,11 @@
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
insert hive table benchmark:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
INSERT INTO DYNAMIC                                6812           7043         328          0.0      665204.8       1.0X
INSERT INTO HYBRID                                  817            852          32          0.0       79783.6       8.3X
INSERT INTO STATIC                                  231            246          21          0.0       22568.2      29.5X
INSERT OVERWRITE DYNAMIC                          25947          26671        1024          0.0     2533910.2       0.3X
INSERT OVERWRITE HYBRID                            2846           2884          54          0.0      277908.7       2.4X
INSERT OVERWRITE STATIC                             232            247          26          0.0       22659.9      29.4X
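The Relative column in these result files appears to be the first case's best time divided by each case's best time. A minimal sketch that reproduces the hive1.2 values from the rounded millisecond figures is shown here. The object and method names are hypothetical, and the real harness works from finer-grained timings, so this is an approximation rather than Spark's actual code.

```scala
import java.util.Locale

// Hypothetical helper for illustration; not Spark's Benchmark implementation.
object RelativeColumn {
  // Best times in milliseconds, copied from the hive1.2 results file.
  val bestMs: Seq[(String, Double)] = Seq(
    "INSERT INTO DYNAMIC" -> 6812.0,
    "INSERT INTO HYBRID" -> 817.0,
    "INSERT INTO STATIC" -> 231.0,
    "INSERT OVERWRITE DYNAMIC" -> 25947.0,
    "INSERT OVERWRITE HYBRID" -> 2846.0,
    "INSERT OVERWRITE STATIC" -> 232.0)

  // Speed-up of one case relative to the baseline case, formatted like the report.
  def relative(baselineMs: Double, caseMs: Double): String =
    "%.1fX".formatLocal(Locale.ROOT, baselineMs / caseMs)

  def main(args: Array[String]): Unit = {
    // The first case in the list serves as the 1.0X baseline.
    val baseline = bestMs.head._2
    bestMs.foreach { case (name, ms) =>
      println(f"$name%-26s ${relative(baseline, ms)}")
    }
  }
}
```

Read this way, a value above 1.0X means the case ran faster than INSERT INTO DYNAMIC on the same build, and a value below 1.0X means it ran slower.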
11 changes: 11 additions & 0 deletions
sql/hive/benchmarks/InsertIntoHiveTableBenchmark-hive2.3-results.txt
@@ -0,0 +1,11 @@
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
insert hive table benchmark:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
INSERT INTO DYNAMIC                                4326           4373          66          0.0      422486.0       1.0X
INSERT INTO HYBRID                                  726            741          21          0.0       70877.2       6.0X
INSERT INTO STATIC                                  256            270          12          0.0       25015.7      16.9X
INSERT OVERWRITE DYNAMIC                           4115           4150          49          0.0      401828.8       1.1X
INSERT OVERWRITE HYBRID                             690            699           8          0.0       67370.5       6.3X
INSERT OVERWRITE STATIC                             277            283           5          0.0       27097.9      15.6X
11 changes: 11 additions & 0 deletions
sql/hive/benchmarks/InsertIntoHiveTableBenchmark-jdk11-hive2.3-results.txt
@@ -0,0 +1,11 @@
Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.15.4
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
insert hive table benchmark:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
INSERT INTO DYNAMIC                                5083           5412         466          0.0      496384.5       1.0X
INSERT INTO HYBRID                                  822            864          43          0.0       80283.6       6.2X
INSERT INTO STATIC                                  335            342           5          0.0       32694.1      15.2X
INSERT OVERWRITE DYNAMIC                           4941           5068         179          0.0      482534.5       1.0X
INSERT OVERWRITE HYBRID                             722            745          27          0.0       70502.7       7.0X
INSERT OVERWRITE STATIC                             295            314          12          0.0       28846.8      17.2X
140 changes: 140 additions & 0 deletions
...rc/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
@@ -0,0 +1,140 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveUtils
import org.apache.spark.sql.hive.test.TestHive

/**
 * Benchmark to measure hive table write performance.
 * To run this benchmark:
 * {{{
 *   1. without sbt: bin/spark-submit --class <this class>
 *        --jars <spark catalyst test jar>,<spark core test jar>,<spark hive jar>
 *        --packages org.spark-project.hive:hive-exec:1.2.1.spark2
 *        <spark hive test jar>
 *   2. build/sbt "hive/test:runMain <this class>" -Phive-1.2 or
 *      build/sbt "hive/test:runMain <this class>" -Phive-2.3
 *   3. generate result:
 *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain <this class>"
 *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-hive2.3-results.txt".
 *   4. -Phive-1.2 does not work for JDK 11
 * }}}
 */
object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {

  override def getSparkSession: SparkSession = TestHive.sparkSession

  val tempView = "temp"
  val numRows = 1024 * 10
  val sql = spark.sql _

  // scalastyle:off hadoopconfiguration
  private val hadoopConf = spark.sparkContext.hadoopConfiguration
  // scalastyle:on hadoopconfiguration
  hadoopConf.set("hive.exec.dynamic.partition", "true")
  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)

  def withTable(tableNames: String*)(f: => Unit): Unit = {
    tableNames.foreach { name =>
      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
    }
    try f finally {
      tableNames.foreach { name =>
        spark.sql(s"DROP TABLE IF EXISTS $name")
      }
    }
  }

  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempView")
    }
  }

  def insertOverwriteHybrid(table: String, benchmark: Benchmark): Unit = {
    benchmark.addCase("INSERT OVERWRITE HYBRID") { _ =>
      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
        s" CAST(id % 10 AS INT) AS c FROM $tempView")
    }
  }

  def insertOverwriteStatic(table: String, benchmark: Benchmark): Unit = {
    benchmark.addCase("INSERT OVERWRITE STATIC") { _ =>
      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
        s" FROM $tempView")
    }
  }

  def insertIntoDynamic(table: String, benchmark: Benchmark): Unit = {
    benchmark.addCase("INSERT INTO DYNAMIC") { _ =>
      sql(s"INSERT INTO TABLE $table SELECT CAST(id AS INT) AS a," +
        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempView")
    }
  }

  def insertIntoHybrid(table: String, benchmark: Benchmark): Unit = {
    benchmark.addCase("INSERT INTO HYBRID") { _ =>
      sql(s"INSERT INTO TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
        s" CAST(id % 10 AS INT) AS c FROM $tempView")
    }
  }

  def insertIntoStatic(table: String, benchmark: Benchmark): Unit = {
    benchmark.addCase("INSERT INTO STATIC") { _ =>
      sql(s"INSERT INTO TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
        s" FROM $tempView")
    }
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    spark.range(numRows).createOrReplaceTempView(tempView)

    try {
      val t1 = "t1"
      val t2 = "t2"
      val t3 = "t3"
      val t4 = "t4"
      val t5 = "t5"
      val t6 = "t6"

      val benchmark = new Benchmark(s"insert hive table benchmark", numRows, output = output)

      withTable(t1, t2, t3, t4, t5, t6) {

        insertIntoDynamic(t1, benchmark)
        insertIntoHybrid(t2, benchmark)
        insertIntoStatic(t3, benchmark)

        insertOverwriteDynamic(t4, benchmark)
        insertOverwriteHybrid(t5, benchmark)
        insertOverwriteStatic(t6, benchmark)

        benchmark.run()
      }
    } finally {
      spark.catalog.dropTempView(tempView)
    }
  }

  override def suffix: String = if (HiveUtils.isHive23) "-hive2.3" else "-hive1.2"
}
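The benchmark cases are labelled STATIC, HYBRID, and DYNAMIC according to how many partition columns the INSERT statement fixes. A small sketch of that naming rule follows. It assumes a partition spec can be modelled as column names paired with an optional fixed value; the object, type, and method names are hypothetical and do not come from Spark.

```scala
// Hypothetical model for illustration; not a Spark API.
object PartitionSpecKind {
  sealed trait Kind
  case object Static extends Kind
  case object Hybrid extends Kind
  case object Dynamic extends Kind

  // Each partition column carries Some(value) when the statement fixes it
  // (e.g. b=1) and None when the value comes from the query output.
  def classify(spec: Seq[(String, Option[String])]): Kind = {
    val fixed = spec.count(_._2.isDefined)
    if (spec.nonEmpty && fixed == spec.size) Static // every column fixed
    else if (fixed == 0) Dynamic                    // no column fixed
    else Hybrid                                     // some fixed, some not
  }

  def main(args: Array[String]): Unit = {
    // The three shapes used by the benchmark's INSERT statements.
    println(classify(Seq("b" -> Some("1"), "c" -> Some("10")))) // partition(b=1, c=10)
    println(classify(Seq("b" -> Some("1"), "c" -> None)))       // partition(b=1, c)
    println(classify(Seq("b" -> None, "c" -> None)))            // no partition clause
  }
}
```

In the result files above, the static cases have the lowest best times and the dynamic cases the highest, which matches the number of partition directories each statement has to resolve at write time.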