Commit bc30351

Budde authored and cloud-fan committed
[SPARK-19611][SQL] Preserve metastore field order when merging inferred schema
## What changes were proposed in this pull request?

The `HiveMetastoreCatalog.mergeWithMetastoreSchema()` method added in #16944 may not preserve the same field order as the metastore schema in some cases, which can cause queries to fail. This change ensures that the metastore field order is preserved.

## How was this patch tested?

A test for ensuring that metastore order is preserved was added to `HiveSchemaInferenceSuite`. The particular failure use case from #16944 was tested manually as well.

Author: Budde <budde@amazon.com>

Closes #17249 from budde/PreserveMetastoreFieldOrder.
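The ordering fix described in the commit message can be sketched without a Spark dependency. The block below is a minimal illustration under stated assumptions, not the actual Spark implementation: `Field` is a hypothetical stand-in for `StructField` (with the data type reduced to a string), and `MergeSketch.mergePreservingOrder` is an invented name. Unlike the real change, which indexes the map by `f.name` directly, the sketch lowercases the metastore name on lookup, and it lets a failed lookup throw instead of wrapping it in a "Detected conflicting schemas" error.

```scala
// Hypothetical stand-in for Spark's StructField so the sketch runs without Spark.
final case class Field(name: String, dataType: String, nullable: Boolean)

object MergeSketch {
  // Merge a case-preserving inferred schema into a metastore schema,
  // keeping the field order of the metastore schema.
  def mergePreservingOrder(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
    val inferredNames = inferred.map(_.name.toLowerCase).toSet

    // Nullable metastore fields that the inferred schema lacks are carried over.
    val missingNullables =
      metastore.filter(f => f.nullable && !inferredNames.contains(f.name.toLowerCase))

    // Case-insensitive lookup table. It is only used for lookups and never
    // iterated, so its internal ordering cannot leak into the result.
    val inferredByLowerName: Map[String, Field] =
      (inferred ++ missingNullables).map(f => f.name.toLowerCase -> f).toMap

    // Walk the metastore sequence itself so that its order drives the output.
    // A non-nullable metastore field missing from the inferred schema makes
    // this lookup throw NoSuchElementException.
    metastore.map(f => f.copy(name = inferredByLowerName(f.name.toLowerCase).name))
  }
}
```

Called with five metastore fields and an inferred schema that lists only three of them in a different order, the function returns the five fields in metastore order, which mirrors the scenario exercised by the new test.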
1 parent 8f0490e commit bc30351

2 files changed: +22 -4 lines changed


sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala

Lines changed: 1 addition & 4 deletions
@@ -356,13 +356,10 @@ private[hive] object HiveMetastoreCatalog {
         .filterKeys(!inferredSchema.map(_.name.toLowerCase).contains(_))
         .values
         .filter(_.nullable)
-
       // Merge missing nullable fields to inferred schema and build a case-insensitive field map.
       val inferredFields = StructType(inferredSchema ++ missingNullables)
         .map(f => f.name.toLowerCase -> f).toMap
-      StructType(metastoreFields.map { case(name, field) =>
-        field.copy(name = inferredFields(name).name)
-      }.toSeq)
+      StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
     } catch {
       case NonFatal(_) =>
         val msg = s"""Detected conflicting schemas when merging the schema obtained from the Hive

sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala

Lines changed: 21 additions & 0 deletions
@@ -293,6 +293,27 @@ class HiveSchemaInferenceSuite
           StructField("firstField", StringType, nullable = true),
           StructField("secondField", StringType, nullable = true))))
     }.getMessage.contains("Detected conflicting schemas"))
+
+    // Schema merge should maintain metastore order.
+    assertResult(
+      StructType(Seq(
+        StructField("first_field", StringType, nullable = true),
+        StructField("second_field", StringType, nullable = true),
+        StructField("third_field", StringType, nullable = true),
+        StructField("fourth_field", StringType, nullable = true),
+        StructField("fifth_field", StringType, nullable = true)))) {
+      HiveMetastoreCatalog.mergeWithMetastoreSchema(
+        StructType(Seq(
+          StructField("first_field", StringType, nullable = true),
+          StructField("second_field", StringType, nullable = true),
+          StructField("third_field", StringType, nullable = true),
+          StructField("fourth_field", StringType, nullable = true),
+          StructField("fifth_field", StringType, nullable = true))),
+        StructType(Seq(
+          StructField("fifth_field", StringType, nullable = true),
+          StructField("third_field", StringType, nullable = true),
+          StructField("second_field", StringType, nullable = true))))
+    }
   }
 }
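The test above uses five fields, and the commit does not say why. One plausible reading, offered here as an assumption, is that Scala's immutable `Map` makes no ordering guarantee, and in the standard library versions commonly used with Spark, maps of up to four entries are backed by small specialised classes that happen to iterate in insertion order, so a shorter schema could hide the bug. The sketch below contrasts the two strategies on plain strings; `MapOrderSketch`, `viaMap` and `viaSeq` are hypothetical names for illustration only.

```scala
object MapOrderSketch {
  // Shape of the removed code: build a Map, then iterate the Map itself.
  // The resulting order is whatever the Map implementation yields, which is
  // not guaranteed to match the input order.
  def viaMap(names: Seq[String]): Seq[String] =
    names.map(n => n.toLowerCase -> n).toMap.toSeq.map(_._1)

  // Shape of the new code: iterate the original sequence and use the Map
  // purely as a lookup table, so the input order is always preserved.
  def viaSeq(names: Seq[String]): Seq[String] = {
    val lookup = names.map(n => n.toLowerCase -> n).toMap
    names.map(n => lookup(n.toLowerCase))
  }
}
```

Both functions return the same set of names; only `viaSeq` is guaranteed to return them in the order they were given.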
