Commit 84882fa

ganeshas-db authored and cloud-fan committed
[SPARK-54030][SQL] Add user-friendly assertion message for view metadata corruption
### What changes were proposed in this pull request?

This PR enhances error reporting for corrupted view metadata by adding a detailed, user-friendly assertion message when there's a mismatch between the number of view query column names and the number of columns in the view schema.

Changes:

- Enhanced the assertion in SessionCatalog.scala (fromCatalogTable method) to include:
  - The fully qualified view name
  - The actual number of view query column names vs schema columns
  - The list of view query column names
  - The list of view schema column names
  - Guidance that the metadata needs to be repaired
- Added a unit test in SessionCatalogSuite.scala to verify the enhanced error message is displayed correctly when corrupted view metadata is detected.

### Why are the changes needed?

Currently, when view metadata is corrupted (i.e., the number of view query column names doesn't match the schema length), the assertion fails with a generic "assertion failed" message that provides no context about:

- Which view has the problem
- What the actual vs expected values are
- What columns are involved
- How to fix the issue

This makes debugging production issues very difficult. The enhanced error message provides all the necessary information to quickly identify and repair the corrupted view metadata.

### Does this PR introduce _any_ user-facing change?

Yes. Users will now see a detailed error message instead of a generic assertion failure when encountering corrupted view metadata:

Before: `assertion failed`

After: `assertion failed: Corrupted view metadata detected for view spark_catalog.db.view_name. The number of view query column names 2 does not match the number of columns in the view schema 3. View query column names: [id, name], View schema columns: [id, name, value]. This indicates corrupted view metadata that needs to be repaired.`

### How was this patch tested?

Added a new unit test `corrupted view metadata: mismatch between viewQueryColumnNames and schema` in SessionCatalogSuite.scala that:

- Creates a view with intentionally corrupted metadata (2 query column names but 3 schema columns)
- Verifies that looking up the view throws an AssertionError
- Validates the error message contains all expected details

Existing tests continue to pass.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52732 from ganeshashree/SPARK-54030.

Authored-by: Ganesha S <ganesha.s@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
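For readers unfamiliar with where the `assertion failed: ` prefix in the Before/After messages comes from, here is a minimal sketch of Scala's one-argument and two-argument `assert`. It is not part of the patch: `AssertMessageDemo`, `messageOf`, and the column values are hypothetical names chosen for illustration.

```scala
object AssertMessageDemo {
  // Hypothetical helper: runs `body` and returns the AssertionError message, if any.
  def messageOf(body: => Unit): Option[String] =
    try { body; None }
    catch { case e: AssertionError => Some(e.getMessage) }

  def main(args: Array[String]): Unit = {
    // Illustrative values mirroring the example in the commit message.
    val queryColumnNames = Seq("id", "name")
    val schemaColumns = Seq("id", "name", "value")

    // One-argument form: the failure carries no context.
    val before = messageOf {
      assert(queryColumnNames.length == schemaColumns.length)
    }

    // Two-argument form: the message is appended after "assertion failed: ".
    val after = messageOf {
      assert(
        queryColumnNames.length == schemaColumns.length,
        s"expected ${schemaColumns.length} names, found ${queryColumnNames.length}")
    }

    println(before)
    println(after)
  }
}
```

The message argument of `assert` is passed by name, so the string is only built when the condition is false. That keeps the more detailed message essentially free on the normal path.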
1 parent 52103e8 commit 84882fa

2 files changed, +75 -1 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala

Lines changed: 10 additions & 1 deletion
@@ -1031,7 +1031,16 @@ class SessionCatalog(
       // output is the same with the view output.
       metadata.schema.fieldNames.toImmutableArraySeq
     } else {
-      assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
+      assert(metadata.viewQueryColumnNames.length == metadata.schema.length,
+        "Corrupted view metadata detected for view " +
+          metadata.identifier.quotedString + ". " +
+          "The number of view query column names " +
+          metadata.viewQueryColumnNames.length + " " +
+          "does not match the number of columns in the view schema " +
+          metadata.schema.length + ". " +
+          "View query column names: [" + metadata.viewQueryColumnNames.mkString(", ") + "], " +
+          "View schema columns: [" + metadata.schema.fieldNames.mkString(", ") + "]. " +
+          "This indicates corrupted view metadata that needs to be repaired.")
       metadata.viewQueryColumnNames
     }
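To experiment with the check outside Spark, the hunk above can be approximated as a standalone sketch. Everything here is an assumption made for illustration: `ViewMeta` is a hypothetical stand-in for the few `CatalogTable` fields the assertion reads, `ViewColumnCheck` and `mismatchMessage` are invented names, and the `isEmpty` condition on the first branch is inferred because the diff context does not show it. The patch itself builds the message inline rather than in a helper.

```scala
// Hypothetical stand-in for the CatalogTable fields used by the check.
final case class ViewMeta(
    quotedName: String,
    viewQueryColumnNames: Seq[String],
    schemaFieldNames: Seq[String])

object ViewColumnCheck {
  // Builds the same message text as the patched assertion.
  def mismatchMessage(m: ViewMeta): String = {
    val queryNames = m.viewQueryColumnNames.mkString(", ")
    val schemaNames = m.schemaFieldNames.mkString(", ")
    s"Corrupted view metadata detected for view ${m.quotedName}. " +
      s"The number of view query column names ${m.viewQueryColumnNames.length} " +
      s"does not match the number of columns in the view schema ${m.schemaFieldNames.length}. " +
      s"View query column names: [$queryNames], " +
      s"View schema columns: [$schemaNames]. " +
      "This indicates corrupted view metadata that needs to be repaired."
  }

  // Assumed shape of the surrounding branch: fall back to the schema names
  // when no query column names are recorded, otherwise require equal lengths.
  def viewColumnNames(m: ViewMeta): Seq[String] =
    if (m.viewQueryColumnNames.isEmpty) {
      m.schemaFieldNames
    } else {
      assert(m.viewQueryColumnNames.length == m.schemaFieldNames.length, mismatchMessage(m))
      m.viewQueryColumnNames
    }
}
```

Pulling the message into its own function is only a convenience for this sketch, since it lets the text be inspected without triggering the assertion.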

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala

Lines changed: 65 additions & 0 deletions
@@ -2026,4 +2026,69 @@ abstract class SessionCatalogSuite extends AnalysisTest with Eventually {
     assert(partitionSchema.fieldNames.toSeq == Seq("year", "month"))
     assert(partitionSchema.fields.length == 2)
   }
+
+  test("corrupted view metadata: mismatch between viewQueryColumnNames and schema") {
+    withSQLConf("spark.sql.viewSchemaBinding.enabled" -> "true") {
+      val catalog = new SessionCatalog(newBasicCatalog())
+      val db = "test_db"
+      catalog.createDatabase(newDb(db), ignoreIfExists = false)
+
+      // First create a base table for the view to reference
+      val baseTable = CatalogTable(
+        identifier = TableIdentifier("base_table", Some(db)),
+        tableType = CatalogTableType.MANAGED,
+        storage = CatalogStorageFormat.empty,
+        schema = new StructType()
+          .add("id", IntegerType)
+          .add("name", StringType)
+          .add("value", DoubleType)
+      )
+      catalog.createTable(baseTable, ignoreIfExists = false)
+
+      // Create a view with corrupted metadata where viewQueryColumnNames length
+      // doesn't match schema length
+      // We need to set the properties to define viewQueryColumnNames
+      val properties = Map(
+        "view.query.out.numCols" -> "2",
+        "view.query.out.col.0" -> "id",
+        "view.query.out.col.1" -> "name",
+        "view.schema.mode" -> "binding" // Ensure it's not SchemaEvolution
+      )
+      val corruptedView = CatalogTable(
+        identifier = TableIdentifier("corrupted_view", Some(db)),
+        tableType = CatalogTableType.VIEW,
+        storage = CatalogStorageFormat.empty,
+        schema = new StructType()
+          .add("id", IntegerType)
+          .add("name", StringType)
+          .add("value", DoubleType),
+        viewText = Some("SELECT * FROM test_db.base_table"),
+        provider = Some("spark"), // Ensure it's not Hive-created
+        properties = properties // Only 2 query column names but schema has 3 columns
+      )
+
+      catalog.createTable(corruptedView, ignoreIfExists = false)
+
+      // Verify the view was created with corrupted metadata
+      val retrievedView = catalog.getTableMetadata(TableIdentifier("corrupted_view", Some(db)))
+      assert(retrievedView.viewQueryColumnNames.length == 2)
+      assert(retrievedView.schema.length == 3)
+
+      // Attempting to look up the view should throw an assertion error with detailed message
+      val exception = intercept[AssertionError] {
+        catalog.lookupRelation(TableIdentifier("corrupted_view", Some(db)))
+      }
+
+      // The expected message pattern allows for optional catalog prefix
+      val expectedPattern =
+        "assertion failed: Corrupted view metadata detected for view " +
+        "(\\`\\w+\\`\\.)?\\`test_db\\`\\.\\`corrupted_view\\`\\. " +
+        "The number of view query column names 2 " +
+        "does not match the number of columns in the view schema 3\\. " +
+        "View query column names: \\[id, name\\], " +
+        "View schema columns: \\[id, name, value\\]\\. " +
+        "This indicates corrupted view metadata that needs to be repaired\\."
+      assert(exception.getMessage.matches(expectedPattern))
+    }
+  }
 }
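The test relies on two details that are easy to miss: the `view.query.out.*` properties determine `viewQueryColumnNames`, and the expected pattern accepts the identifier with or without a leading catalog name. The sketch below illustrates both without Spark. The property keys and the identifier pattern are copied from the test, but `ViewPropsSketch`, `queryColumnNames`, `matchesIdentifier`, and the decoding logic are assumptions about how such properties could be read, not Spark's implementation.

```scala
object ViewPropsSketch {
  // Keys as they appear in the test above.
  val NumColsKey = "view.query.out.numCols"
  val ColPrefix = "view.query.out.col."

  // Assumed decoding: read the column count, then look up each indexed name.
  def queryColumnNames(props: Map[String, String]): Seq[String] =
    props.get(NumColsKey) match {
      case None => Seq.empty
      case Some(n) => (0 until n.toInt).map(i => props(ColPrefix + i))
    }

  // Identifier part of the test's pattern: an optional backquoted catalog
  // name, then the backquoted database and view names.
  val identifierPattern: String =
    "(\\`\\w+\\`\\.)?\\`test_db\\`\\.\\`corrupted_view\\`"

  def matchesIdentifier(s: String): Boolean = s.matches(identifierPattern)

  def main(args: Array[String]): Unit = {
    val props = Map(
      NumColsKey -> "2",
      ColPrefix + "0" -> "id",
      ColPrefix + "1" -> "name")
    println(queryColumnNames(props))
    println(matchesIdentifier("`spark_catalog`.`test_db`.`corrupted_view`"))
    println(matchesIdentifier("`test_db`.`corrupted_view`"))
  }
}
```

Because `String.matches` requires the whole input to match, the real test escapes every literal `.`, `[`, and `]` in the expected message instead of searching for a substring.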
