@ganeshashree

What changes were proposed in this pull request?

This PR enhances error reporting for corrupted view metadata by adding a detailed, user-friendly assertion message when there's a mismatch between the number of view query column names and the number of columns in the view schema.
Changes:
- Enhanced the assertion in `SessionCatalog.scala` (`fromCatalogTable` method) to include:
  - The fully qualified view name
  - The actual number of view query column names vs schema columns
  - The list of view query column names
  - The list of view schema column names
  - Guidance that the metadata needs to be repaired
- Added a unit test in `SessionCatalogSuite.scala` to verify the enhanced error message is displayed correctly when corrupted view metadata is detected.
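A minimal, Spark-free sketch of the kind of assertion described above. `ViewMeta`, `ViewMetadataCheck`, and `checkViewColumns` are hypothetical names used only for illustration; the actual change lives in `SessionCatalog.scala` and operates on Spark's own catalog types, which are not reproduced here.

```scala
// Hypothetical stand-in for view metadata; this is NOT Spark's CatalogTable.
final case class ViewMeta(
    qualifiedName: String,
    viewQueryColumnNames: Seq[String],
    schemaColumnNames: Seq[String])

object ViewMetadataCheck {
  // Fails with a descriptive message when the two column lists differ in length.
  // Scala's Predef.assert prefixes the message with "assertion failed: ".
  def checkViewColumns(view: ViewMeta): Unit = {
    val queryCols = view.viewQueryColumnNames
    val schemaCols = view.schemaColumnNames
    assert(
      queryCols.length == schemaCols.length,
      s"Corrupted view metadata detected for view ${view.qualifiedName}. " +
        s"The number of view query column names ${queryCols.length} " +
        s"does not match the number of columns in the view schema ${schemaCols.length}. " +
        s"View query column names: ${queryCols.mkString("[", ", ", "]")}, " +
        s"View schema columns: ${schemaCols.mkString("[", ", ", "]")}. " +
        "This indicates corrupted view metadata that needs to be repaired.")
  }
}
```

Because the second argument of `assert` is passed by name, the message is only built when the check fails, so the extra detail should add no cost on the normal path.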

Why are the changes needed?

Currently, when view metadata is corrupted (i.e., the number of view query column names doesn't match the schema length), the assertion fails with a generic "assertion failed" message that provides no context about:
- Which view has the problem
- What the actual vs expected values are
- What columns are involved
- How to fix the issue

This makes debugging production issues very difficult. The enhanced error message provides all the necessary information to quickly identify and repair the corrupted view metadata.

Does this PR introduce any user-facing change?

Yes. Users will now see a detailed error message instead of a generic assertion failure when encountering corrupted view metadata:

Before:
`assertion failed`

After:
`assertion failed: Corrupted view metadata detected for view spark_catalog.db.view_name. The number of view query column names 2 does not match the number of columns in the view schema 3. View query column names: [id, name], View schema columns: [id, name, value]. This indicates corrupted view metadata that needs to be repaired.`

How was this patch tested?

Added a new unit test, `corrupted view metadata: mismatch between viewQueryColumnNames and schema`, in `SessionCatalogSuite.scala` that:
- Creates a view with intentionally corrupted metadata (2 query column names but 3 schema columns)
- Verifies that looking up the view throws an `AssertionError`
- Validates the error message contains all expected details

Existing tests continue to pass.
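The shape of such a test can be sketched without Spark or ScalaTest. Everything below is an illustrative assumption: `lookupView` is a hypothetical stand-in for the catalog lookup, and `interceptAssertion` is a hand-rolled helper playing the role a test framework's intercept facility would normally play. It is not the test added in `SessionCatalogSuite.scala`.

```scala
object CorruptedViewTestSketch {
  // Hypothetical stand-in for a view lookup that validates column metadata.
  def lookupView(name: String, queryCols: Seq[String], schemaCols: Seq[String]): String = {
    assert(
      queryCols.length == schemaCols.length,
      s"Corrupted view metadata detected for view $name. " +
        s"The number of view query column names ${queryCols.length} " +
        s"does not match the number of columns in the view schema ${schemaCols.length}. " +
        s"View query column names: ${queryCols.mkString("[", ", ", "]")}, " +
        s"View schema columns: ${schemaCols.mkString("[", ", ", "]")}. " +
        "This indicates corrupted view metadata that needs to be repaired.")
    name
  }

  // Runs `body` and returns the AssertionError it throws; fails if none is thrown.
  def interceptAssertion(body: => Any): AssertionError =
    try {
      body
      throw new IllegalStateException("expected an AssertionError, but none was thrown")
    } catch {
      case e: AssertionError => e
    }

  def main(args: Array[String]): Unit = {
    // Intentionally corrupted: 2 query column names but 3 schema columns.
    val error = interceptAssertion {
      lookupView("spark_catalog.db.view_name", Seq("id", "name"), Seq("id", "name", "value"))
    }
    val message = error.getMessage
    // Each detail promised by the enhanced message should be present.
    Seq(
      "spark_catalog.db.view_name",
      "view query column names 2",
      "view schema 3",
      "[id, name]",
      "[id, name, value]",
      "needs to be repaired"
    ).foreach { expected =>
      assert(message.contains(expected), s"message is missing: $expected")
    }
  }
}
```

Checking individual fragments rather than the full string keeps the test from breaking on minor rewording while still covering every piece of context the message is meant to carry.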

Was this patch authored or co-authored using generative AI tooling?

No

@cloud-fan

thanks, merging to master!

@cloud-fan closed this in 84882fa on Oct 30, 2025
Yicong-Huang pushed a commit to Yicong-Huang/spark that referenced this pull request Oct 30, 2025
…ata corruption

Closes apache#52732 from ganeshashree/SPARK-54030.

Authored-by: Ganesha S <ganesha.s@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>