-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13070][SQL] Better error message when Parquet schema merging fails #10972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we just use a foreach on footers and then combine all of these in a single pass? Seems simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. something like
var mergedSchema = StructType(Nil)
footers.foreach { file =>
val schema = ParquetRelation.readSchemaFromFooter(footer, converter)
try {
mergedSchema = mergedSchema.merge(schema)
} catch {
...
}
}
mergedSchema0ff9651 to
bfe4987
Compare
|
Test build #50331 has finished for PR 10972 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get the schema from first footer and then go through this loop for remaining footers? Because you merge the first schema with an empty schema, I think the all fields in merged schema will be optional in their metadata. So the pushing down of filters will not normally work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you'll right, filter push-down can be affected due to #9940 (which I just merged today). Thanks for pointing this out!
|
Test build #50327 has finished for PR 10972 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we catch something tighter here? This is too broad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also i'd prefer to make this more concise, e.g.
throw new SparkException(
"Failed merging schema of file ${footer.getFile}: ${schema.treeString}", cause)
I don't know why these functions are super long.
|
@viirya do you want to submit a pull request to address your issue? @liancheng is busy with something right now. i.e. create a pull request that contains this one and set the 1st |
|
Test build #50354 has finished for PR 10972 at commit
|
|
@viirya It would be great if you can help since you are pretty familiar with this part of code :) |
|
I'm closing this one. |
|
@rxin @liancheng yes, I would love to do it. Thanks. |
Now we also report path and schema of the file in trouble.