[SPARK-20450] [SQL] Unexpected first-query schema inference cost with 2.1.1 #17749
What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-19611 fixes a regression from 2.0 where Spark silently fails to read case-sensitive fields from tables whose properties do not contain a case-sensitive schema. The fix is to detect this situation, infer the schema from the underlying files, and write the case-sensitive schema back into the metastore.
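As an illustrative sketch of the underlying issue (table name, column names, and paths here are hypothetical, and a Hive-enabled `SparkSession` is assumed): a Parquet file written with a mixed-case column and registered as a Hive table ends up with an all-lowercase metastore schema, and without a case-sensitive schema in the table properties the mixed-case column can silently read back as NULL.

```scala
// Assumes an existing Hive-enabled SparkSession named `spark`.
spark.range(10).selectExpr("id AS fieldOne").write.parquet("/tmp/case_demo")

// Hive lowercases column names, so the metastore schema only knows `fieldone`.
spark.sql(
  """CREATE EXTERNAL TABLE case_demo (fieldone BIGINT)
    |STORED AS PARQUET LOCATION '/tmp/case_demo'""".stripMargin)

// Without an inferred case-sensitive schema stored in the table properties,
// `fieldone` does not match the Parquet field `fieldOne` under case-sensitive
// resolution, so this column may silently come back as NULL.
spark.sql("SELECT fieldone FROM case_demo").show()
```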
However, this can incur an unexpected performance hit the first time such a problematic table is queried (and the false-positive rate is high, since most tables do not actually have case-sensitive fields).
This PR changes the default to NEVER_INFER (the same behavior as 2.1.0). In 2.2, we can consider leaving the default as INFER_AND_SAVE.
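For users who still want the inference-and-save behavior after this change, a minimal sketch of opting back in via the existing `spark.sql.hive.caseSensitiveInferenceMode` setting (the application name below is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("case-sensitive-schema-inference")
  .enableHiveSupport()
  // INFER_AND_SAVE: infer the case-sensitive schema from the data files on the
  // first query of an affected table and persist it to the metastore.
  // NEVER_INFER (the default after this PR) skips inference and uses the
  // metastore schema as-is, avoiding the first-query inference cost.
  .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
  .getOrCreate()
```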
How was this patch tested?
Unit tests.