[SPARK-42442][SQL] Use spark.sql.timestampType for data source inference #40022
What changes were proposed in this pull request?
With the configuration spark.sql.timestampType, TIMESTAMP in Spark is a user-specified alias for one of the TIMESTAMP_LTZ and TIMESTAMP_NTZ variations. This is already quite complicated for Spark users. There is another option, spark.sql.sources.timestampNTZTypeInference.enabled, for schema inference. I intended to introduce it in #40005, but having two flags seems like too much. After some thought, I decided to merge spark.sql.sources.timestampNTZTypeInference.enabled into spark.sql.timestampType and let spark.sql.timestampType control the schema inference behavior as well. We can have follow-ups that add a data source option "inferTimestampNTZType" for CSV/JSON/partition columns, as the JDBC data source already does.
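As a rough illustration (not code from this PR), the merged behavior can be exercised with a sketch like the one below. The file path /tmp/events.csv is hypothetical; the expectation is that with spark.sql.timestampType set to TIMESTAMP_NTZ, schema inference maps timestamp-like values to TIMESTAMP_NTZ instead of TIMESTAMP_LTZ.

```scala
import org.apache.spark.sql.SparkSession

object TimestampTypeInferenceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("timestampType inference sketch")
      // After this PR, this single flag should also control schema
      // inference in data sources, not just the meaning of TIMESTAMP.
      .config("spark.sql.timestampType", "TIMESTAMP_NTZ")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/events.csv") // hypothetical file containing a timestamp column

    // Expected to print the timestamp column as timestamp_ntz rather than
    // timestamp (the TIMESTAMP_LTZ default).
    df.printSchema()

    spark.stop()
  }
}
```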
Why are the changes needed?
Make the new feature simpler.
Does this PR introduce any user-facing change?
No, the feature is not released yet.
How was this patch tested?
Existing unit tests.
I also verified that the flag INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES was fully removed.