Skip to content

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

With the configuration spark.sql.timestampType, TIMESTAMP in Spark is a user-specified alias associated with one of the TIMESTAMP_LTZ and TIMESTAMP_NTZ variations. This is quite complicated to Spark users.

There is another option spark.sql.sources.timestampNTZTypeInference.enabled for schema inference. I would like to introduce it in #40005 but having two flags seems too much. After thoughts, I decide to merge spark.sql.sources.timestampNTZTypeInference.enabled into spark.sql.timestampType and let spark.sql.timestampType control the schema inference behavior.

We can have followups to add data source options "inferTimestampNTZType" for CSV/JSON/partiton column like JDBC data source did.

Why are the changes needed?

Make the new feature simpler.

Does this PR introduce any user-facing change?

No, the feature is not released yet.

How was this patch tested?

Existing UT
I also tried

git grep spark.sql.sources.timestampNTZTypeInference.enabled
git grep INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES

to make sure the flag INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES is removed.

@github-actions github-actions bot added the SQL label Feb 14, 2023
@gengliangwang
Copy link
Member Author

Sorry about getting back and forth about the schema inference. I am trying to tell a good story about the timestamp without time zone. This should be the final version before 3.4.0 release.

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.4.

HyukjinKwon pushed a commit that referenced this pull request Feb 15, 2023
With the configuration `spark.sql.timestampType`,  TIMESTAMP in Spark is a user-specified alias associated with one of the TIMESTAMP_LTZ and TIMESTAMP_NTZ variations. This is quite complicated to Spark users.

There is another option `spark.sql.sources.timestampNTZTypeInference.enabled` for schema inference. I would like to introduce it in #40005 but having two flags seems too much. After thoughts, I decide to merge `spark.sql.sources.timestampNTZTypeInference.enabled` into `spark.sql.timestampType` and let  `spark.sql.timestampType` control the schema inference behavior.

We can have followups to add data source options "inferTimestampNTZType" for CSV/JSON/partiton column like JDBC data source did.

Make the new feature simpler.

No, the feature is not released yet.

Existing UT
I also tried
```
git grep spark.sql.sources.timestampNTZTypeInference.enabled
git grep INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES
```
to make sure the flag INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES is removed.

Closes #40022 from gengliangwang/unifyInference.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 46226c2)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
With the configuration `spark.sql.timestampType`,  TIMESTAMP in Spark is a user-specified alias associated with one of the TIMESTAMP_LTZ and TIMESTAMP_NTZ variations. This is quite complicated to Spark users.

There is another option `spark.sql.sources.timestampNTZTypeInference.enabled` for schema inference. I would like to introduce it in apache#40005 but having two flags seems too much. After thoughts, I decide to merge `spark.sql.sources.timestampNTZTypeInference.enabled` into `spark.sql.timestampType` and let  `spark.sql.timestampType` control the schema inference behavior.

We can have followups to add data source options "inferTimestampNTZType" for CSV/JSON/partiton column like JDBC data source did.

Make the new feature simpler.

No, the feature is not released yet.

Existing UT
I also tried
```
git grep spark.sql.sources.timestampNTZTypeInference.enabled
git grep INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES
```
to make sure the flag INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES is removed.

Closes apache#40022 from gengliangwang/unifyInference.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 46226c2)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants