
Fix Spark session timezone format #3958

Closed
wants to merge 2 commits

Conversation

cxzl25
Contributor

@cxzl25 cxzl25 commented Dec 10, 2022

Why are the changes needed?

The Spark session supports setting the time zone through `spark.sql.session.timeZone` and formats values according to that time zone, but `timestamp` values are not rendered with the session time zone, producing incorrect results.

How was this patch tested?

  • Add some test cases that check the changes thoroughly, including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

@cxzl25
Contributor Author

cxzl25 commented Dec 10, 2022

```
set spark.sql.session.timeZone=UTC;
select from_utc_timestamp(from_unixtime(1670404535000/1000,'yyyy-MM-dd HH:mm:ss'),'GMT+08:00') as time_utc8;
set spark.sql.session.timeZone=GMT+8;
select from_utc_timestamp(from_unixtime(1670404535000/1000,'yyyy-MM-dd HH:mm:ss'),'GMT+08:00') as time_utc8;
```

Current
```
2022-12-08 01:15:35.0
2022-12-08 01:15:35.0
```

Fix
```
2022-12-07 17:15:35.0
2022-12-08 01:15:35.0
```
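The underlying issue can be sketched outside Spark with plain `java.time` (a minimal illustration, not the Kyuubi implementation; the class and method names here are hypothetical): rendering the same instant with different session time zones must yield different strings.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SessionTimezoneDemo {
    // Format an epoch-millis timestamp using an explicit session time zone,
    // instead of implicitly using the JVM default zone.
    static String format(long epochMilli, String sessionZone) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneId.of(sessionZone))
                .format(Instant.ofEpochMilli(epochMilli));
    }

    public static void main(String[] args) {
        // Same instant, two session zones: the rendered strings differ by 8 hours.
        System.out.println(format(1670404535000L, "UTC"));    // 2022-12-07 09:15:35
        System.out.println(format(1670404535000L, "GMT+8"));  // 2022-12-07 17:15:35
    }
}
```

A formatter that ignores the session zone would print the same string in both cases, which is the symptom shown above.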

@cxzl25 cxzl25 force-pushed the fix_session_timezone branch from 6252a1f to e2fd90a Compare December 10, 2022 05:13
@codecov-commenter

codecov-commenter commented Dec 10, 2022

Codecov Report

Merging #3958 (3f2e375) into master (7aa3445) will decrease coverage by 0.02%.
The diff coverage is 87.50%.

```
@@             Coverage Diff              @@
##             master    #3958      +/-   ##
============================================
- Coverage     51.96%   51.94%   -0.02%
  Complexity       13       13
============================================
  Files           522      522
  Lines         28870    28876       +6
  Branches       3864     3864
============================================
  Hits          15001    15001
- Misses        12498    12500       +2
- Partials       1371     1375       +4
```
| Impacted Files | Coverage Δ |
|---|---|
| `...ain/scala/org/apache/kyuubi/util/RowSetUtils.scala` | 83.33% <85.71%> (ø) |
| `...org/apache/kyuubi/engine/spark/schema/RowSet.scala` | 91.17% <100.00%> (ø) |
| `...g/apache/kyuubi/operation/BatchJobSubmission.scala` | 78.43% <0.00%> (-1.31%) ⬇️ |
| `.../engine/spark/session/SparkSQLSessionManager.scala` | 78.48% <0.00%> (-1.27%) ⬇️ |
| `.../org/apache/kyuubi/session/KyuubiSessionImpl.scala` | 77.16% <0.00%> (-0.79%) ⬇️ |
| `...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala` | 80.74% <0.00%> (-0.63%) ⬇️ |
| `...he/kyuubi/ha/client/etcd/EtcdDiscoveryClient.scala` | 68.50% <0.00%> (-0.56%) ⬇️ |
| `...org/apache/kyuubi/operation/ExecuteStatement.scala` | 76.25% <0.00%> (ø) |
| `...che/kyuubi/server/KyuubiTHttpFrontendService.scala` | 60.68% <0.00%> (+0.68%) ⬆️ |


@cxzl25 cxzl25 self-assigned this Dec 10, 2022
@cxzl25 cxzl25 modified the milestones: v1.7.0, v1.6.2 Dec 10, 2022
Contributor

@cfmcgrady cfmcgrady left a comment


Arrow encoding has the same problem; we can fix it in a separate PR.

@turboFei turboFei closed this in 4efd4d0 Dec 11, 2022
turboFei pushed a commit that referenced this pull request Dec 11, 2022
### _Why are the changes needed?_
The Spark session supports setting the time zone through `spark.sql.session.timeZone` and formats values according to that time zone, but `timestamp` values are not rendered with the session time zone, producing incorrect results.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly, including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #3958 from cxzl25/fix_session_timezone.

Closes #3958

3f2e375 [sychen] ut
e2fd90a [sychen] session timezone format

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: fwang12 <fwang12@ebay.com>
(cherry picked from commit 4efd4d0)
Signed-off-by: fwang12 <fwang12@ebay.com>
@turboFei
Member

Thanks, merged to master and branch-1.6.

pan3793 added a commit to pan3793/kyuubi that referenced this pull request Feb 13, 2023
pan3793 added a commit that referenced this pull request Feb 14, 2023
### _Why are the changes needed?_

This PR proposes to use `org.apache.spark.sql.execution#toHiveString` to replace `org.apache.kyuubi.engine.spark.schema#toHiveString`, to get results consistent with `spark-sql` and STS.

Because of [SPARK-32006](https://issues.apache.org/jira/browse/SPARK-32006), it only works with Spark 3.1 and above.

The patch takes effect on both the thrift and arrow result formats.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly, including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate
```
➜  ~ beeline -u 'jdbc:hive2://0.0.0.0:10009/default'
Connecting to jdbc:hive2://0.0.0.0:10009/default
Connected to: Spark SQL (version 3.3.1)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.9 by Apache Hive
0: jdbc:hive2://0.0.0.0:10009/default> select to_timestamp('2023-02-08 22:17:33.123456789');
+----------------------------------------------+
| to_timestamp(2023-02-08 22:17:33.123456789)  |
+----------------------------------------------+
| 2023-02-08 22:17:33.123456                   |
+----------------------------------------------+
1 row selected (0.415 seconds)
```

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request
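The truncation in the beeline output above can be sketched with `java.time` (a hypothetical illustration, not Spark's actual `toHiveString`): Spark timestamps carry microsecond precision, so the nine-digit nanosecond literal is cut to six digits rather than rounded.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class HiveStringPrecisionDemo {
    // Drop nanosecond digits beyond microsecond precision (truncate, not round),
    // then render with a fixed six-digit fractional second.
    static String toMicrosString(String isoTimestamp) {
        LocalDateTime ts = LocalDateTime.parse(isoTimestamp);
        LocalDateTime micros = ts.withNano(ts.getNano() / 1000 * 1000);
        return micros.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS"));
    }

    public static void main(String[] args) {
        System.out.println(toMicrosString("2023-02-08T22:17:33.123456789"));
        // 2023-02-08 22:17:33.123456
    }
}
```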

Closes #4318 from pan3793/hive-string.

Closes #4316

ba9016f [Cheng Pan] nit
8be774b [Cheng Pan] nit
bd696fe [Cheng Pan] nit
b5cf051 [Cheng Pan] fix
dd6b702 [Cheng Pan] test
63edd34 [Cheng Pan] nit
37cc70a [Cheng Pan] Fix python ut
c66ad22 [Cheng Pan] [KYUUBI #4316] Fix returned Timestamp values may lose precision
41d9444 [Cheng Pan] Revert "[KYUUBI #3958] Fix Spark session timezone format"

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 added a commit that referenced this pull request Feb 14, 2023
(cherry picked from commit 8fe7947)
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 pushed a commit that referenced this pull request Feb 18, 2023
…ed result format

### _Why are the changes needed?_

1. This PR introduces a new configuration, `kyuubi.operation.result.arrow.timestampAsString`; when true, arrow-based rowsets convert timestamp-type columns to strings for transmission.

2. `kyuubi.operation.result.arrow.timestampAsString` defaults to false for better transmission performance.

3. This PR fixes the timezone issue in the arrow-based result format described in #3958.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly, including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

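The "timestamp as string" idea can be sketched as follows (a minimal illustration under stated assumptions, not the Kyuubi code; the class and method names are hypothetical): each epoch-microsecond cell is rendered with the session time zone on the engine side, so the client receives an already-correct string instead of a raw timestamp it might format with the wrong zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimestampAsStringDemo {
    // Render an epoch-microseconds value as a string using the session time zone,
    // before the rowset is serialized for transmission.
    static String cellToString(long epochMicros, String sessionZone) {
        Instant instant = Instant.ofEpochSecond(
                Math.floorDiv(epochMicros, 1_000_000L),
                Math.floorMod(epochMicros, 1_000_000L) * 1_000L);
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS")
                .withZone(ZoneId.of(sessionZone))
                .format(instant);
    }

    public static void main(String[] args) {
        System.out.println(cellToString(1670404535000000L, "GMT+8"));
        // 2022-12-07 17:15:35.000000
    }
}
```

The trade-off named in the description follows directly: strings are larger on the wire than 8-byte timestamps, hence the default of false.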
Closes #4326 from cfmcgrady/arrow-string-ts.

Closes #4326

38c7fc9 [Fu Chen] fix style
d864db0 [Fu Chen] address comment
b714b3e [Fu Chen] revert externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/schema/RowSet.scala
6c4eb50 [Fu Chen] minor
289b600 [Fu Chen] timstampAsString = false by default
78b7cab [Fu Chen] fix
f560135 [Fu Chen] debug info
b8e4b28 [Fu Chen] fix ut
87c6f9e [Fu Chen] update docs
86f6cb7 [Fu Chen] arrow based rowset timestamp as string

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 pushed a commit that referenced this pull request Feb 18, 2023
(cherry picked from commit 6bd0016)
Signed-off-by: Cheng Pan <chengpan@apache.org>