Fix Spark session timezone format #3958
Conversation
```
set spark.sql.session.timeZone=UTC;
select from_utc_timestamp(from_unixtime(1670404535000/1000,'yyyy-MM-dd HH:mm:ss'),'GMT+08:00') as time_utc8;

set spark.sql.session.timeZone=GMT+8;
select from_utc_timestamp(from_unixtime(1670404535000/1000,'yyyy-MM-dd HH:mm:ss'),'GMT+08:00') as time_utc8;
```

Current: _(result before this change)_
Fix: _(result after this change)_
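For reference (derived by hand, not copied from the PR), 1670404535 seconds after the epoch is 2022-12-07 09:15:35 UTC, so `spark-sql` itself is expected to render the two queries as:

```
-- session timeZone = UTC   : time_utc8 = 2022-12-07 17:15:35
-- session timeZone = GMT+8 : time_utc8 = 2022-12-08 01:15:35   (same instant, shown in GMT+8)
```

The fix is about making the strings Kyuubi returns follow the session time zone in the same way.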
Force-pushed from 6252a1f to e2fd90a.
Codecov Report

```
@@             Coverage Diff              @@
##             master    #3958      +/-   ##
============================================
- Coverage     51.96%   51.94%   -0.02%
  Complexity       13       13
============================================
  Files           522      522
  Lines         28870    28876       +6
  Branches       3864     3864
============================================
  Hits          15001    15001
- Misses        12498    12500       +2
- Partials       1371     1375       +4
```
Arrow encoding also has the same problem; we can fix it in a separate PR.
### _Why are the changes needed?_

The Spark session supports setting the time zone through `spark.sql.session.timeZone` and formatting according to that time zone, but `timestamp` values are not formatted using it, resulting in some incorrect results.

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #3958 from cxzl25/fix_session_timezone.

Closes #3958

3f2e375 [sychen] ut
e2fd90a [sychen] session timezone format

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: fwang12 <fwang12@ebay.com>
(cherry picked from commit 4efd4d0)
Signed-off-by: fwang12 <fwang12@ebay.com>
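To make the description concrete, here is a small illustration (not taken from the PR) of the behavior the fix aligns with: a Spark `timestamp` denotes an instant, and `spark-sql` renders that instant in the session time zone, so the printed string must change when `spark.sql.session.timeZone` changes.

```
-- CAST of epoch seconds yields a fixed instant (2022-12-07 09:15:35 UTC);
-- only its rendering should follow the session time zone.
set spark.sql.session.timeZone=UTC;
select cast(1670404535 as timestamp) as ts;   -- expected: 2022-12-07 09:15:35

set spark.sql.session.timeZone=GMT+8;
select cast(1670404535 as timestamp) as ts;   -- expected: 2022-12-07 17:15:35
```

Before this patch, Kyuubi's own timestamp-to-string conversion did not consult the session zone, which is the gap the description above refers to.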
Thanks, merged to master and branch-1.6.
This reverts commit 4efd4d0.
### _Why are the changes needed?_

This PR proposes to use `org.apache.spark.sql.execution#toHiveString` to replace `org.apache.kyuubi.engine.spark.schema#toHiveString` to get a consistent result w/ `spark-sql` and `STS`. Because of [SPARK-32006](https://issues.apache.org/jira/browse/SPARK-32006), it only works w/ Spark 3.1 and above.

The patch takes effect on both the thrift and arrow result formats.

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate

```
➜  ~ beeline -u 'jdbc:hive2://0.0.0.0:10009/default'
Connecting to jdbc:hive2://0.0.0.0:10009/default
Connected to: Spark SQL (version 3.3.1)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.9 by Apache Hive
0: jdbc:hive2://0.0.0.0:10009/default> select to_timestamp('2023-02-08 22:17:33.123456789');
+----------------------------------------------+
| to_timestamp(2023-02-08 22:17:33.123456789)  |
+----------------------------------------------+
| 2023-02-08 22:17:33.123456                   |
+----------------------------------------------+
1 row selected (0.415 seconds)
```

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4318 from pan3793/hive-string.

Closes #4316

ba9016f [Cheng Pan] nit
8be774b [Cheng Pan] nit
bd696fe [Cheng Pan] nit
b5cf051 [Cheng Pan] fix
dd6b702 [Cheng Pan] test
63edd34 [Cheng Pan] nit
37cc70a [Cheng Pan] Fix python ut
c66ad22 [Cheng Pan] [KYUUBI #4316] Fix returned Timestamp values may lose precision
41d9444 [Cheng Pan] Revert "[KYUUBI #3958] Fix Spark session timezone format"

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
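As a rough illustration (not from the PR) of what "consistent with `spark-sql`" means after delegating to Spark's `toHiveString`: both the sub-second precision and the session-time-zone rendering of timestamps should now match what `spark-sql` prints, e.g.:

```
set spark.sql.session.timeZone=GMT+8;
-- fractional seconds are kept at Spark's microsecond precision:
select to_timestamp('2023-02-08 22:17:33.123456789') as ts_micros;  -- 2023-02-08 22:17:33.123456
-- and the instant 2022-12-07 09:15:35 UTC is rendered in the session zone:
select cast(1670404535 as timestamp) as ts_gmt8;                    -- 2022-12-07 17:15:35
```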
…ed result format

### _Why are the changes needed?_

1. This PR introduces a new configuration called `kyuubi.operation.result.arrow.timestampAsString`; when true, arrow-based rowsets convert timestamp-type columns to strings for transmission.
2. `kyuubi.operation.result.arrow.timestampAsString` defaults to false for better transmission performance.
3. The PR fixes the timezone issue in the arrow-based result format described in #3958.

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4326 from cfmcgrady/arrow-string-ts.

Closes #4326

38c7fc9 [Fu Chen] fix style
d864db0 [Fu Chen] address comment
b714b3e [Fu Chen] revert externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/schema/RowSet.scala
6c4eb50 [Fu Chen] minor
289b600 [Fu Chen] timstampAsString = false by default
78b7cab [Fu Chen] fix
f560135 [Fu Chen] debug info
b8e4b28 [Fu Chen] fix ut
87c6f9e [Fu Chen] update docs
86f6cb7 [Fu Chen] arrow based rowset timestamp as string

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
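For readers looking for how to opt in: a minimal sketch, assuming the standard `kyuubi-defaults.conf` file is used to carry Kyuubi configuration (the key name comes from this PR; see the Kyuubi settings documentation for other ways to pass it, e.g. per connection):

```
# kyuubi-defaults.conf
# Send timestamp columns of arrow-based rowsets as strings, trading some transfer
# performance for rendering that is consistent with the session time zone.
kyuubi.operation.result.arrow.timestampAsString=true
```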
Why are the changes needed?

The Spark session supports setting the time zone through `spark.sql.session.timeZone` and formatting according to that time zone, but `timestamp` values are not formatted using it, resulting in some incorrect results.

How was this patch tested?

- Add some test cases that check the changes thoroughly including negative and positive cases if possible
- Add screenshots for manual tests if appropriate
- Run test locally before making a pull request