[Bug] Timestamp values returned by Kyuubi may lose precision #4316

Closed
2 of 4 tasks
pan3793 opened this issue Feb 13, 2023 · 1 comment
Labels
kind:bug This is a clearly a bug priority:major

pan3793 commented Feb 13, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

When the following SQL is run through Kyuubi, the result differs from what spark-sql and STS (Spark Thrift Server) return:

SELECT to_timestamp('2023-02-08 22:17:33.123456');

Kyuubi: 2023-02-08 22:17:33.123
spark-sql and STS: 2023-02-08 22:17:33.123456
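The difference between the two outputs is the difference between millisecond and microsecond rendering of the same value. The following is a minimal sketch in plain Python, not Kyuubi or Spark code, that reproduces both strings from the literal in the report. It assumes only the standard library `datetime` module, which, like Spark's `TimestampType`, keeps microsecond precision.

```python
from datetime import datetime

# Parse the literal from the report; the value itself carries microseconds.
ts = datetime.fromisoformat("2023-02-08 22:17:33.123456")

# Rendering with millisecond precision drops the last three fractional digits,
# which matches the string Kyuubi returned.
millis_rendering = ts.isoformat(sep=" ", timespec="milliseconds")

# Rendering with microsecond precision keeps the full fraction,
# which matches the string spark-sql and STS returned.
micros_rendering = ts.isoformat(sep=" ", timespec="microseconds")

print(millis_rendering)
print(micros_rendering)
```

The stored value is identical in both cases; only the string conversion differs.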

Affects Version(s)

master/1.7/1.6

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

Kyuubi, spark-sql and STS all render timestamp values on the Spark driver side. However, Kyuubi converts them to strings with org.apache.kyuubi.engine.spark.schema#toHiveString, whereas spark-sql and STS use org.apache.spark.sql.execution#toHiveString. This causes inconsistent results in some cases.

Note that org.apache.spark.sql.execution#toHiveString is not a public API, and SPARK-32006 (fixed in version 3.1.0) changed its method signature. Kyuubi currently officially supports Spark 3.1 and above, so life is simple if we don't consider Spark 3.0 here.
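To illustrate the version constraint, here is a small hypothetical sketch of a version gate. The function names `parse_major_minor` and `can_use_spark_to_hive_string` are invented for this example and are not part of Kyuubi or Spark; the only fact taken from the issue is that the post-SPARK-32006 signature is available from Spark 3.1.0 onward.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.3.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_use_spark_to_hive_string(spark_version: str) -> bool:
    """Hypothetical check: True when the Spark version is 3.1 or newer."""
    return parse_major_minor(spark_version) >= (3, 1)


print(can_use_spark_to_hive_string("3.3.1"))
print(can_use_spark_to_hive_string("3.0.3"))
```

Because the minimum supported Spark version is already 3.1, a gate like this would always pass on supported versions, which is why Spark 3.0 can be left out of consideration.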

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
@pan3793 pan3793 added kind:bug This is a clearly a bug priority:major labels Feb 13, 2023

pan3793 commented Feb 13, 2023

cc @cfmcgrady @cxzl25

@pan3793 pan3793 changed the title [Bug] [Bug] Timestamp values returned by Kyuubi may lose precision Feb 13, 2023
pan3793 added a commit that referenced this issue Feb 14, 2023
### _Why are the changes needed?_

This PR proposes replacing `org.apache.kyuubi.engine.spark.schema#toHiveString` with `org.apache.spark.sql.execution#toHiveString` so that results are consistent with `spark-sql` and `STS`.

Because of [SPARK-32006](https://issues.apache.org/jira/browse/SPARK-32006), it only works with Spark 3.1 and above.

The patch takes effect for both the Thrift and Arrow result formats.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate
```
➜  ~ beeline -u 'jdbc:hive2://0.0.0.0:10009/default'
Connecting to jdbc:hive2://0.0.0.0:10009/default
Connected to: Spark SQL (version 3.3.1)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.9 by Apache Hive
0: jdbc:hive2://0.0.0.0:10009/default> select to_timestamp('2023-02-08 22:17:33.123456789');
+----------------------------------------------+
| to_timestamp(2023-02-08 22:17:33.123456789)  |
+----------------------------------------------+
| 2023-02-08 22:17:33.123456                   |
+----------------------------------------------+
1 row selected (0.415 seconds)
```

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4318 from pan3793/hive-string.

Closes #4316

ba9016f [Cheng Pan] nit
8be774b [Cheng Pan] nit
bd696fe [Cheng Pan] nit
b5cf051 [Cheng Pan] fix
dd6b702 [Cheng Pan] test
63edd34 [Cheng Pan] nit
37cc70a [Cheng Pan] Fix python ut
c66ad22 [Cheng Pan] [KYUUBI #4316] Fix returned Timestamp values may lose precision
41d9444 [Cheng Pan] Revert "[KYUUBI #3958] Fix Spark session timezone format"

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 8fe7947)
Signed-off-by: Cheng Pan <chengpan@apache.org>