[JDBC] Fix precision returned for datetime columns #178

Closed
wants to merge 8 commits into from

@Yury-Fridlyand Yury-Fridlyand commented Nov 29, 2022

Signed-off-by: Yury-Fridlyand yuryf@bitquilltech.com

Description

The proposed fix adds a runtime calculation of precision for datetime types, performed for each dataset.
To limit the performance cost, the analysis scans at most the first 1000 rows of the dataset and stops early once the precision has been determined reliably enough, that is, after 100 entries have been examined.
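
A minimal sketch of the sampling idea described above, assuming the values of one column are available as strings and that "precision" means the longest textual length observed. The class and method names are hypothetical and are not taken from the driver; only the limits of 1000 rows and 100 entries come from the description, and counting only non-null values toward the 100 is an assumption.

```java
import java.util.List;

/** Hypothetical helper for illustration; not the driver's actual code. */
final class DatetimePrecisionSampler {
    static final int MAX_ROWS_TO_SCAN = 1000; // upper bound from the description
    static final int ENOUGH_SAMPLES = 100;    // early-stop threshold from the description

    /**
     * Returns the longest string length among the sampled non-null values,
     * or the given fallback when no non-null value was seen.
     */
    static int samplePrecision(List<String> columnValues, int fallback) {
        int precision = 0;
        int samples = 0;
        int limit = Math.min(columnValues.size(), MAX_ROWS_TO_SCAN);
        for (int row = 0; row < limit && samples < ENOUGH_SAMPLES; row++) {
            String value = columnValues.get(row);
            if (value == null) {
                continue; // nulls carry no precision information (assumption)
            }
            precision = Math.max(precision, value.length());
            samples++;
        }
        return samples == 0 ? fallback : precision;
    }
}
```

The early stop is also the weakness of the approach: a longer value that appears after the sampled entries is never seen, so the reported precision can be too small.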

Test

Before
image

After
image

Issues Resolved

opensearch-project/sql-jdbc#19

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
@Yury-Fridlyand Yury-Fridlyand changed the title Fix precision returned for datetime columns [WIP] Fix precision returned for datetime columns Nov 29, 2022
@Yury-Fridlyand Yury-Fridlyand changed the title [WIP] Fix precision returned for datetime columns [WIP][JDBC] Fix precision returned for datetime columns Nov 30, 2022
@@ -98,6 +99,26 @@ public ResultSetImpl(StatementImpl statement, List<? extends ColumnDescriptor> c

List<Row> rows = getRowsFromDataRows(dataRows);

for (int i = 0; i < columnDescriptors.size(); i ++) {
if (schema.getOpenSearchType(i) == OpenSearchType.TIMESTAMP ||

If this is an enum, it's better practice to use a switch.
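
A short sketch of what the switch could look like. `SampleType` is a hypothetical stand-in for the driver's `OpenSearchType` enum, whose full list of constants is not shown in this diff, and `isTemporal` is an illustrative name.

```java
/** Hypothetical stand-in for the driver's type enum. */
enum SampleType { DATE, TIME, DATETIME, TIMESTAMP, KEYWORD, INTEGER }

final class TemporalTypes {
    /** Groups all date/time constants in one switch instead of chained == comparisons. */
    static boolean isTemporal(SampleType type) {
        switch (type) {
            case DATE:
            case TIME:
            case DATETIME:
            case TIMESTAMP:
                return true;
            default:
                return false;
        }
    }
}
```

With this shape, `if (TemporalTypes.isTemporal(type))` replaces the chain of `||` comparisons, and a new temporal constant only needs one extra `case` label.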

@@ -98,6 +99,26 @@ public ResultSetImpl(StatementImpl statement, List<? extends ColumnDescriptor> c

List<Row> rows = getRowsFromDataRows(dataRows);

for (int i = 0; i < columnDescriptors.size(); i ++) {

you can use array streams to filter the columns by type (DATE|TIME|DATETIME), and then get the max value row for that column.
Then you don't have to include a double for loop
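
A rough sketch of the stream-based variant suggested here, assuming rows are lists of strings and column types are known up front. `Kind`, `TEMPORAL` and `maxLengths` are hypothetical names, not part of the driver.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.IntStream;

final class StreamPrecision {
    /** Hypothetical column kinds; the real driver uses its own type enum. */
    enum Kind { DATE, TIME, DATETIME, KEYWORD }

    static final Set<Kind> TEMPORAL = EnumSet.of(Kind.DATE, Kind.TIME, Kind.DATETIME);

    /** Maps each temporal column index to the longest value length found in that column. */
    static Map<Integer, Integer> maxLengths(List<Kind> columnKinds, List<List<String>> rows) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        IntStream.range(0, columnKinds.size())
                .filter(col -> TEMPORAL.contains(columnKinds.get(col)))
                .forEach(col -> result.put(col,
                        rows.stream()
                                .map(row -> row.get(col))
                                .filter(Objects::nonNull)
                                .mapToInt(String::length)
                                .max()
                                .orElse(0))); // 0 when the column holds no non-null values
        return result;
    }
}
```

The stream version still visits every cell of the filtered columns, so it changes the shape of the code rather than the amount of work.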

@acarbonetto acarbonetto Nov 30, 2022

Should we check to see if the Precision is already defined in the metadata beforehand?

Author

Should we check to see if the Precision is already defined in the metadata beforehand?

No, I changed the constructor.

Author

Then you don't have to include a double for loop

I have to loop through all rows/columns anyway

Can we talk about this approach? I see what you're doing, but I have some thoughts about the pros/cons.

Author

For sure!

codecov bot commented Dec 9, 2022

Codecov Report

❗ No coverage uploaded for pull request base (integ-fix-#1003@e2bf254). Click here to learn what that means.
The diff coverage is n/a.

@@                Coverage Diff                 @@
##             integ-fix-#1003     opensearch-project/sql#178   +/-   ##
==================================================
  Coverage                   ?   95.78%           
  Complexity                 ?     3465           
==================================================
  Files                      ?      357           
  Lines                      ?     9305           
  Branches                   ?      669           
==================================================
  Hits                       ?     8913           
  Misses                     ?      334           
  Partials                   ?       58           
Flag Coverage Δ
query-workbench 62.76% <0.00%> (?)
sql-engine 98.29% <0.00%> (?)

@Yury-Fridlyand Yury-Fridlyand changed the title [WIP][JDBC] Fix precision returned for datetime columns [JDBC] Fix precision returned for datetime columns Dec 9, 2022
@Yury-Fridlyand Yury-Fridlyand marked this pull request as ready for review December 9, 2022 01:27
@MaxKsyunz
@Yury-Fridlyand why is this the best solution? What are other options?

Seems strange that we cannot derive precision of a timestamp from OpenSearch type information.

Yury-Fridlyand commented Dec 9, 2022

@Yury-Fridlyand why is this the best solution? What are other options?

Seems strange that we cannot derive precision of a timestamp from OpenSearch type information.

I was waiting for this question!

other options?

Return fixed values, but more realistic.

from OpenSearch type information

There is no such information, unless we change the server side to provide it.
Please see the example:

opensearchsql> select sysdate(), sysdate(3), sysdate(6), typeof(sysdate()), typeof(sysdate(3));
fetched rows / total rows = 1/1
+---------------------+-------------------------+----------------------------+---------------------+----------------------+
| sysdate()           | sysdate(3)              | sysdate(6)                 | typeof(sysdate())   | typeof(sysdate(3))   |
|---------------------+-------------------------+----------------------------+---------------------+----------------------|
| 2022-12-08 19:11:16 | 2022-12-08 19:11:16.381 | 2022-12-08 19:11:16.381169 | DATETIME            | DATETIME             |
+---------------------+-------------------------+----------------------------+---------------------+----------------------+

The first 3 columns have the same type but different precisions: 19, 23 and 26.
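
The three values follow from the length of the rendered string: 19 characters for `yyyy-MM-dd HH:mm:ss`, plus a decimal point and one character per fractional digit when a fractional-seconds precision is requested. A small sketch of that arithmetic, with a hypothetical helper name:

```java
final class DatetimeDisplaySize {
    /** Length of a value rendered without fractional seconds. */
    static final int BASE_LENGTH = "yyyy-MM-dd HH:mm:ss".length();

    /** Rendered length for a given number of fractional-second digits (fsp). */
    static int displaySize(int fsp) {
        if (fsp < 0) {
            throw new IllegalArgumentException("fsp must not be negative: " + fsp);
        }
        // One extra character for the decimal point, then one per fractional digit.
        return fsp == 0 ? BASE_LENGTH : BASE_LENGTH + 1 + fsp;
    }
}
```

Here `displaySize(0)`, `displaySize(3)` and `displaySize(6)` give 19, 23 and 26, matching the `sysdate()`, `sysdate(3)` and `sysdate(6)` columns in the example.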


I dislike the solution I proposed, but I have no idea how to do this better. You are welcome to share your thoughts!

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
…Time` has no FSP.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
@Yury-Fridlyand
Author

Discussed offline.

  • we can't iterate through result set;
  • there is no API in the SQL plugin to ask for metadata;
  • metadata fields should represent maximum possible values for used data types.
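
The last point can be sketched as a fixed lookup from type to maximum precision, with no inspection of the result set. Everything in this sketch is an assumption for illustration: the `Kind` enum is hypothetical, and the maximum of 6 fractional digits is taken from the `sysdate(6)` example in this thread, so the values actually used in 6a495cb may differ.

```java
import java.util.EnumMap;
import java.util.Map;

final class FixedPrecision {
    /** Hypothetical type names; the real driver uses its own type enum. */
    enum Kind { DATE, DATETIME, TIMESTAMP }

    /** Assumed maximum number of fractional-second digits (not confirmed by the source). */
    static final int ASSUMED_MAX_FSP = 6;

    private static final Map<Kind, Integer> MAX_PRECISION = new EnumMap<>(Kind.class);

    static {
        // Longest rendering: date and time, a decimal point, then the fractional digits.
        int withFraction = "yyyy-MM-dd HH:mm:ss".length() + 1 + ASSUMED_MAX_FSP;
        MAX_PRECISION.put(Kind.DATE, "yyyy-MM-dd".length());
        MAX_PRECISION.put(Kind.DATETIME, withFraction);
        MAX_PRECISION.put(Kind.TIMESTAMP, withFraction);
    }

    /** Returns the fixed, type-level precision regardless of the values in a result set. */
    static int precisionOf(Kind kind) {
        return MAX_PRECISION.get(kind);
    }
}
```

Because the value depends only on the type, the metadata can be produced without iterating over rows, which is the constraint named in the first point.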

The fix is implemented in 6a495cb in the scope of opensearch-project#185.
