Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add sql level test for decimal data type #2200

Merged
merged 5 commits into from
Apr 14, 2022

Conversation

liukun4515
Copy link
Contributor

@liukun4515 liukun4515 commented Apr 11, 2022

Which issue does this PR close?

extract and add test for decimal data type in the SQL level.

related #122

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 11, 2022
@liukun4515 liukun4515 force-pushed the decimal_test_in_sql_level branch from 854dd80 to 4e0ac61 Compare April 13, 2022 06:17
@liukun4515 liukun4515 marked this pull request as ready for review April 13, 2022 13:41
Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job❤

It cover

  • lt/lteq/... six compare operator
  • min/max
  • group
  • sort

0.00005,0.000000000005,4,true,0.000078
0.00005,0.000000000005,8,false,0.000033
0.00005,0.000000000005,100,true,0.000068
0.00005,0.000000000005,1,false,0.000100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 I often find there's not new line for PR because of neglection.

I think I can add a editor config for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 -- looks great to me. Thank you @liukun4515

register_decimal_csv_table_by_sql(&ctx).await;
// min
let sql = "select min(c1) from decimal_simple where c4=false";
let actual = execute_to_batches(&ctx, sql).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone else following along, decimal_group_function covers grouping by decimal columns 👍

Something like

select max(c1) from decimal_simple group by c2;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to cover the agg functions which support the decimal data type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if other agg functions support the decimal data type, we can add the test in the function or scope.

let sql = "select * from decimal_simple where c1=0.00002";
let actual = execute_to_batches(&ctx, sql).await;
assert_eq!(
&DataType::Decimal(10, 6),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this test only checks the schema of c1 and not c2?

It also isn't clear to me why this test is checkout the output schema but decimal_by_filter isn't (not that this is a problem I am just curious)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your point.
I will add the checker for the output schema in all test cases.

Copy link
Contributor Author

@liukun4515 liukun4515 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this test only checks the schema of c1 and not c2?

this test just wants to verify the decimal data type, I will add the test for c5.

There is a test about c5 let sql = "select * from decimal_simple where c1 > c5";

filter is about:

  1. filter by scalar
  2. filter by array/column. The c5 is used to do that.

];
assert_batches_eq!(expected, &actual);

// logic operation: gteq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth checking somewhere what happens with different decimal scales?

Something like

select * from decimal_simple where c1 >= CAST(0.00002 as Decimal(10,8))

I am not sure if the coercion to an appropriate decimal type is covered elsewhere

Copy link
Contributor Author

@liukun4515 liukun4515 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.
c1 is decimal(10,6), and the cast(xxx as decimal(10,8)) is decimal(10,8), the result type should be the decimal(12,8).
In the filter expr, we should get the decimal(12,8).
I will add this test case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the bug in the precision and scale for logic comparison operation.

❯ explain select * from food where a >= CAST(0.00002 as Decimal(10,8));
+---------------+----------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                     |
+---------------+----------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #food.a, #food.b, #food.c                                                                    |
|               |   Filter: #food.a >= Decimal128(Some(2000),10,8)                                                         |
|               |     TableScan: food projection=Some([0, 1, 2]), partial_filters=[#food.a >= Decimal128(Some(2000),10,8)] |
| physical_plan | ProjectionExec: expr=[a@0 as a, b@1 as b, c@2 as c]                                                      |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                            |
|               |     FilterExec: CAST(a@0 AS Decimal(10, 8)) >= Some(2000),10,8                                           |
|               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                                  |
|               |         CsvExec: files=[aggregate_simple.csv], has_header=false, limit=None, projection=[a, b, c]        |
|               |                                                                                                          |
+---------------+----------------------------------------------------------------------------------------------------------+

@alamb

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the bug in the precision and scale for logic comparison operation.

Testing for the win!

];
assert_batches_eq!(expected, &actual);

let sql = "select * from decimal_simple where c1 < 0.00003 order by c1 desc,c4";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Dandandan Dandandan merged commit 8d5bb47 into apache:master Apr 14, 2022
@liukun4515 liukun4515 deleted the decimal_test_in_sql_level branch April 14, 2022 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants