Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(sqllogictests): port group by test to sqllogic #6088

Merged
merged 9 commits into from
Apr 24, 2023

Conversation

aprimadi
Copy link
Contributor

Which issue does this PR close?

Port group by tests from group_by.rs.

Work towards #4495

Rationale for this change

N/A

What changes are included in this PR?

N/A

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Apr 21, 2023
@aprimadi aprimadi marked this pull request as draft April 21, 2023 14:04
@aprimadi aprimadi changed the title refactor(sqllogictests): wip refactor group by test to sqllogic refactor(sqllogictests): refactor group by test to sqllogic Apr 21, 2023
@aprimadi aprimadi changed the title refactor(sqllogictests): refactor group by test to sqllogic refactor(sqllogictests): port group by test to sqllogic Apr 21, 2023
@alamb
Copy link
Contributor

alamb commented Apr 21, 2023

Thank you @aprimadi -- we really appreciate the help

@aprimadi
Copy link
Contributor Author

Not sure if we should port group_by_dictionary test case @alamb.

It seems that if we port it into sqllogic, the test would be longer. While doing it programmatically seems to be very concise.

@aprimadi aprimadi marked this pull request as ready for review April 22, 2023 07:38
@aprimadi
Copy link
Contributor Author

This is probably enough for now, all the other tests use a one-time-use test data.

}

/// Create and populate time32_s, time32_ms, time64_us, time64_ns tables
fn register_time_test_tables(ctx: &SessionContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need rust code to do this setup anymore

You can create arbitrary Arrow types that have no corresponding SQL types with the arrow_cast function -- https://arrow.apache.org/datafusion/user-guide/sql/data_types.html

For example, here is one way to do it:
https://github.com/apache/arrow-datafusion/blob/7759d96e892e7c159136af35ebc0def49ed3b3b1/datafusion/core/tests/sqllogictests/test_files/timestamps.slt#L113-L129

So you can make a value of Time32(Seconds) with something like

select arrow_cast(5000::int, 'Time32(Second)');
+-------------+
| Int64(5000) |
+-------------+
| 01:23:20    |
+-------------+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thank you for the feedback @alamb. Didn't know that. I'm done for the day, will probably work more on this tomorrow if nothing comes up.

@aprimadi
Copy link
Contributor Author

@alamb not sure how to port group_by_date_trunc and group_by_dictionary without writing setup code. Do we want to migrate those also?

@jiangzhx
Copy link
Contributor

My idea is mainly based on these two test cases and what they cover.

  1. group_by_date_trunc based on the method name, is to test date_trunc. I think we can use an existing test data and move to sqllogictests
  2. group_by_dictionary based on the code, is testing DictionaryArray. In my opinion, it should not be migrated to sqllogictest.

@aprimadi
Copy link
Contributor Author

I think both group_by_date_trunc and group_by_dictionary test should not be moved.

For group_by_date_trunc the data is procedurally generated and it also generates multiple partition file.

For group_by_dictionary, I totally agree with you @jiangzhx

IMO, the data setup should be as close to the code that test it except if it's being used multiple times.

@aprimadi
Copy link
Contributor Author

Btw,, those are the only two tests from group_by.rs not being moved to sqllogic.

@alamb
Copy link
Contributor

alamb commented Apr 24, 2023

For group_by_date_trunc the data is procedurally generated and it also generates multiple partition file.

For group_by_dictionary, I totally agree with you @jiangzhx

I agree -- btw I am hoping eventually to be able to create partitioned files using COPY TO ... #5654

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me . Thank you @aprimadi

It looks like we can eventually consolidate group_by and aggregate.slt 🤔

https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/aggregate.slt

@alamb
Copy link
Contributor

alamb commented Apr 24, 2023

I really appreciate the help @aprimadi ❤️

@alamb alamb merged commit 0576236 into apache:main Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants