Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase fuzz testing of streaming group by / low cardinality columns #12990

Merged
merged 2 commits into from
Oct 30, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Oct 17, 2024

Which issue does this PR close?

Part of #12114

Rationale for this change

As @Rachelint points out #12847 (comment) the current coverage of sorting columns is limited to sorting on columns of distinct values which limits the number of sequential values in the data

What changes are included in this PR?

Adds two low cardinality columns and ensures we have at least one query in each fuzz run that is entirely ordered by the sort key.

Here is a example of the kind of queries that are tested:

 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
 Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16

Are these changes tested?

Only tests

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Oct 17, 2024
@alamb alamb force-pushed the alamb/aggregate_fuzz_streaming branch from 1a0491c to 3147e9e Compare October 22, 2024 16:01
@alamb alamb marked this pull request as ready for review October 23, 2024 23:11
Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb it looks good to me. Only one minor comment.
By the way, there is a conflict here.

@@ -417,8 +435,7 @@ impl QueryBuilder {
mut self,
group_by: impl IntoIterator<Item = &'a str>,
) -> Self {
let group_by = group_by.into_iter().map(String::from);
self.group_by_columns.extend(group_by);
self.group_by_columns = group_by.into_iter().map(String::from).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you changed the behavior from extending to replacing. However, the behaviors are still extending or pushing, e.g., with_distinct_aggregate_function or with_aggregate_arguments. I think all the behaviors are better to be aligned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an excellent point. Thank you @goldmedal I have renamed with_group_by_columns to set_group_by_colums to make it clear the behavior is different in 4f828d6

@alamb alamb merged commit 7d34ccc into apache:main Oct 30, 2024
24 checks passed
@alamb alamb deleted the alamb/aggregate_fuzz_streaming branch November 1, 2024 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants