Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize pruning utf8 fuzz test #13947

Merged
merged 1 commit into from
Jan 2, 2025
Merged

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 30, 2024

Which issue does this PR close?

Rationale for this change

  1. After Implement predicate pruning for like expressions (prefix matching) #12978 from @adriangb (❤️ ) the test_fuzz_utf8 test takes over a minute to run on my machine
  2. It is also not especially clear what it is doing or what is taking the time

What changes are included in this PR?

  1. Break the test into multiple individual test cases that each test a particular predicate
  2. The logic is still the same (no changes are intended)

New output: longest test now takes 12 seconds

Example:

andrewlamb@Mac:~/Software/datafusion$ cargo nextest run --test fuzz -- pruning
   Compiling datafusion v44.0.0 (/Users/andrewlamb/Software/datafusion/datafusion/core)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 2.08s
------------
 Nextest run ID d214dd1d-952c-4280-a2b6-14f2d0109b79 with nextest profile: default
    Starting 12 tests across 1 binary (46 tests skipped)
        PASS [   1.778s] datafusion::fuzz fuzz_cases::pruning::test_utf8_eq
        PASS [   2.066s] datafusion::fuzz fuzz_cases::pruning::test_utf8_gt_eq
        PASS [   2.095s] datafusion::fuzz fuzz_cases::pruning::test_utf8_lt
        PASS [   2.153s] datafusion::fuzz fuzz_cases::pruning::test_utf8_gt
        PASS [   2.175s] datafusion::fuzz fuzz_cases::pruning::test_utf8_lt_eq
        PASS [   2.967s] datafusion::fuzz fuzz_cases::pruning::test_utf8_not_eq
        PASS [   6.688s] datafusion::fuzz fuzz_cases::pruning::test_utf8_like
        PASS [   7.283s] datafusion::fuzz fuzz_cases::pruning::test_utf8_not_like
        PASS [   9.060s] datafusion::fuzz fuzz_cases::pruning::test_utf8_like_suffix
        PASS [   9.389s] datafusion::fuzz fuzz_cases::pruning::test_utf8_like_prefix
        PASS [   9.912s] datafusion::fuzz fuzz_cases::pruning::test_utf8_not_like_suffix
        PASS [  12.013s] datafusion::fuzz fuzz_cases::pruning::test_utf8_not_like_prefix

Are these changes tested?

Yes, by CI

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Dec 30, 2024
.map(|c| c.join("")),
);
}
async fn test_utf8_eq() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the individual predicates are now run in their own test that shows up in the output

@alamb
Copy link
Contributor Author

alamb commented Dec 30, 2024

There may be additional improvements we could make to improve the overall speed of these tests but I focused on parallelization first

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! 12s is much better. I bet we can get that way down as well if we improve the speed of each test. Then we might even be able to crank up the search space that is being fuzzed 😄

@alamb
Copy link
Contributor Author

alamb commented Dec 30, 2024

Thanks for the review @adriangb ❤️

Looks great to me! 12s is much better. I bet we can get that way down as well if we improve the speed of each test. Then we might even be able to crank up the search space that is being fuzzed 😄

The other thing that maybe we could do is run these fuzzing tests on commits to main (rather than on each PR) in release mode 🤔

The challenge there is that we need someone to triage / monitor the tests and it is not lear we have anyone who will do so

Copy link
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

@alamb
Copy link
Contributor Author

alamb commented Jan 2, 2025

Thank you for the review @jonahgao and @adriangb ❤️

@alamb alamb merged commit 846adf3 into apache:main Jan 2, 2025
26 checks passed
@alamb alamb deleted the alamb/faster_test branch January 2, 2025 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve speed of datafusion::fuzz fuzz_cases::pruning::test_fuzz_utf8 test
3 participants