Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Merge similar functions ilike_scalar and nilike_scalar #3303

Merged
merged 2 commits into from
Dec 9, 2022
Merged

refactor: Merge similar functions ilike_scalar and nilike_scalar #3303

merged 2 commits into from
Dec 9, 2022

Conversation

askoa
Copy link
Contributor

@askoa askoa commented Dec 8, 2022

Which issue does this PR close?

Part of #3296

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 8, 2022
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth just double checking the benchmarks for LLVM derp, but I'd be very surprised if this has any impact.

Thank you 😀

@askoa
Copy link
Contributor Author

askoa commented Dec 8, 2022

Might be worth just double checking the benchmarks for LLVM derp, but I'd be very surprised if this has any impact.

Do benchmarks run in PR checks? Or should I run it locally?

@tustvold
Copy link
Contributor

tustvold commented Dec 8, 2022

Locally, I'm also happy to run them tomorrow morning

@askoa
Copy link
Contributor Author

askoa commented Dec 8, 2022

Locally, I'm also happy to run them tomorrow morning

That would be great!

@tustvold
Copy link
Contributor

tustvold commented Dec 9, 2022

$ git checkout master
$ RUSTFLAGS="-C target-cpu=native" cargo bench --features="test_utils,dyn_cmp_dict" --bench comparison_kernels -- --save-baseline master ".*ilike"
...
$ git checkout like-duplication
$ RUSTFLAGS="-C target-cpu=native" cargo bench --features="test_utils,dyn_cmp_dict" --bench comparison_kernels -- --baseline master ".*ilike"

Resulted in

ilike_utf8 scalar equals
                        time:   [2.3886 ms 2.3895 ms 2.3904 ms]
                        change: [+6.2307% +6.2869% +6.3448%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

ilike_utf8 scalar contains
                        time:   [4.3580 ms 4.3595 ms 4.3610 ms]
                        change: [+5.0782% +5.1287% +5.1773%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

ilike_utf8 scalar ends with
                        time:   [2.3695 ms 2.3705 ms 2.3715 ms]
                        change: [+0.9025% +0.9541% +1.0075%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

ilike_utf8 scalar starts with
                        time:   [2.4795 ms 2.4811 ms 2.4827 ms]
                        change: [+5.6493% +5.7240% +5.8024%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking ilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, enable flat sampling, or reduce sample count to 50.
ilike_utf8 scalar complex
                        time:   [1.8226 ms 1.8233 ms 1.8240 ms]
                        change: [-4.7478% -4.5718% -4.3951%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

nilike_utf8 scalar equals
                        time:   [2.4453 ms 2.4466 ms 2.4479 ms]
                        change: [+1.8160% +1.8795% +1.9439%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

nilike_utf8 scalar contains
                        time:   [4.4117 ms 4.4137 ms 4.4159 ms]
                        change: [+3.6819% +3.7352% +3.7913%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

nilike_utf8 scalar ends with
                        time:   [2.4475 ms 2.4486 ms 2.4498 ms]
                        change: [+2.9289% +2.9881% +3.0444%] (p = 0.00 < 0.05)
                        Performance has regressed.

nilike_utf8 scalar starts with
                        time:   [2.4008 ms 2.4014 ms 2.4021 ms]
                        change: [+0.7831% +0.8124% +0.8463%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

Benchmarking nilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
nilike_utf8 scalar complex
                        time:   [1.8052 ms 1.8061 ms 1.8071 ms]
                        change: [-6.1191% -5.9827% -5.7985%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

So this does represent a performance regression, I will investigate why

@tustvold
Copy link
Contributor

tustvold commented Dec 9, 2022

Switching to use BooleanArray::from_unary made the regression go away, as well as simplifying the code 🎉

} else if right.starts_with('%')
&& right.ends_with('%')
&& !right.ends_with("\\%")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to have been an omission from #2743

@tustvold tustvold merged commit d11da24 into apache:master Dec 9, 2022
@askoa askoa deleted the like-duplication branch December 9, 2022 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants