Dictionary like scalar kernels #2591

Merged
merged 11 commits into from
Aug 26, 2022

Conversation

Contributor

@psvri psvri commented Aug 25, 2022

Which issue does this PR close?

Partially implements #1975.

Rationale for this change

Enhancement to add LIKE kernels for dictionary arrays.

What changes are included in this PR?

LIKE kernels for dictionary and string arrays

Are there any user-facing changes?

No

Contributor Author

psvri commented Aug 25, 2022

As a side effect, this PR also brought some nice performance improvements. These range from 2-3% to about 30-50% for some use cases.

These are the improvements I am getting on my 4-core Arm machine on OCI:

like_utf8 scalar equals time:   [332.65 µs 332.75 µs 332.92 µs]                                    
                        change: [-11.762% -11.542% -11.363%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking like_utf8 scalar contains: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.0s, enable flat sampling, or reduce sample count to 40.
like_utf8 scalar contains                                                                             
                        time:   [1.9687 ms 1.9700 ms 1.9712 ms]
                        change: [-0.8125% -0.7296% -0.6506%] (p = 0.00 < 0.05)
                        Change within noise threshold.

like_utf8 scalar ends with                                                                            
                        time:   [333.54 µs 333.57 µs 333.60 µs]
                        change: [-6.2180% -6.1563% -6.0802%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  5 (5.00%) high severe

like_utf8 scalar starts with                                                                            
                        time:   [354.51 µs 354.59 µs 354.70 µs]
                        change: [-5.8751% -5.8337% -5.7908%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

like_utf8 scalar complex                                                                            
                        time:   [8.2678 ms 8.2691 ms 8.2704 ms]
                        change: [+0.3933% +0.4141% +0.4367%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

nlike_utf8 scalar equals                                                                            
                        time:   [359.67 µs 359.72 µs 359.76 µs]
                        change: [-33.604% -33.579% -33.553%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

nlike_utf8 scalar contains                                                                             
                        time:   [2.0056 ms 2.0071 ms 2.0086 ms]
                        change: [-9.1066% -9.0256% -8.9406%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

nlike_utf8 scalar ends with                                                                            
                        time:   [357.22 µs 357.28 µs 357.34 µs]
                        change: [-35.930% -35.904% -35.861%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  1 (1.00%) high severe

nlike_utf8 scalar starts with                                                                            
                        time:   [377.53 µs 377.68 µs 377.84 µs]
                        change: [-33.147% -33.106% -33.066%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

nlike_utf8 scalar complex                                                                            
                        time:   [8.3052 ms 8.3066 ms 8.3081 ms]
                        change: [-2.9553% -2.9313% -2.9080%] (p = 0.00 < 0.05)
                        Performance has improved.

ilike_utf8 scalar equals                                                                             
                        time:   [2.8657 ms 2.8660 ms 2.8663 ms]
                        change: [-4.2960% -4.1779% -4.0962%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

ilike_utf8 scalar contains                                                                             
                        time:   [4.4981 ms 4.4989 ms 4.4997 ms]
                        change: [-56.457% -56.444% -56.432%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

ilike_utf8 scalar ends with                                                                             
                        time:   [2.9145 ms 2.9158 ms 2.9172 ms]
                        change: [-3.4589% -3.3895% -3.3206%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

ilike_utf8 scalar starts with                                                                             
                        time:   [2.9147 ms 2.9154 ms 2.9162 ms]
                        change: [-4.4854% -4.4584% -4.4289%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

ilike_utf8 scalar complex                                                                            
                        time:   [10.155 ms 10.157 ms 10.158 ms]
                        change: [-1.9673% -1.9384% -1.9101%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

nilike_utf8 scalar equals                                                                             
                        time:   [2.9256 ms 2.9261 ms 2.9267 ms]
                        change: [-2.7385% -2.7064% -2.6726%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

nilike_utf8 scalar contains                                                                             
                        time:   [4.5042 ms 4.5065 ms 4.5089 ms]
                        change: [-56.448% -56.424% -56.402%] (p = 0.00 < 0.05)
                        Performance has improved.

nilike_utf8 scalar ends with                                                                             
                        time:   [2.9386 ms 2.9390 ms 2.9394 ms]
                        change: [-1.3554% -1.3115% -1.2672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

nilike_utf8 scalar starts with                                                                             
                        time:   [2.8966 ms 2.8975 ms 2.8983 ms]
                        change: [-3.2629% -3.2297% -3.1958%] (p = 0.00 < 0.05)
                        Performance has improved.

nilike_utf8 scalar complex                                                                            
                        time:   [10.201 ms 10.202 ms 10.205 ms]
                        change: [-2.0189% -1.9831% -1.9477%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 25, 2022
Contributor

@tustvold tustvold left a comment

Will review properly tomorrow, but I do wonder if we could use ArrayAccessor instead of macros for this?

@@ -233,6 +233,91 @@ pub fn like_utf8<OffsetSize: OffsetSizeTrait>(
})
}

macro_rules! like_scalar {
($LEFT: expr, $RIGHT: expr) => {{
Member

This could possibly be rewritten as a function using ArrayAccessor.

Contributor Author

psvri commented Aug 25, 2022

I actually tried that.

The kernels contain a lot of if/else paths, so I would have had to pick one of the options below:

  • Write a function that creates a closure based on the pattern. This led to some very ugly code with Fns wrapped in Boxes, so I didn't prefer that.
  • Wrap the if/else branches inside the closure, but then the comparison would run for every element, which isn't optimal.
  • Use ArrayAccessor and compare_op inside each if/else branch. But that is as good as inlining the function ourselves, hence I went with this approach.

Let me know your thoughts here.
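
To make the trade-off above concrete, here is a minimal sketch of branching on the pattern shape once and then running a tight loop per branch. It assumes plain `&[&str]` input without nulls. `LikeShape`, `classify`, and `like_scalar` are hypothetical names for illustration, not the code in this PR, and escape handling and the regex fallback are deliberately left out.

```rust
/// Shape of a SQL LIKE pattern, decided once per call (hypothetical enum).
#[derive(Debug, PartialEq)]
enum LikeShape<'a> {
    Equals(&'a str),
    StartsWith(&'a str),
    EndsWith(&'a str),
    Contains(&'a str),
    Complex,
}

fn is_wildcard(c: char) -> bool {
    c == '%' || c == '_'
}

fn classify(pattern: &str) -> LikeShape<'_> {
    if pattern.contains('\\') {
        // Escape sequences are out of scope for this sketch.
        return LikeShape::Complex;
    }
    if !pattern.contains(is_wildcard) {
        return LikeShape::Equals(pattern);
    }
    let plain = |s: &str| !s.contains(is_wildcard);
    if let Some(inner) = pattern.strip_prefix('%').and_then(|s| s.strip_suffix('%')) {
        if plain(inner) {
            return LikeShape::Contains(inner);
        }
    }
    if let Some(prefix) = pattern.strip_suffix('%') {
        if plain(prefix) {
            return LikeShape::StartsWith(prefix);
        }
    }
    if let Some(suffix) = pattern.strip_prefix('%') {
        if plain(suffix) {
            return LikeShape::EndsWith(suffix);
        }
    }
    LikeShape::Complex
}

/// Returns `None` for shapes this sketch does not evaluate
/// (a real kernel would need a fallback such as a regex).
fn like_scalar(values: &[&str], pattern: &str) -> Option<Vec<bool>> {
    // The branch is taken once; each arm then runs its own loop.
    let out: Vec<bool> = match classify(pattern) {
        LikeShape::Equals(s) => values.iter().map(|v| *v == s).collect(),
        LikeShape::StartsWith(s) => values.iter().map(|v| v.starts_with(s)).collect(),
        LikeShape::EndsWith(s) => values.iter().map(|v| v.ends_with(s)).collect(),
        LikeShape::Contains(s) => values.iter().map(|v| v.contains(s)).collect(),
        LikeShape::Complex => return None,
    };
    Some(out)
}
```

Moving the `match` inside the per-element closure would give the second option from the list: the same result, but with the pattern check repeated for every row.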

Contributor

tustvold commented Aug 25, 2022

Can you not just take the contents of the macro and make it into a free generic function on a concrete ArrayAccessor<Item=&str>?
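
A rough sketch of what such a generic function could look like. `StrAccessor` is a hypothetical stand-in for arrow's `ArrayAccessor<Item = &str>`, used so the example compiles without the arrow crate. `PlainStrings`, `DictStrings`, and `starts_with_scalar` are likewise made up for illustration, and null handling is omitted.

```rust
/// Hypothetical stand-in for arrow's `ArrayAccessor<Item = &str>`; nulls are ignored.
trait StrAccessor {
    fn len(&self) -> usize;
    fn value(&self, index: usize) -> &str;
}

/// A plain string column.
struct PlainStrings {
    values: Vec<String>,
}

/// A dictionary-encoded column: each key indexes into `values`.
struct DictStrings {
    keys: Vec<usize>,
    values: Vec<String>,
}

impl StrAccessor for PlainStrings {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn value(&self, index: usize) -> &str {
        &self.values[index]
    }
}

impl StrAccessor for DictStrings {
    fn len(&self) -> usize {
        self.keys.len()
    }
    fn value(&self, index: usize) -> &str {
        // Resolve the key to its dictionary value on every access.
        &self.values[self.keys[index]]
    }
}

/// One generic body instead of a macro: the compiler monomorphizes it
/// for each accessor type, so no boxing or dynamic dispatch is involved.
fn starts_with_scalar<A: StrAccessor>(left: &A, prefix: &str) -> Vec<bool> {
    (0..left.len())
        .map(|i| left.value(i).starts_with(prefix))
        .collect()
}
```

The same function body then serves both the plain and the dictionary-encoded column, which is the property the macro was providing.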

Contributor Author

psvri commented Aug 25, 2022

Okay, let me try that.

nilike_scalar(left, right)
}

/// Perform SQL `left ILIKE right` operation on [`DictionaryArray`] with values
Member

Suggested change
- /// Perform SQL `left ILIKE right` operation on [`DictionaryArray`] with values
+ /// Perform SQL `left NOT ILIKE right` operation on [`DictionaryArray`] with values

Contributor

@tustvold tustvold left a comment

This looks good to me, thank you.

I wonder if we should add dyn versions of these kernels as a follow up? 🤔

bit_util::set_bit(bool_slice, i);
unsafe {
if left.value_unchecked(i).ends_with(ends_with) {
bit_util::set_bit(bool_slice, i);
Contributor

Not something for this PR, but using MutableBuffer::from_trusted_len_iter may be significantly faster, as it performs byte-sized writes instead of bit-sized ones.

Contributor Author

I tried it just now; I didn't find it giving significant performance gains.

Contributor

Fair, I guess the string comparisons are substantially more expensive than for primitives
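
For readers unfamiliar with the distinction in this thread, the sketch below contrasts setting one bit at a time with assembling a whole byte before writing it. It is a simplified illustration over a `&[bool]`, assuming least-significant-bit-first packing; `pack_bitwise` and `pack_bytewise` are hypothetical helpers, not arrow's `bit_util::set_bit` or `MutableBuffer` implementation.

```rust
/// Sets bit `i` with one read-modify-write of the target byte per match.
fn pack_bitwise(flags: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (flags.len() + 7) / 8];
    for (i, &flag) in flags.iter().enumerate() {
        if flag {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Accumulates up to eight results in a local byte, then writes that byte once.
fn pack_bytewise(flags: &[bool]) -> Vec<u8> {
    flags
        .chunks(8)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u8, |byte, (bit, &flag)| byte | ((flag as u8) << bit))
        })
        .collect()
}
```

Both produce the same bitmap. As the replies above note, when each element requires a string comparison, the cost of the write strategy is likely to be a small part of the total.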

match left.value_type() {
DataType::Utf8 => {
let left = left.downcast_dict::<GenericStringArray<i32>>().unwrap();
like_scalar(left, right)
Contributor

I was actually looking at the implementation of the other dictionary comparison kernels and they opt to instead evaluate the predicate against the dictionary, and then call unpack_dict_comparison to translate this to the values as a whole. Might be something to explore, I could see it being very beneficial for DictionaryArray with lots of repeated values
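
The idea in this comment can be sketched as follows: evaluate the predicate once per distinct dictionary value, then translate the result to rows through the keys. This is only an illustration under simplifying assumptions (keys as `usize`, no nulls). `starts_with_dict` is a hypothetical name, and its second step merely plays the role the comment attributes to unpack_dict_comparison.

```rust
/// Evaluates the predicate against the dictionary values only, then maps
/// the per-value results onto the rows via the keys.
/// Hypothetical layout: `keys[i]` indexes into `values`; nulls are ignored.
fn starts_with_dict(keys: &[usize], values: &[&str], prefix: &str) -> Vec<bool> {
    // Step 1: one string comparison per dictionary entry.
    let value_matches: Vec<bool> = values.iter().map(|v| v.starts_with(prefix)).collect();
    // Step 2: one cheap index lookup per row.
    keys.iter().map(|&k| value_matches[k]).collect()
}
```

With many rows and few distinct values, the number of string comparisons drops from one per row to one per dictionary entry.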

Contributor Author

I agree. Shall I make this change in the same PR?

Contributor

Let's do it as a follow-up.

@tustvold tustvold merged commit 9abc5f5 into apache:master Aug 26, 2022

ursabot commented Aug 26, 2022

Benchmark runs are scheduled for baseline = 63afe25 and contender = 9abc5f5. 9abc5f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@psvri psvri deleted the dictionary_like branch December 4, 2022 11:55