perf: Optimize array_has() for scalar needle #20374

Open
neilconway wants to merge 2 commits into apache:main from neilconway:neilc/optimize-array-has
@neilconway neilconway commented Feb 15, 2026

Which issue does this PR close?

Rationale for this change

compare_with_eq() checks for matching array elements in a single pass over the entire flat values buffer, which is reasonably fast. The previous implementation then determined per-row results by creating a BooleanArray slice for each row and calling true_count() on it to check for any matches. That turns out to be a lot of per-row work.

Instead, we use BooleanBuffer::set_indices() to iterate over the set bits in the comparison result in a single forward pass. We walk this iterator in lockstep with the row offsets to determine whether each row contains a match, which does much less work per-row.
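
As a rough illustration of the lockstep walk described above, here is a minimal sketch in plain Rust. It does not use the arrow crate: the set-bit iterator is simulated with an ordinary ascending iterator of indices, null handling is ignored, and the names `rows_with_match` and `array_has_i64` are hypothetical, not the functions in this PR.

```rust
/// Hypothetical helper: for each row delimited by `offsets`, report whether
/// any index yielded by `set_indices` falls inside that row's value range.
/// `set_indices` must be ascending, like the set-bit iterator of a bitmap.
fn rows_with_match(offsets: &[usize], set_indices: impl Iterator<Item = usize>) -> Vec<bool> {
    let mut set_indices = set_indices.peekable();
    let mut result = Vec::with_capacity(offsets.len().saturating_sub(1));
    for window in offsets.windows(2) {
        let (start, end) = (window[0], window[1]);
        // Skip matches before this row (only possible when offsets[0] > 0).
        while set_indices.next_if(|&i| i < start).is_some() {}
        // The row has a match if the next set index lies before the row's end.
        let found = matches!(set_indices.peek(), Some(&i) if i < end);
        // Consume any remaining matches that belong to this row.
        while set_indices.next_if(|&i| i < end).is_some() {}
        result.push(found);
    }
    result
}

/// Hypothetical stand-in for the whole operation on non-null i64 data:
/// one pass over the flat values to find matches, then one lockstep walk.
fn array_has_i64(values: &[i64], offsets: &[usize], needle: i64) -> Vec<bool> {
    let set_indices = values
        .iter()
        .enumerate()
        .filter(|(_, v)| **v == needle)
        .map(|(i, _)| i);
    rows_with_match(offsets, set_indices)
}
```

Each set index is visited once and each row does a constant amount of bookkeeping, so no per-row slice or bit count is needed. For instance, with values `[1, 2, 3, 4, 5, 2, 2, 9]`, offsets `[0, 3, 5, 5, 8]` and needle `2`, the sketch reports a match for the first and last rows only.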

This can be substantially faster, especially for short arrays. For example, for 10-element int64 arrays it is 3-5x faster than the previous approach, and 10-element string arrays are 1.6-4.8x faster. The improvement is smaller but non-zero for larger arrays (e.g., ~1.2x faster for 500-element int64 arrays).

What changes are included in this PR?

In addition to the optimization, this commit adjusts the array_has benchmark code to actually benchmark array_has evaluation (!). The previous benchmark just constructed an Expr.

Are these changes tested?

Yes. Passes existing tests. Performance validated via several benchmark runs.

Are there any user-facing changes?

No.

The previous implementation tested the cost of building an array_has()
`Expr` (!), not actually evaluating the array_has() operation itself.
Refactor things along the way.

@neilconway (Contributor, Author) commented

Benchmarks:

  group                                       vanilla                                opt
  -----                                       ----                                   ------
  array_has_all/all_found_small_needle/10     1.00      4.6±0.23ms        ? ?/sec    1.00      4.6±0.04ms        ? ?/sec
  array_has_all/all_found_small_needle/100    1.00     11.2±0.12ms        ? ?/sec    1.01     11.4±0.09ms        ? ?/sec
  array_has_all/all_found_small_needle/500    1.01     46.2±0.58ms        ? ?/sec    1.00     45.8±1.09ms        ? ?/sec
  array_has_all/not_all_found/10              1.00      4.3±0.04ms        ? ?/sec    1.00      4.3±0.05ms        ? ?/sec
  array_has_all/not_all_found/100             1.00     10.3±0.20ms        ? ?/sec    1.02     10.5±0.06ms        ? ?/sec
  array_has_all/not_all_found/500             1.01     41.4±0.49ms        ? ?/sec    1.00     41.0±0.89ms        ? ?/sec
  array_has_all_strings/all_found/10          1.07      4.0±0.07ms        ? ?/sec    1.00      3.8±0.03ms        ? ?/sec
  array_has_all_strings/all_found/100         1.00     11.7±0.21ms        ? ?/sec    1.01     11.8±0.10ms        ? ?/sec
  array_has_all_strings/all_found/500         1.02     48.5±1.75ms        ? ?/sec    1.00     47.7±2.52ms        ? ?/sec
  array_has_all_strings/not_all_found/10      1.00      2.7±0.04ms        ? ?/sec    1.02      2.8±0.04ms        ? ?/sec
  array_has_all_strings/not_all_found/100     1.03     10.5±0.26ms        ? ?/sec    1.00     10.2±0.12ms        ? ?/sec
  array_has_all_strings/not_all_found/500     1.00     57.8±0.96ms        ? ?/sec    1.00     57.6±0.81ms        ? ?/sec
  array_has_any/no_match/10                   1.07      5.4±0.13ms        ? ?/sec    1.00      5.0±0.22ms        ? ?/sec
  array_has_any/no_match/100                  1.00     17.6±0.45ms        ? ?/sec    1.02     18.1±0.21ms        ? ?/sec
  array_has_any/no_match/500                  1.00     78.4±1.43ms        ? ?/sec    1.03     80.7±0.62ms        ? ?/sec
  array_has_any/some_match/10                 1.01      4.6±0.05ms        ? ?/sec    1.00      4.5±0.09ms        ? ?/sec
  array_has_any/some_match/100                1.00     10.9±0.10ms        ? ?/sec    1.03     11.2±0.15ms        ? ?/sec
  array_has_any/some_match/500                1.10     47.9±0.64ms        ? ?/sec    1.00     43.6±0.61ms        ? ?/sec
  array_has_any_strings/no_match/10           1.00      3.6±0.05ms        ? ?/sec    1.02      3.7±0.07ms        ? ?/sec
  array_has_any_strings/no_match/100          1.00     17.5±0.22ms        ? ?/sec    1.00     17.5±0.28ms        ? ?/sec
  array_has_any_strings/no_match/500          1.03    112.5±1.99ms        ? ?/sec    1.00    109.6±1.89ms        ? ?/sec
  array_has_any_strings/some_match/10         1.00      3.3±0.04ms        ? ?/sec    1.13      3.7±0.08ms        ? ?/sec
  array_has_any_strings/some_match/100        1.00     10.4±0.16ms        ? ?/sec    1.04     10.9±0.13ms        ? ?/sec
  array_has_any_strings/some_match/500        1.00     42.6±1.31ms        ? ?/sec    1.00     42.5±1.06ms        ? ?/sec
  array_has_i64/found/10                      3.14    516.1±8.76µs        ? ?/sec    1.00    164.1±4.76µs        ? ?/sec
  array_has_i64/found/100                     1.57  1043.2±25.75µs        ? ?/sec    1.00   666.3±15.72µs        ? ?/sec
  array_has_i64/found/500                     1.19      3.7±0.05ms        ? ?/sec    1.00      3.1±0.18ms        ? ?/sec
  array_has_i64/not_found/10                  5.27    514.7±4.70µs        ? ?/sec    1.00     97.7±3.40µs        ? ?/sec
  array_has_i64/not_found/100                 1.85  1035.2±11.34µs        ? ?/sec    1.00   559.5±17.33µs        ? ?/sec
  array_has_i64/not_found/500                 1.22      3.7±0.10ms        ? ?/sec    1.00      3.0±0.09ms        ? ?/sec
  array_has_strings/found/10                  1.61   996.1±13.42µs        ? ?/sec    1.00    618.1±6.67µs        ? ?/sec
  array_has_strings/found/100                 1.18      2.5±0.03ms        ? ?/sec    1.00      2.1±0.10ms        ? ?/sec
  array_has_strings/found/500                 1.13     10.3±0.82ms        ? ?/sec    1.00      9.1±0.80ms        ? ?/sec
  array_has_strings/not_found/10              4.82   550.1±33.51µs        ? ?/sec    1.00    114.2±3.77µs        ? ?/sec
  array_has_strings/not_found/100             1.15      5.3±0.06ms        ? ?/sec    1.00      4.6±0.13ms        ? ?/sec
  array_has_strings/not_found/500             1.05     14.1±0.22ms        ? ?/sec    1.00     13.4±0.43ms        ? ?/sec

Development

Successfully merging this pull request may close these issues.

Optimize array_has() for scalar needle
