
18729: Add benchmark for array_has/array_has_all/array_has_any #31

Open

martin-augment wants to merge 9 commits into main from pr-18729-2025-11-16-13-10-59

Conversation

@martin-augment (Owner) commented:

18729: To review by AI


coderabbitai bot commented Nov 16, 2025

Walkthrough

A new Criterion-based benchmarking suite was added for nested array operations in the DataFusion project. The changes include a new benchmark entry in the Cargo.toml file and a comprehensive benchmarking file that tests array_has, array_has_all, and array_has_any functions. The benchmarks cover multiple scenarios with varying array sizes (1, 10, 100, 1000, 10000), special cases like empty and single-element arrays, string array operations, and edge cases. Helper functions organize the benchmarks into logical groups, and test data is constructed using make_array and lit utilities with black_box to prevent compiler optimizations from skewing results.
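For orientation, the suite boils down to benchmarks of roughly this shape; a minimal sketch, assuming the expr_fn wrappers and the helper organization described above (names here are illustrative, not copied from the PR):

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use datafusion_expr::lit;
use datafusion_functions_nested::expr_fn::{array_has, make_array};

// Sketch of one benchmark group; the sizes mirror those listed in the walkthrough.
fn bench_array_has_i64(c: &mut Criterion) {
    for size in [1, 10, 100, 1000, 10000] {
        // Haystack of 0..size built via make_array/lit, as described above
        let haystack = make_array((0..size).map(|i| lit(i as i64)).collect());
        // Needle chosen from the middle of the generated range
        let needle = lit(size as i64 / 2);
        c.bench_function(&format!("array_has_i64_found_{size}"), |b| {
            // black_box keeps the optimizer from eliding the call
            b.iter(|| black_box(array_has(haystack.clone(), needle.clone())))
        });
    }
}

criterion_group!(benches, bench_array_has_i64);
criterion_main!(benches);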


📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between 86ebaab and e7a723b.

📒 Files selected for processing (2)
  • datafusion/functions-nested/Cargo.toml (1 hunks)
  • datafusion/functions-nested/benches/array_has.rs (1 hunks)
🔇 Additional comments (5)
datafusion/functions-nested/Cargo.toml (1)

72-74: LGTM! Benchmark entry properly configured.

The new benchmark entry follows the same pattern as existing benchmarks with harness = false, which is correct for Criterion-based benchmarks.
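For reference, a bench entry of this kind in Cargo.toml looks like the following (the criterion dev-dependency is assumed to already be declared, as for the existing benchmarks):

[[bench]]
name = "array_has"
harness = false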

datafusion/functions-nested/benches/array_has.rs (4)

18-26: Well-structured imports for benchmarking.

The imports are appropriate for a Criterion-based benchmark suite. Good use of black_box to prevent compiler optimizations from affecting benchmark results.


28-46: Excellent benchmark coverage and organization.

The benchmark suite comprehensively covers various scenarios: multiple array sizes, integer and string types, and edge cases. The organization into separate helper functions makes the code maintainable.


376-377: Correct Criterion setup.

The criterion_group! and criterion_main! macros are properly configured to execute the benchmark suite.


48-374: Cloning of test expressions is intentional and correct.

The benchmark code is correctly designed. The cloning of list_array and needle/needle_array before each iteration is necessary and intentional—the generated array_has, array_has_all, and array_has_any functions consume their Expr parameters (they take owned values, not references). Cloning is the proper pattern to reuse test expressions across multiple benchmark iterations. The benchmarks are not measuring cloning overhead; they are correctly measuring function performance with properly managed test data.
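To make the ownership point concrete, a minimal sketch, assuming the generated wrapper signatures take Expr by value:

use datafusion_expr::Expr;
use datafusion_functions_nested::expr_fn::array_has;

// array_has consumes both Expr arguments, so reusing prepared expressions
// across iterations requires cloning them each time.
fn reuse_across_iterations(haystack: Expr, needle: Expr) {
    for _ in 0..3 {
        let _expr: Expr = array_has(haystack.clone(), needle.clone());
        // Without the clones, the second pass would not compile: both
        // values were moved into array_has on the first pass.
    }
}

An alternative that keeps the clones out of the measured time, Criterion's iter_batched, comes up later in this thread.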



let needle = lit("TICKER0050");

b.iter(|| black_box(array_has(list_array.clone(), needle.clone())))
});
Cursor Bugbot commented:

Bug: Benchmark Misrepresents Search Performance

The benchmark searches for "TICKER0050" but when size=10, the array only contains TICKER0000 through TICKER0009, so the needle won't be found. This causes the "found" benchmark to actually test a "not found" case for the smallest array size, producing misleading performance measurements.


@martin-augment (Owner Author) replied:

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct that the needle won't be found when the size of the generated array is 10. The ticker value should be changed to 0005 to fix this issue.
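A minimal sketch of the suggested fix: derive the needle from the generated range so the "found" case holds for every benchmarked size (variable names are illustrative):

// Pick the middle of the generated range instead of hard-coding TICKER0050,
// so the needle exists even when size = 10 (yields TICKER0005 there).
let mid = size / 2;
let needle = lit(format!("TICKER{mid:04}"));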


b.iter(|| black_box(array_has_all(list_array.clone(), needle_array.clone())))
},
);
Cursor Bugbot commented:

Bug: Benchmark Misrepresents "All Found" Case

The benchmark labeled "all_found_small_needle" searches for elements [0, 1, 2] in the haystack. When array_size=1, the haystack only contains [0], so elements 1 and 2 won't be found. This causes the benchmark to test a partial match case instead of the intended "all elements found" case for the smallest array size.


@martin-augment (Owner Author) replied:

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct that the test is invalid for the case when the array has just one element. In this case it won't be able to find a sub-array with three elements inside it.
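A minimal sketch of one possible fix, capping the needle at the haystack length (names are illustrative):

// Never ask for more elements than the haystack can contain, so the
// "all found" label stays accurate even when array_size = 1.
let needle_len = size.min(3);
let needle_array = make_array((0..needle_len).map(|i| lit(i as i64)).collect());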

@augmentcode bot left a comment:

Review completed. 1 suggestion posted.

.map(|i| lit(format!("TICKER{i:04}")))
.collect::<Vec<_>>();
let list_array = make_array(array);
let needle = lit("TICKER0050");
In the "found" case for string arrays, using "TICKER0050" means size=10 does not actually contain the needle (only 0000..0009), which makes this benchmark mislabeled/misleading for that size (consider adjusting the needle or sizes).


@martin-augment (Owner Author) replied:
value:useful; category:bug; feedback:The Augment AI reviewer is correct that the needle won't be found when the size of the generated array is 10. The ticker value should be changed to 0005 to fix this issue.


claude bot commented Nov 16, 2025

PR Review: Add benchmark for array_has/array_has_all/array_has_any

Overview

This PR adds comprehensive benchmarks for the array_has, array_has_all, and array_has_any functions. The benchmark suite is well-structured and covers a good range of scenarios.

Code Quality ✅

Strengths:

  • ✅ Proper Apache 2.0 license header
  • ✅ Good use of criterion benchmark groups for organization
  • ✅ Comprehensive coverage of different scenarios (found at start, found at end, not found, edge cases)
  • ✅ Proper use of black_box to prevent compiler optimizations
  • ✅ Well-commented sections explaining what each benchmark tests
  • ✅ Appropriate Cargo.toml configuration matching existing benchmarks

Potential Issues ⚠️

1. Benchmark Design - Expression Building vs Runtime Performance

The benchmarks are currently measuring expression construction rather than actual function execution. This is a critical issue:

Current approach:

b.iter(|| black_box(array_has(list_array.clone(), needle.clone())))

This measures the time to build the expression tree (array_has() returns an Expr), not the time to execute the array search operation.

Comparison with other benchmarks:

  • array_reverse.rs (line 32-34) calls array_reverse_inner() which actually executes the operation
  • map.rs (line 114) uses invoke_with_args() to execute the UDF

Recommendation: You should either:

  1. Call the actual implementation functions (array_has_inner_for_scalar, array_has_all_inner, etc.) from datafusion/functions-nested/src/array_has.rs
  2. Use the invoke_with_args pattern like in map.rs to execute the UDF

2. Data Realism

The benchmarks use sequential integer arrays (0..size), which may not represent realistic data distributions:

let array = (0..size).map(|i| lit(i as i64)).collect::<Vec<_>>();

Suggestions:

  • Add benchmarks with randomized data
  • Consider duplicates (you have one test for this, which is good)
  • Test with different data distributions (skewed, uniform, etc.)
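For the randomized-data suggestion, a minimal sketch using a seeded generator so runs stay reproducible (assumes the rand 0.8 API; the seed and value range are arbitrary choices):

use rand::{rngs::StdRng, Rng, SeedableRng};

// Seeded RNG: different from the sequential 0..size pattern, but stable
// across benchmark runs so results remain comparable.
let mut rng = StdRng::seed_from_u64(42);
let array = (0..size)
    .map(|_| lit(rng.gen_range(0..1_000_i64)))
    .collect::<Vec<_>>();
let list_array = make_array(array);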

3. Memory and Cloning Overhead

Each iteration clones the arrays:

b.iter(|| black_box(array_has(list_array.clone(), needle.clone())))

If the benchmark is actually executing (after fixing issue #1), the cloning overhead might dominate small array operations. Consider using references where possible or mention this in documentation.
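One way to keep the clone out of the measured time is Criterion's iter_batched, which runs setup per iteration but excludes it from the measurement; a minimal sketch, reusing the names from the snippet above:

use criterion::BatchSize;

// The setup closure produces fresh owned inputs for each iteration;
// only the routine closure is timed.
b.iter_batched(
    || (list_array.clone(), needle.clone()),
    |(haystack, needle)| black_box(array_has(haystack, needle)),
    BatchSize::SmallInput,
);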

Performance Considerations 🚀

Good aspects:

  • ✅ Tests multiple array sizes (1, 10, 100, 1000, 10000)
  • ✅ Tests both i64 and string types
  • ✅ Covers early exit scenarios
  • ✅ Tests edge cases (empty arrays, single elements, duplicates)

Suggestions:

  1. Consider adding benchmarks for:

    • Different element types (floats, nested arrays)
    • Very large arrays (100k+ elements) to test scalability
    • Arrays with high null density
  2. String benchmarks could be more realistic:

    • Current: TICKER0000 format is good for uniform length
    • Consider: varying string lengths, unicode strings
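A minimal sketch of deterministic, varying-length strings in place of the fixed-width ticker pattern (the length cycle is an arbitrary choice):

// Lengths cycle from 1 to 32 characters; still reproducible, but no
// longer uniform-width like TICKER0000.
let array = (0..size)
    .map(|i| lit("x".repeat(1 + (i as usize) % 32)))
    .collect::<Vec<_>>();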

Security Concerns 🔒

No security issues identified. The benchmarks don't execute arbitrary code or handle untrusted input.

Test Coverage 📊

Excellent coverage of scenarios:

  • ✅ Best/worst/average cases
  • ✅ Empty arrays and null handling
  • ✅ Different needle sizes (small, medium, ~10% of haystack)
  • ✅ Early exit optimization testing
  • ✅ String-specific benchmarks

Missing (nice-to-have):

  • Integration with actual Arrow arrays (not just expression building)
  • Benchmarks comparing different array types (List vs FixedSizeList) like array_reverse.rs does

Best Practices Alignment 📋

Comparing with existing benchmarks:

| Aspect | This PR | array_reverse.rs | map.rs |
| --- | --- | --- | --- |
| License header | ✅ | ✅ | ✅ |
| extern crate | ✅ | ✅ | ✅ |
| Criterion setup | ✅ | ✅ | ✅ |
| black_box usage | ✅ | ✅ | ✅ |
| Actual execution | ❌ | ✅ | ✅ |

Recommendations

Critical (must fix):

  1. Fix the benchmark to measure actual execution instead of expression building. Reference array_reverse.rs:32-34 or map.rs:114 for patterns.

Important:

  1. Consider building arrays using Arrow's array builders instead of the expression API for more realistic benchmarks
  2. Add a comment explaining what aspect of performance each benchmark group is measuring

Nice to have:

  1. Add randomized data scenarios
  2. Test with different array types (List, LargeList, FixedSizeList)
  3. Document expected performance characteristics in comments

Example Fix

Here's how you might fix the core issue:

use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, ListArray};
use arrow::datatypes::Int64Type;
use criterion::{black_box, BenchmarkId, Criterion};
// Assumes the inner kernel is (or is made) public and takes
// (&ArrayRef, &dyn Datum) -> Result<ArrayRef>; adjust to the actual API.
use datafusion_functions_nested::array_has::array_has_inner_for_scalar;

fn bench_array_has(c: &mut Criterion, array_size: usize) {
    let mut group = c.benchmark_group("array_has_i64");

    group.bench_with_input(
        BenchmarkId::new("found_at_start", array_size),
        &array_size,
        |b, &size| {
            // Build an actual Arrow ListArray holding one row of 0..size
            let list_array: ArrayRef = Arc::new(
                ListArray::from_iter_primitive::<Int64Type, _, _>(vec![Some(
                    (0..size as i64).map(Some).collect::<Vec<_>>(),
                )]),
            );
            // Scalar needle matching the first element, so the search exits early
            let needle = Int64Array::new_scalar(0);

            b.iter(|| {
                black_box(array_has_inner_for_scalar(&list_array, &needle).unwrap())
            })
        },
    );
    group.finish();
}

Conclusion

This is a solid benchmark suite with excellent scenario coverage. The main issue is that it's currently benchmarking expression construction rather than execution. Once that's fixed, this will be a valuable addition to the project!

Overall Assessment: Needs revision before merge due to the execution vs construction issue, but the structure and coverage are excellent.

@martin-augment (Owner Author) commented:

1. Benchmark Design - Expression Building vs Runtime Performance

The benchmarks are currently measuring expression construction rather than actual function execution. This is a critical issue:

Current approach:

b.iter(|| black_box(array_has(list_array.clone(), needle.clone())))

This measures the time to build the expression tree (array_has() returns an Expr), not the time to execute the array search operation.

value:useful; category:bug; feedback:The Claude AI reviewer is correct that the benchmarks are not realistic because they do not test the execution of the built expressions. The other benchmarks (array_reverse.rs and map.rs) do test the execution part too.

@martin-augment (Owner Author) commented:

Data Realism

The benchmarks use sequential integer arrays (0..size), which may not represent realistic data distributions:

let array = (0..size).map(|i| lit(i as i64)).collect::<Vec<_>>();

Suggestions:

  • Add benchmarks with randomized data
  • Consider duplicates (you have one test for this, which is good)
  • Test with different data distributions (skewed, uniform, etc.)

value:useful; category:bug; feedback:The Claude AI reviewer is correct that the generated data is not very common in real life. The tests could be made more realistic by using other stable/reproducible distributions.

@martin-augment (Owner Author) commented:

3. Memory and Cloning Overhead

Each iteration clones the arrays:

b.iter(|| black_box(array_has(list_array.clone(), needle.clone())))

If the benchmark is actually executing (after fixing issue #1), the cloning overhead might dominate small array operations. Consider using references where possible or mention this in documentation.

value:useful; category:bug; feedback:The Claude AI reviewer is correct that the benchmarks could be improved by passing references to the array and the needle because the cloning could take more time than the tested operation when the array size is not big enough.
