18729: Add benchmark for array_has/array_has_all/array_has_any #31
martin-augment wants to merge 9 commits into main from
Conversation
Co-authored-by: Yongting You <2010youy01@gmail.com>
Walkthrough

A new Criterion-based benchmarking suite was added for nested array operations in the DataFusion project. The changes include a new benchmark entry in the Cargo.toml file and a comprehensive benchmarking file that tests the array_has, array_has_all, and array_has_any functions. The benchmarks cover multiple scenarios with varying array sizes (1, 10, 100, 1000, 10000), special cases like empty and single-element arrays, string array operations, and edge cases. Helper functions organize the benchmarks into logical groups, and test data is constructed using make_array and lit utilities with black_box to prevent compiler optimizations from skewing results.
```rust
let needle = lit("TICKER0050");
// ...
b.iter(|| black_box(array_has(list_array.clone(), needle.clone())))
});
```
Bug: Benchmark Misrepresents Search Performance
The benchmark searches for "TICKER0050" but when size=10, the array only contains TICKER0000 through TICKER0009, so the needle won't be found. This causes the "found" benchmark to actually test a "not found" case for the smallest array size, producing misleading performance measurements.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct that the needle won't be found when the size of the generated array is 10. The ticker value should be changed to 0005 to fix this issue.
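The reported mismatch can be illustrated with a small, self-contained sketch in plain Rust. It reproduces only the ticker-generation pattern from the benchmark; the sizes iterated here are assumptions, not the benchmark's exact configuration:

```rust
fn main() {
    // Tickers are generated as TICKER0000..TICKER{size-1} with 4-digit padding,
    // mirroring the benchmark's `format!("TICKER{i:04}")` pattern.
    for size in [10usize, 100, 1000] {
        let haystack: Vec<String> = (0..size).map(|i| format!("TICKER{i:04}")).collect();
        // The original needle "TICKER0050" only exists once the array has more
        // than 50 elements, so the size=10 case silently becomes "not found".
        assert_eq!(haystack.contains(&"TICKER0050".to_string()), size > 50);
        // The suggested needle "TICKER0005" is present at every size checked here.
        assert!(haystack.contains(&"TICKER0005".to_string()));
    }
}
```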
```rust
b.iter(|| black_box(array_has_all(list_array.clone(), needle_array.clone())))
},
);
```
Bug: Benchmark Misrepresents "All Found" Case
The benchmark labeled "all_found_small_needle" searches for elements [0, 1, 2] in the haystack. When array_size=1, the haystack only contains [0], so elements 1 and 2 won't be found. This causes the benchmark to test a partial match case instead of the intended "all elements found" case for the smallest array size.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct that the test is invalid for the case when the array has just one element. In this case it won't be able to find a sub-array with three elements inside it.
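A quick sketch of the same subset check in plain Rust (needle values and sizes taken from the review comment, not from the benchmark code itself):

```rust
fn main() {
    // "all_found_small_needle" searches for [0, 1, 2] in a haystack of 0..array_size.
    let needle = [0i64, 1, 2];
    for array_size in [1i64, 10, 100] {
        let haystack: Vec<i64> = (0..array_size).collect();
        let all_found = needle.iter().all(|n| haystack.contains(n));
        // Only haystacks with at least 3 elements can contain the full needle,
        // so the size=1 case measures a partial match, not "all found".
        assert_eq!(all_found, array_size >= 3);
    }
}
```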
```rust
.map(|i| lit(format!("TICKER{i:04}")))
.collect::<Vec<_>>();
let list_array = make_array(array);
let needle = lit("TICKER0050");
```
In the "found" case for string arrays, using "TICKER0050" means size=10 does not actually contain the needle (only 0000..0009), which makes this benchmark mislabeled/misleading for that size (consider adjusting the needle or sizes).
value:useful; category:bug; feedback:The Augment AI reviewer is correct that the needle won't be found when the size of the generated array is 10. The ticker value should be changed to 0005 to fix this issue.
PR Review: Add benchmark for array_has/array_has_all/array_has_any

Overview

This PR adds comprehensive benchmarks for the `array_has`, `array_has_all`, and `array_has_any` functions.

Code Quality ✅

Strengths:
Potential Issues
|
| Aspect | This PR | array_reverse.rs | map.rs |
|---|---|---|---|
| License header | ✅ | ✅ | ✅ |
| extern crate | ✅ | ✅ | ✅ |
| Criterion setup | ✅ | ✅ | ✅ |
| black_box usage | ✅ | ✅ | ✅ |
| Actual execution | ❌ | ✅ | ✅ |
Recommendations
Critical (must fix):
- Fix the benchmark to measure actual execution instead of expression building. Reference `array_reverse.rs:32-34` or `map.rs:114` for patterns.
Important:
- Consider building arrays using Arrow's array builders instead of the expression API for more realistic benchmarks
- Add a comment explaining what aspect of performance each benchmark group is measuring
Nice to have:
- Add randomized data scenarios
- Test with different array types (List, LargeList, FixedSizeList)
- Document expected performance characteristics in comments
Example Fix

Here is how you might fix the core issue:
```rust
use datafusion_functions_nested::array_has::{array_has_inner_for_scalar, array_has_all_inner};
use arrow::array::{Int64Array, ListArray};

fn bench_array_has(c: &mut Criterion, array_size: usize) {
    let mut group = c.benchmark_group("array_has_i64");
    group.bench_with_input(
        BenchmarkId::new("found_at_start", array_size),
        &array_size,
        |b, &size| {
            // Build actual Arrow arrays
            let values = Int64Array::from_iter_values(0..size as i64);
            let list_array = /* build ListArray from values */;
            let needle = /* create scalar */;
            b.iter(|| {
                black_box(array_has_inner_for_scalar(&list_array, &needle).unwrap())
            })
        },
    );
}
```

Conclusion
This is a solid benchmark suite with excellent scenario coverage. The main issue is that it's currently benchmarking expression construction rather than execution. Once that's fixed, this will be a valuable addition to the project!
Overall Assessment: Needs revision before merge due to the execution vs construction issue, but the structure and coverage are excellent.
value:useful; category:bug; feedback:The Claude AI reviewer is correct that the benchmarks are not realistic because they do not test the execution of the built expressions. The other benchmarks (array_reverse.rs and map.rs) do test the execution part too.
value:useful; category:bug; feedback:The Claude AI reviewer is correct that the generated data is not very common in real life. The tests could be made more realistic by using other stable/reproducible distributions.
value:useful; category:bug; feedback:The Claude AI reviewer is correct that the benchmarks could be improved by passing references to the array and the needle because the cloning could take more time than the tested operation when the array size is not big enough.
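The clone-per-iteration concern can be sketched without Criterion or DataFusion, using only std timers. The haystack size and iteration count below are illustrative assumptions, not the benchmark's values:

```rust
use std::time::Instant;

fn main() {
    let haystack: Vec<i64> = (0..10_000).collect();
    let needle = 5_000i64;

    // Cloning inside the measured closure, as the benchmark's `.clone()` calls do:
    let t = Instant::now();
    for _ in 0..1_000 {
        let h = haystack.clone();
        std::hint::black_box(h.contains(&needle));
    }
    let with_clone = t.elapsed();

    // Borrowing instead, so only the membership test itself is timed:
    let t = Instant::now();
    for _ in 0..1_000 {
        std::hint::black_box(haystack.contains(&needle));
    }
    let without_clone = t.elapsed();

    // The clone-heavy loop generally runs slower; the exact ratio depends on
    // allocator and hardware, so no fixed threshold is asserted here.
    println!("with clone: {with_clone:?}, without clone: {without_clone:?}");
}
```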