Skip to content

Conversation

@logan-keede
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Logical Plan for datatype Int64 and UInt64 differs, UInt64 Logical Plan's Union are wrapped up in Projection, and EliminateNestedUnion OptimezerRule is not applied leading to significantly longer execution time.

What changes are included in this PR?

Separating Benchmarks based on datatype, converting a datatype specific function to a generic one.

Are these changes tested?

Yes.

Are there any user-facing changes?

No, benchmarks only.

@github-actions github-actions bot added the core Core DataFusion crate label Nov 10, 2025
@logan-keede
Copy link
Contributor Author

cc @alamb @Omega359

let array_data: Vec<T::Native> = (0..num_rows)
.map(|j| {
let value = (j as u64) * 100 + (i as u64);
<T::Native as NumCast>::from(value).unwrap_or_else(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, first I've seen NumCast before. I'll have to remember this!

@Omega359
Copy link
Contributor

LGTM, thanks!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @logan-keede and @Omega359 -- this looks great to me too

I also tried running it via

cargo bench --bench sql_planner -- physical_sorted_union_order_by

And as you pointed out, int64 takes more than 10x the time of uint64. Very nice find

Gnuplot not found, using plotters backend
physical_sorted_union_order_by_10_int64
                        time:   [2.6344 ms 2.6446 ms 2.6559 ms]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

Benchmarking physical_sorted_union_order_by_50_int64: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.9s, or reduce sample count to 60.
physical_sorted_union_order_by_50_int64
                        time:   [79.580 ms 80.020 ms 80.510 ms]
Found 11 outliers among 100 measurements (11.00%)
  10 (10.00%) high mild
  1 (1.00%) high severe

physical_sorted_union_order_by_10_uint64
                        time:   [5.8978 ms 5.9296 ms 5.9655 ms]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

Benchmarking physical_sorted_union_order_by_50_uint64: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.0s, or reduce sample count to 20.
physical_sorted_union_order_by_50_uint64
                        time:   [168.75 ms 169.33 ms 169.99 ms]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

@alamb
Copy link
Contributor

alamb commented Nov 11, 2025

I merged up from main and pushed a commit to resolve the CI fmt failure

@alamb alamb added this pull request to the merge queue Nov 11, 2025
@alamb
Copy link
Contributor

alamb commented Nov 11, 2025

Thanks again @logan-keede and @Omega359

Merged via the queue into apache:main with commit 03d1974 Nov 11, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants