@rluvaton
Member

@rluvaton rluvaton commented Oct 8, 2025

Which issue does this PR close?

N/A

Rationale for this change

Make multi-column aggregation even faster.

What changes are included in this PR?

In PrimitiveGroupValueBuilder.vectorized_equal_to, always evaluate the comparison and use unchecked indexing, as both of these changes are what make the code compile to SIMD.
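
A minimal sketch of the two changes described above, not the code in this PR. The function names, the u32 element type, and the plain slices standing in for the builder's values and the input array are assumptions for illustration. The first loop skips rows that already failed and uses checked indexing. The second always compares and folds the outcome into the previous result with `&=`, reading through `get_unchecked`, which removes the data-dependent branch and the bounds-check panic paths from the loop body. Whether the compiler then vectorizes the loop depends on the target and compiler version.

```rust
/// Branchy version: skips rows that already failed and uses checked indexing.
fn equal_to_branchy(
    group_values: &[u32],
    array_values: &[u32],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
    equal_to_results: &mut [bool],
) {
    let iter = lhs_rows.iter().zip(rhs_rows).zip(equal_to_results.iter_mut());
    for ((&lhs, &rhs), result) in iter {
        if !*result {
            continue;
        }
        *result = group_values[lhs] == array_values[rhs];
    }
}

/// Branch-free version: always evaluates and ANDs into the previous result.
///
/// # Safety
/// Every index in `lhs_rows` must be less than `group_values.len()` and
/// every index in `rhs_rows` must be less than `array_values.len()`.
unsafe fn equal_to_always_evaluate(
    group_values: &[u32],
    array_values: &[u32],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
    equal_to_results: &mut [bool],
) {
    let iter = lhs_rows.iter().zip(rhs_rows).zip(equal_to_results.iter_mut());
    for ((&lhs, &rhs), result) in iter {
        // SAFETY: the caller guarantees both indices are in bounds.
        let l = unsafe { *group_values.get_unchecked(lhs) };
        let r = unsafe { *array_values.get_unchecked(rhs) };
        // A row that was already `false` stays `false`; no branch is needed.
        *result &= l == r;
    }
}
```

Both functions produce the same `equal_to_results` for in-bounds indices; the second simply does the comparison unconditionally.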

Are these changes tested?

Existing tests

Are there any user-facing changes?

Nope


I tried a LOT of variations (GodBolt), from splitting into fixed-size chunks, to trying to get auto-vectorization to use gather and create a bitmask, to even testing portable SIMD (just to see what it would generate).

This version only optimizes the non-null path for the moment, as it is the easiest.

If and when we change from &mut [bool] to mutable packed bits, we could:

  1. evaluate in chunks of 64 items (I tried different variations to see which is best; you can tweak the GodBolt above with a different type and size to check for yourself). 64 is not necessarily the best chunk size, but I think it will be the fastest for doing an AND with the equal_to_results boolean buffer
  2. add an optimization for the nullable path as well, by doing bitwise operations on 64 items at a time and avoiding the cost of getting each bit manually
  3. skip 64 items right away if the equal_to_results word equals 0x00 (i.e. all false)
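
The three items above can be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: it compares rows positionally (ignoring the lhs_rows/rhs_rows indirection), stores results as `u64` words where bit `i % 64` of word `i / 64` belongs to row `i`, and the names `and_equal_packed` and `combine_with_validity` are hypothetical.

```rust
/// Compare `lhs` and `rhs` element-wise and AND the outcome into the packed
/// `equal_to_results` bitmask, 64 rows per `u64` word.
fn and_equal_packed(lhs: &[u32], rhs: &[u32], equal_to_results: &mut [u64]) {
    assert_eq!(lhs.len(), rhs.len());
    assert!(equal_to_results.len() * 64 >= lhs.len());

    let chunks = lhs.chunks(64).zip(rhs.chunks(64));
    for ((l_chunk, r_chunk), word) in chunks.zip(equal_to_results.iter_mut()) {
        // Item 3: all 64 rows already failed, so skip the whole chunk.
        if *word == 0 {
            continue;
        }
        // Item 1: build one mask of comparison results for the chunk.
        let mut mask = 0u64;
        for (bit, (l, r)) in l_chunk.iter().zip(r_chunk).enumerate() {
            mask |= ((l == r) as u64) << bit;
        }
        // A single AND updates up to 64 rows at once.
        *word &= mask;
    }
}

/// Item 2: fold validity bitmaps in with bitwise operations, 64 rows at a time.
/// A set bit in `lhs_valid` / `rhs_valid` means "not null". Two nulls are
/// treated as equal, which is an assumption about group-by semantics.
fn combine_with_validity(values_equal: u64, lhs_valid: u64, rhs_valid: u64) -> u64 {
    (lhs_valid & rhs_valid & values_equal) | (!lhs_valid & !rhs_valid)
}
```

In the last, partial chunk the mask bits beyond the row count stay zero, so the unused high bits of the final word are cleared rather than left undefined.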

@rluvaton rluvaton added the performance Make DataFusion faster label Oct 8, 2025
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Oct 8, 2025
Comment on lines +112 to +135
    let iter = izip!(
        lhs_rows.iter(),
        rhs_rows.iter(),
        equal_to_results.iter_mut(),
    );

    for (&lhs_row, &rhs_row, equal_to_result) in iter {
        // Has found not equal to in previous column, don't need to check
        if !*equal_to_result {
            continue;
        }

        // Perf: skip null check (by short circuit) if input is not nullable
        let exist_null = self.nulls.is_null(lhs_row);
        let input_null = array.is_null(rhs_row);
        if let Some(result) = nulls_equal_to(exist_null, input_null) {
            *equal_to_result = result;
            continue;
        }

        // Otherwise, we need to check their values
        *equal_to_result = self.group_values[lhs_row].is_eq(array.value(rhs_row));
    }
}
Member Author

Moved the code from vectorized_equal_to and removed the if NULLABLE check, since we only get here when the input is nullable.
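
For context, the snippet above treats nulls_equal_to as returning Some(result) when nullness alone decides the comparison, and None when both sides are non-null and the values still have to be compared. A minimal sketch consistent with that call site follows; it is inferred from the usage, and the actual helper in the crate may differ. In particular, treating two nulls as equal is an assumption based on group-by semantics.

```rust
/// Decide equality from nullness alone, if possible.
///
/// Returns `Some(true)` when both sides are null, `Some(false)` when exactly
/// one side is null, and `None` when neither is null, meaning the caller
/// must compare the actual values.
fn nulls_equal_to(lhs_null: bool, rhs_null: bool) -> Option<bool> {
    match (lhs_null, rhs_null) {
        (true, true) => Some(true),
        (true, false) | (false, true) => Some(false),
        (false, false) => None,
    }
}
```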

@rluvaton
Member Author

rluvaton commented Oct 8, 2025

@alamb can you please run aggregate_vectorized benchmark with these changes?

fn bench_vectorized_append(c: &mut Criterion) {

self.group_values[lhs_row]
} else {
// SAFETY: indices are guaranteed to be in bounds
unsafe { *self.group_values.get_unchecked(lhs_row) }
Contributor

As lhs_row is not checked here to be in bounds, this method would need to be marked unsafe as well.

Member Author

what do you mean?
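
One reading of the review comment above, sketched under assumptions (the `Builder` type and its method names are hypothetical, not the PR's code): a safe function that calls `get_unchecked` on a caller-supplied index lets safe code trigger undefined behavior. The usual ways out are to mark the function `unsafe` and document the contract, or to validate the indices before the unchecked reads.

```rust
struct Builder {
    group_values: Vec<u64>,
}

impl Builder {
    /// Option A: push the obligation to the caller.
    ///
    /// # Safety
    /// `lhs_row` must be less than `self.group_values.len()`.
    unsafe fn value_unchecked(&self, lhs_row: usize) -> u64 {
        // SAFETY: guaranteed by the caller, see the contract above.
        unsafe { *self.group_values.get_unchecked(lhs_row) }
    }

    /// Option B: stay safe by validating all indices once, up front,
    /// so the hot loop itself contains no bounds checks.
    fn values_checked_once(&self, lhs_rows: &[usize]) -> Option<Vec<u64>> {
        let len = self.group_values.len();
        if lhs_rows.iter().any(|&row| row >= len) {
            return None;
        }
        Some(
            lhs_rows
                .iter()
                // SAFETY: every index was verified to be in bounds above.
                .map(|&row| unsafe { *self.group_values.get_unchecked(row) })
                .collect(),
        )
    }
}
```

Option B keeps the public surface safe at the cost of one extra pass over the indices; Option A avoids that pass but requires every caller to uphold the bound.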

@ctsk
Contributor

ctsk commented Oct 14, 2025

I could run the benchmarks locally so that we can see the performance gains. Can you provide two git hashes at which to run the criterion benchmark and compare?

@alamb
Contributor

alamb commented Nov 7, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-primitive-multi-group-by-to-use-simd (0a6f6d3) to 969fc13 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Nov 7, 2025

🤖: Benchmark completed

Details

Comparing HEAD and optimize-primitive-multi-group-by-to-use-simd
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ optimize-primitive-multi-group-by-to-use-simd ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2846.59 ms │                                    2788.14 ms │     no change │
│ QQuery 1     │  1279.77 ms │                                    1307.16 ms │     no change │
│ QQuery 2     │  2564.09 ms │                                    2565.42 ms │     no change │
│ QQuery 3     │  1189.15 ms │                                    1102.74 ms │ +1.08x faster │
│ QQuery 4     │  2357.47 ms │                                    2331.30 ms │     no change │
│ QQuery 5     │ 28146.52 ms │                                   27825.38 ms │     no change │
│ QQuery 6     │  4194.17 ms │                                    4225.22 ms │     no change │
│ QQuery 7     │  3892.58 ms │                                    3763.97 ms │     no change │
└──────────────┴─────────────┴───────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                            │ 46470.35ms │
│ Total Time (optimize-primitive-multi-group-by-to-use-simd)   │ 45909.32ms │
│ Average Time (HEAD)                                          │  5808.79ms │
│ Average Time (optimize-primitive-multi-group-by-to-use-simd) │  5738.67ms │
│ Queries Faster                                               │          1 │
│ Queries Slower                                               │          0 │
│ Queries with No Change                                       │          7 │
│ Queries with Failure                                         │          0 │
└──────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ optimize-primitive-multi-group-by-to-use-simd ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.23 ms │                                       2.69 ms │  1.20x slower │
│ QQuery 1     │    49.36 ms │                                      49.67 ms │     no change │
│ QQuery 2     │   135.52 ms │                                     138.02 ms │     no change │
│ QQuery 3     │   165.74 ms │                                     163.69 ms │     no change │
│ QQuery 4     │  1156.77 ms │                                    1119.71 ms │     no change │
│ QQuery 5     │  1569.66 ms │                                    1545.00 ms │     no change │
│ QQuery 6     │     2.21 ms │                                       2.24 ms │     no change │
│ QQuery 7     │    56.97 ms │                                      53.62 ms │ +1.06x faster │
│ QQuery 8     │  1535.53 ms │                                    1443.48 ms │ +1.06x faster │
│ QQuery 9     │  1913.64 ms │                                    1820.77 ms │     no change │
│ QQuery 10    │   382.50 ms │                                     381.31 ms │     no change │
│ QQuery 11    │   432.43 ms │                                     423.29 ms │     no change │
│ QQuery 12    │  1459.06 ms │                                    1369.77 ms │ +1.07x faster │
│ QQuery 13    │  2198.64 ms │                                    2179.86 ms │     no change │
│ QQuery 14    │  1330.94 ms │                                    1278.89 ms │     no change │
│ QQuery 15    │  1300.73 ms │                                    1248.36 ms │     no change │
│ QQuery 16    │  2772.70 ms │                                    2743.92 ms │     no change │
│ QQuery 17    │  2753.94 ms │                                    2714.33 ms │     no change │
│ QQuery 18    │  5370.36 ms │                                    5007.91 ms │ +1.07x faster │
│ QQuery 19    │   126.02 ms │                                     127.77 ms │     no change │
│ QQuery 20    │  2055.82 ms │                                    2029.83 ms │     no change │
│ QQuery 21    │  2358.66 ms │                                    2332.38 ms │     no change │
│ QQuery 22    │  4013.44 ms │                                    3986.75 ms │     no change │
│ QQuery 23    │ 16492.87 ms │                                   12877.44 ms │ +1.28x faster │
│ QQuery 24    │   225.36 ms │                                     206.53 ms │ +1.09x faster │
│ QQuery 25    │   491.65 ms │                                     475.23 ms │     no change │
│ QQuery 26    │   221.32 ms │                                     217.25 ms │     no change │
│ QQuery 27    │  2934.68 ms │                                    2845.20 ms │     no change │
│ QQuery 28    │ 23576.15 ms │                                   23468.83 ms │     no change │
│ QQuery 29    │  1010.95 ms │                                    1002.77 ms │     no change │
│ QQuery 30    │  1402.75 ms │                                    1328.95 ms │ +1.06x faster │
│ QQuery 31    │  1421.24 ms │                                    1368.73 ms │     no change │
│ QQuery 32    │  5254.62 ms │                                    4971.25 ms │ +1.06x faster │
│ QQuery 33    │  6029.16 ms │                                    5822.23 ms │     no change │
│ QQuery 34    │  6185.48 ms │                                    6002.99 ms │     no change │
│ QQuery 35    │  2160.68 ms │                                    1880.87 ms │ +1.15x faster │
│ QQuery 36    │   121.91 ms │                                     122.82 ms │     no change │
│ QQuery 37    │    51.34 ms │                                      51.40 ms │     no change │
│ QQuery 38    │   122.22 ms │                                     121.74 ms │     no change │
│ QQuery 39    │   197.67 ms │                                     201.67 ms │     no change │
│ QQuery 40    │    44.98 ms │                                      42.53 ms │ +1.06x faster │
│ QQuery 41    │    41.26 ms │                                      39.36 ms │     no change │
│ QQuery 42    │    32.20 ms │                                      32.82 ms │     no change │
└──────────────┴─────────────┴───────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                                            ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                                            │ 101161.37ms │
│ Total Time (optimize-primitive-multi-group-by-to-use-simd)   │  95243.85ms │
│ Average Time (HEAD)                                          │   2352.59ms │
│ Average Time (optimize-primitive-multi-group-by-to-use-simd) │   2214.97ms │
│ Queries Faster                                               │          10 │
│ Queries Slower                                               │           1 │
│ Queries with No Change                                       │          32 │
│ Queries with Failure                                         │           0 │
└──────────────────────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ optimize-primitive-multi-group-by-to-use-simd ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 169.89 ms │                                     132.86 ms │ +1.28x faster │
│ QQuery 2     │  32.08 ms │                                      28.56 ms │ +1.12x faster │
│ QQuery 3     │  42.45 ms │                                      38.83 ms │ +1.09x faster │
│ QQuery 4     │  29.70 ms │                                      28.38 ms │     no change │
│ QQuery 5     │  88.06 ms │                                      87.88 ms │     no change │
│ QQuery 6     │  19.61 ms │                                      19.69 ms │     no change │
│ QQuery 7     │ 238.46 ms │                                     233.19 ms │     no change │
│ QQuery 8     │  33.11 ms │                                      34.86 ms │  1.05x slower │
│ QQuery 9     │ 105.27 ms │                                     104.52 ms │     no change │
│ QQuery 10    │  63.72 ms │                                      62.52 ms │     no change │
│ QQuery 11    │  17.69 ms │                                      18.82 ms │  1.06x slower │
│ QQuery 12    │  53.57 ms │                                      51.21 ms │     no change │
│ QQuery 13    │  46.92 ms │                                      47.43 ms │     no change │
│ QQuery 14    │  14.24 ms │                                      14.07 ms │     no change │
│ QQuery 15    │  25.61 ms │                                      24.89 ms │     no change │
│ QQuery 16    │  25.51 ms │                                      25.09 ms │     no change │
│ QQuery 17    │ 153.98 ms │                                     152.68 ms │     no change │
│ QQuery 18    │ 277.15 ms │                                     274.47 ms │     no change │
│ QQuery 19    │  37.60 ms │                                      36.98 ms │     no change │
│ QQuery 20    │  49.29 ms │                                      50.02 ms │     no change │
│ QQuery 21    │ 329.70 ms │                                     324.28 ms │     no change │
│ QQuery 22    │  21.06 ms │                                      21.81 ms │     no change │
└──────────────┴───────────┴───────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                            │ 1874.67ms │
│ Total Time (optimize-primitive-multi-group-by-to-use-simd)   │ 1813.04ms │
│ Average Time (HEAD)                                          │   85.21ms │
│ Average Time (optimize-primitive-multi-group-by-to-use-simd) │   82.41ms │
│ Queries Faster                                               │         3 │
│ Queries Slower                                               │         2 │
│ Queries with No Change                                       │        17 │
│ Queries with Failure                                         │         0 │
└──────────────────────────────────────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Nov 11, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-primitive-multi-group-by-to-use-simd (0a6f6d3) to 969fc13 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Nov 11, 2025

🤖: Benchmark completed

Details

Comparing HEAD and optimize-primitive-multi-group-by-to-use-simd
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ optimize-primitive-multi-group-by-to-use-simd ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2816.14 ms │                                    2710.67 ms │     no change │
│ QQuery 1     │  1285.37 ms │                                    1316.13 ms │     no change │
│ QQuery 2     │  2513.05 ms │                                    2517.89 ms │     no change │
│ QQuery 3     │  1170.25 ms │                                    1096.87 ms │ +1.07x faster │
│ QQuery 4     │  2324.76 ms │                                    2318.39 ms │     no change │
│ QQuery 5     │ 28876.14 ms │                                   28063.05 ms │     no change │
│ QQuery 6     │  4228.86 ms │                                    4202.73 ms │     no change │
│ QQuery 7     │  3703.70 ms │                                    3717.83 ms │     no change │
└──────────────┴─────────────┴───────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                            │ 46918.28ms │
│ Total Time (optimize-primitive-multi-group-by-to-use-simd)   │ 45943.57ms │
│ Average Time (HEAD)                                          │  5864.79ms │
│ Average Time (optimize-primitive-multi-group-by-to-use-simd) │  5742.95ms │
│ Queries Faster                                               │          1 │
│ Queries Slower                                               │          0 │
│ Queries with No Change                                       │          7 │
│ Queries with Failure                                         │          0 │
└──────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ optimize-primitive-multi-group-by-to-use-simd ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.33 ms │                                       2.52 ms │  1.08x slower │
│ QQuery 1     │    49.63 ms │                                      48.39 ms │     no change │
│ QQuery 2     │   138.52 ms │                                     135.79 ms │     no change │
│ QQuery 3     │   168.63 ms │                                     164.06 ms │     no change │
│ QQuery 4     │  1108.77 ms │                                    1123.36 ms │     no change │
│ QQuery 5     │  1548.18 ms │                                    1520.34 ms │     no change │
│ QQuery 6     │     2.26 ms │                                       2.25 ms │     no change │
│ QQuery 7     │    54.48 ms │                                      55.77 ms │     no change │
│ QQuery 8     │  1508.88 ms │                                    1453.15 ms │     no change │
│ QQuery 9     │  1902.02 ms │                                    1835.81 ms │     no change │
│ QQuery 10    │   377.77 ms │                                     375.84 ms │     no change │
│ QQuery 11    │   429.78 ms │                                     422.50 ms │     no change │
│ QQuery 12    │  1393.06 ms │                                    1372.61 ms │     no change │
│ QQuery 13    │  2131.18 ms │                                    2133.79 ms │     no change │
│ QQuery 14    │  1273.84 ms │                                    1282.05 ms │     no change │
│ QQuery 15    │  1269.81 ms │                                    1279.07 ms │     no change │
│ QQuery 16    │  2721.52 ms │                                    2709.96 ms │     no change │
│ QQuery 17    │  2711.19 ms │                                    2713.23 ms │     no change │
│ QQuery 18    │  5212.97 ms │                                    4981.45 ms │     no change │
│ QQuery 19    │   128.99 ms │                                     125.04 ms │     no change │
│ QQuery 20    │  2007.47 ms │                                    2010.37 ms │     no change │
│ QQuery 21    │  2338.01 ms │                                    2337.84 ms │     no change │
│ QQuery 22    │  3953.59 ms │                                    4004.56 ms │     no change │
│ QQuery 23    │ 14868.86 ms │                                   12866.83 ms │ +1.16x faster │
│ QQuery 24    │   221.96 ms │                                     210.94 ms │     no change │
│ QQuery 25    │   476.66 ms │                                     481.59 ms │     no change │
│ QQuery 26    │   218.59 ms │                                     213.26 ms │     no change │
│ QQuery 27    │  2887.50 ms │                                    2806.91 ms │     no change │
│ QQuery 28    │ 23494.53 ms │                                   23284.37 ms │     no change │
│ QQuery 29    │  1012.55 ms │                                     978.53 ms │     no change │
│ QQuery 30    │  1359.82 ms │                                    1321.19 ms │     no change │
│ QQuery 31    │  1421.23 ms │                                    1402.98 ms │     no change │
│ QQuery 32    │  4954.29 ms │                                    4925.85 ms │     no change │
│ QQuery 33    │  6076.59 ms │                                    5951.85 ms │     no change │
│ QQuery 34    │  5977.99 ms │                                    5847.68 ms │     no change │
│ QQuery 35    │  2088.70 ms │                                    1866.69 ms │ +1.12x faster │
│ QQuery 36    │   121.32 ms │                                     122.92 ms │     no change │
│ QQuery 37    │    53.50 ms │                                      51.50 ms │     no change │
│ QQuery 38    │   122.51 ms │                                     121.11 ms │     no change │
│ QQuery 39    │   199.60 ms │                                     201.22 ms │     no change │
│ QQuery 40    │    43.88 ms │                                      40.28 ms │ +1.09x faster │
│ QQuery 41    │    40.04 ms │                                      38.48 ms │     no change │
│ QQuery 42    │    33.42 ms │                                      33.44 ms │     no change │
└──────────────┴─────────────┴───────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                            │ 98106.37ms │
│ Total Time (optimize-primitive-multi-group-by-to-use-simd)   │ 94857.40ms │
│ Average Time (HEAD)                                          │  2281.54ms │
│ Average Time (optimize-primitive-multi-group-by-to-use-simd) │  2205.99ms │
│ Queries Faster                                               │          3 │
│ Queries Slower                                               │          1 │
│ Queries with No Change                                       │         39 │
│ Queries with Failure                                         │          0 │
└──────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ optimize-primitive-multi-group-by-to-use-simd ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 133.00 ms │                                     131.61 ms │     no change │
│ QQuery 2     │  29.48 ms │                                      29.36 ms │     no change │
│ QQuery 3     │  40.08 ms │                                      34.39 ms │ +1.17x faster │
│ QQuery 4     │  29.80 ms │                                      29.12 ms │     no change │
│ QQuery 5     │  87.19 ms │                                      87.66 ms │     no change │
│ QQuery 6     │  19.69 ms │                                      19.89 ms │     no change │
│ QQuery 7     │ 236.56 ms │                                     227.85 ms │     no change │
│ QQuery 8     │  34.52 ms │                                      35.40 ms │     no change │
│ QQuery 9     │ 110.61 ms │                                     109.57 ms │     no change │
│ QQuery 10    │  66.17 ms │                                      64.84 ms │     no change │
│ QQuery 11    │  18.37 ms │                                      17.86 ms │     no change │
│ QQuery 12    │  53.11 ms │                                      52.06 ms │     no change │
│ QQuery 13    │  47.37 ms │                                      47.61 ms │     no change │
│ QQuery 14    │  16.25 ms │                                      14.59 ms │ +1.11x faster │
│ QQuery 15    │  25.27 ms │                                      25.47 ms │     no change │
│ QQuery 16    │  25.29 ms │                                      25.89 ms │     no change │
│ QQuery 17    │ 150.98 ms │                                     153.85 ms │     no change │
│ QQuery 18    │ 275.68 ms │                                     284.36 ms │     no change │
│ QQuery 19    │  37.50 ms │                                      38.27 ms │     no change │
│ QQuery 20    │  49.30 ms │                                      50.80 ms │     no change │
│ QQuery 21    │ 348.45 ms │                                     333.59 ms │     no change │
│ QQuery 22    │  21.78 ms │                                      22.01 ms │     no change │
└──────────────┴───────────┴───────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                            │ 1856.45ms │
│ Total Time (optimize-primitive-multi-group-by-to-use-simd)   │ 1836.05ms │
│ Average Time (HEAD)                                          │   84.38ms │
│ Average Time (optimize-primitive-multi-group-by-to-use-simd) │   83.46ms │
│ Queries Faster                                               │         2 │
│ Queries Slower                                               │         0 │
│ Queries with No Change                                       │        20 │
│ Queries with Failure                                         │         0 │
└──────────────────────────────────────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Nov 12, 2025

The results look promising and consistent -- thank you @rluvaton -- I plan to review this PR over the next few days

@rluvaton
Member Author

rluvaton commented Nov 12, 2025

@alamb can you please run aggregate_vectorized benchmark with these changes?

fn bench_vectorized_append(c: &mut Criterion) {

@alamb this is the more relevant bench

@alamb
Contributor

alamb commented Nov 13, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-primitive-multi-group-by-to-use-simd (0a6f6d3) to 969fc13 diff
BENCH_NAME=aggregate_vectorized
BENCH_COMMAND=cargo bench --bench aggregate_vectorized
BENCH_FILTER=
BENCH_BRANCH_NAME=optimize-primitive-multi-group-by-to-use-simd
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thank you @rluvaton -- this PR looks good to me

unsafe { *array_values.get_unchecked(rhs_row) }
};

// Always evaluate, to allow for auto-vectorization
Contributor

This makes sense for primitive values: the cost of checking whether we should compare dominates the cost of just always comparing.

@alamb
Contributor

alamb commented Nov 13, 2025

If and when we change from &mut [bool] to mutable packed bits, we could:

This is a good idea. I filed a ticket to track it

@alamb
Contributor

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                                                                                                             main                                    optimize-primitive-multi-group-by-to-use-simd
-----                                                                                                             ----                                    ---------------------------------------------
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_1000/append_val                                  1.00      9.1±0.02µs        ? ?/sec     1.08      9.9±0.06µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_1000/vectorized_append                           1.00      5.4±0.04µs        ? ?/sec     1.01      5.5±0.03µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_1000/vectorized_equal_to_0.25 true               1.03      2.6±0.00µs        ? ?/sec     1.00      2.5±0.00µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_1000/vectorized_equal_to_0.5 true                1.03      4.9±0.00µs        ? ?/sec     1.00      4.8±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_1000/vectorized_equal_to_0.75 true               1.03      7.4±0.01µs        ? ?/sec     1.00      7.2±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_1000/vectorized_equal_to_all_true                1.03      9.6±0.06µs        ? ?/sec     1.00      9.3±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_10000/append_val                                 1.00     89.2±0.30µs        ? ?/sec     1.08     96.1±1.23µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_10000/vectorized_append                          1.00     51.4±0.28µs        ? ?/sec     1.01     51.7±0.27µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_10000/vectorized_equal_to_0.25 true              1.01     33.7±0.09µs        ? ?/sec     1.00     33.4±0.11µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_10000/vectorized_equal_to_0.5 true               1.04     61.1±0.16µs        ? ?/sec     1.00     58.9±0.13µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_10000/vectorized_equal_to_0.75 true              1.02     73.7±0.11µs        ? ?/sec     1.00     71.9±1.30µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_10000/vectorized_equal_to_all_true               1.03     96.2±0.16µs        ? ?/sec     1.00     93.5±0.15µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_100000/append_val                                1.00    965.6±2.59µs        ? ?/sec     1.07   1028.6±3.86µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_100000/vectorized_append                         1.00    593.8±1.97µs        ? ?/sec     1.00    591.9±3.00µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_100000/vectorized_equal_to_0.25 true             1.01    395.6±3.10µs        ? ?/sec     1.00    392.6±2.30µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_100000/vectorized_equal_to_0.5 true              1.03    658.9±4.16µs        ? ?/sec     1.00    641.4±2.66µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_100000/vectorized_equal_to_0.75 true             1.03    752.5±3.72µs        ? ?/sec     1.00    730.1±2.68µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.0_size_100000/vectorized_equal_to_all_true              1.03    971.0±1.66µs        ? ?/sec     1.00    943.3±1.95µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_1000/append_val                                  1.00     12.6±0.04µs        ? ?/sec     1.03     13.0±0.10µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_1000/vectorized_append                           1.02     14.4±0.04µs        ? ?/sec     1.00     14.1±0.07µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_1000/vectorized_equal_to_0.25 true               1.03      2.8±0.00µs        ? ?/sec     1.00      2.7±0.00µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_1000/vectorized_equal_to_0.5 true                1.03      5.4±0.01µs        ? ?/sec     1.00      5.3±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_1000/vectorized_equal_to_0.75 true               1.03      8.2±0.01µs        ? ?/sec     1.00      8.0±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_1000/vectorized_equal_to_all_true                1.02     10.8±0.01µs        ? ?/sec     1.00     10.5±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_10000/append_val                                 1.00    124.1±0.45µs        ? ?/sec     1.01    125.2±0.33µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_10000/vectorized_append                          1.02    139.5±1.29µs        ? ?/sec     1.00    136.9±0.33µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_10000/vectorized_equal_to_0.25 true              1.00     37.4±0.18µs        ? ?/sec     1.01     37.7±0.22µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_10000/vectorized_equal_to_0.5 true               1.00     72.4±0.28µs        ? ?/sec     1.00     72.5±0.31µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_10000/vectorized_equal_to_0.75 true              1.01     85.3±0.35µs        ? ?/sec     1.00     84.0±0.24µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_10000/vectorized_equal_to_all_true               1.02    107.1±0.19µs        ? ?/sec     1.00    104.9±0.12µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_100000/append_val                                1.00   1328.9±3.32µs        ? ?/sec     1.00   1327.1±6.03µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_100000/vectorized_append                         1.02   1483.3±4.80µs        ? ?/sec     1.00   1452.7±4.44µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_100000/vectorized_equal_to_0.25 true             1.00    445.1±3.29µs        ? ?/sec     1.04    464.6±2.55µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_100000/vectorized_equal_to_0.5 true              1.01    800.7±9.22µs        ? ?/sec     1.00    794.5±3.15µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_100000/vectorized_equal_to_0.75 true             1.02    894.1±3.11µs        ? ?/sec     1.00    873.1±3.41µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.1_size_100000/vectorized_equal_to_all_true              1.03   1085.3±3.56µs        ? ?/sec     1.00   1057.7±2.58µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_1000/append_val                                  1.00     11.2±0.07µs        ? ?/sec     1.05     11.7±0.05µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_1000/vectorized_append                           1.00     13.2±0.06µs        ? ?/sec     1.03     13.5±0.09µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_1000/vectorized_equal_to_0.25 true               1.03   1940.8±8.05ns        ? ?/sec     1.00   1884.7±3.58ns        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_1000/vectorized_equal_to_0.5 true                1.03      3.7±0.01µs        ? ?/sec     1.00      3.6±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_1000/vectorized_equal_to_0.75 true               1.03      5.5±0.02µs        ? ?/sec     1.00      5.4±0.03µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_1000/vectorized_equal_to_all_true                1.00      7.4±0.03µs        ? ?/sec     1.25      9.3±0.04µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_10000/append_val                                 1.00    134.8±0.95µs        ? ?/sec     1.03    138.4±0.32µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_10000/vectorized_append                          1.00    153.8±0.38µs        ? ?/sec     1.01    155.2±0.92µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_10000/vectorized_equal_to_0.25 true              1.00     41.2±0.29µs        ? ?/sec     1.02     41.9±0.28µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_10000/vectorized_equal_to_0.5 true               1.00     82.6±0.43µs        ? ?/sec     1.00     82.5±0.41µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_10000/vectorized_equal_to_0.75 true              1.00     90.8±1.73µs        ? ?/sec     1.00     90.8±1.91µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_10000/vectorized_equal_to_all_true               1.00     98.3±0.39µs        ? ?/sec     1.02    100.2±0.21µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_100000/append_val                                1.00   1473.3±4.52µs        ? ?/sec     1.01   1481.6±4.90µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_100000/vectorized_append                         1.00   1649.7±4.46µs        ? ?/sec     1.01   1661.8±8.69µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_100000/vectorized_equal_to_0.25 true             1.00    504.4±4.57µs        ? ?/sec     1.03    518.1±2.81µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_100000/vectorized_equal_to_0.5 true              1.01    913.5±3.81µs        ? ?/sec     1.00   906.2±15.61µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_100000/vectorized_equal_to_0.75 true             1.00    984.8±3.55µs        ? ?/sec     1.00    981.3±2.43µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/inline_null_0.5_size_100000/vectorized_equal_to_all_true              1.01   1034.7±9.17µs        ? ?/sec     1.00   1025.2±4.76µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_1000/append_val                                  1.00     20.1±0.19µs        ? ?/sec     1.16     23.3±0.38µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_1000/vectorized_append                           1.00     16.4±0.23µs        ? ?/sec     1.20     19.7±0.60µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_1000/vectorized_equal_to_0.25 true               1.00      3.7±0.04µs        ? ?/sec     1.00      3.7±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_1000/vectorized_equal_to_0.5 true                1.00      6.4±0.03µs        ? ?/sec     1.01      6.4±0.06µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_1000/vectorized_equal_to_0.75 true               1.00      9.3±0.06µs        ? ?/sec     1.02      9.5±0.07µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_1000/vectorized_equal_to_all_true                1.01     13.1±0.05µs        ? ?/sec     1.00     12.9±0.07µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_10000/append_val                                 1.02    377.0±7.67µs        ? ?/sec     1.00    369.6±4.34µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_10000/vectorized_append                          1.00    339.3±2.25µs        ? ?/sec     1.03    350.4±6.46µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_10000/vectorized_equal_to_0.25 true              1.19    87.5±13.48µs        ? ?/sec     1.00     73.7±2.37µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_10000/vectorized_equal_to_0.5 true               1.13   144.5±11.03µs        ? ?/sec     1.00    127.6±8.54µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_10000/vectorized_equal_to_0.75 true              1.12    171.8±7.29µs        ? ?/sec     1.00    153.4±9.30µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_10000/vectorized_equal_to_all_true               1.06    184.5±7.28µs        ? ?/sec     1.00    173.7±6.81µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_100000/append_val                                1.06     14.8±0.30ms        ? ?/sec     1.00     13.9±0.22ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_100000/vectorized_append                         1.07     14.4±0.24ms        ? ?/sec     1.00     13.4±0.21ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_100000/vectorized_equal_to_0.25 true             1.08  1710.8±241.37µs        ? ?/sec    1.00  1581.8±105.31µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_100000/vectorized_equal_to_0.5 true              1.00      2.8±0.07ms        ? ?/sec     1.04      2.9±0.07ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_100000/vectorized_equal_to_0.75 true             1.01      3.2±0.06ms        ? ?/sec     1.00      3.2±0.06ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.0_size_100000/vectorized_equal_to_all_true              1.00      3.6±0.06ms        ? ?/sec     1.00      3.6±0.07ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_1000/append_val                                  1.00     20.3±0.18µs        ? ?/sec     1.06     21.6±0.20µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_1000/vectorized_append                           1.00     22.3±0.17µs        ? ?/sec     1.08     24.1±0.27µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_1000/vectorized_equal_to_0.25 true               1.00      3.7±0.02µs        ? ?/sec     1.00      3.7±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_1000/vectorized_equal_to_0.5 true                1.00      6.5±0.03µs        ? ?/sec     1.00      6.5±0.05µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_1000/vectorized_equal_to_0.75 true               1.00      9.5±0.04µs        ? ?/sec     1.02      9.7±0.15µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_1000/vectorized_equal_to_all_true                1.00     12.4±0.07µs        ? ?/sec     1.04     12.9±0.10µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_10000/append_val                                 1.00    318.8±2.36µs        ? ?/sec     1.23    393.2±7.31µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_10000/vectorized_append                          1.00    342.6±2.07µs        ? ?/sec     1.16    399.1±6.51µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_10000/vectorized_equal_to_0.25 true              1.00     76.8±8.97µs        ? ?/sec     1.07    81.8±10.63µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_10000/vectorized_equal_to_0.5 true               1.04    138.7±9.10µs        ? ?/sec     1.00    133.6±1.78µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_10000/vectorized_equal_to_0.75 true              1.05    174.3±6.37µs        ? ?/sec     1.00    166.8±7.42µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_10000/vectorized_equal_to_all_true               1.03    193.3±3.93µs        ? ?/sec     1.00    187.4±3.91µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_100000/append_val                                1.08     13.8±0.25ms        ? ?/sec     1.00     12.8±0.21ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_100000/vectorized_append                         1.01     13.1±0.23ms        ? ?/sec     1.00     13.1±0.22ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_100000/vectorized_equal_to_0.25 true             1.00  1443.4±123.40µs        ? ?/sec    1.00  1441.2±121.84µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_100000/vectorized_equal_to_0.5 true              1.05      2.8±0.06ms        ? ?/sec     1.00      2.7±0.07ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_100000/vectorized_equal_to_0.75 true             1.01      3.1±0.06ms        ? ?/sec     1.00      3.1±0.07ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.1_size_100000/vectorized_equal_to_all_true              1.01      3.5±0.07ms        ? ?/sec     1.00      3.5±0.06ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_1000/append_val                                  1.08     17.6±0.33µs        ? ?/sec     1.00     16.3±0.21µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_1000/vectorized_append                           1.05     20.6±0.34µs        ? ?/sec     1.00     19.6±0.24µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_1000/vectorized_equal_to_0.25 true               1.00      2.7±0.02µs        ? ?/sec     1.01      2.8±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_1000/vectorized_equal_to_0.5 true                1.00      4.7±0.04µs        ? ?/sec     1.02      4.8±0.05µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_1000/vectorized_equal_to_0.75 true               1.00      6.7±0.06µs        ? ?/sec     1.03      6.9±0.09µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_1000/vectorized_equal_to_all_true                1.00      8.5±0.08µs        ? ?/sec     1.04      8.8±0.13µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_10000/append_val                                 1.03    246.6±3.57µs        ? ?/sec     1.00    239.6±1.47µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_10000/vectorized_append                          1.03    265.1±1.89µs        ? ?/sec     1.00    257.5±1.59µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_10000/vectorized_equal_to_0.25 true              1.00     59.2±1.98µs        ? ?/sec     1.03     61.0±1.40µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_10000/vectorized_equal_to_0.5 true               1.00    114.6±3.07µs        ? ?/sec     1.05    120.0±3.09µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_10000/vectorized_equal_to_0.75 true              1.00    137.0±1.09µs        ? ?/sec     1.06    145.5±1.50µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_10000/vectorized_equal_to_all_true               1.00    153.1±2.40µs        ? ?/sec     1.05    161.5±1.78µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_100000/append_val                                1.09      8.4±0.15ms        ? ?/sec     1.00      7.7±0.12ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_100000/vectorized_append                         1.00      8.4±0.19ms        ? ?/sec     1.00      8.4±0.20ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_100000/vectorized_equal_to_0.25 true             1.00   881.0±55.10µs        ? ?/sec     1.10   965.2±49.58µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_100000/vectorized_equal_to_0.5 true              1.00  1612.5±55.07µs        ? ?/sec     1.08  1741.2±85.55µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_100000/vectorized_equal_to_0.75 true             1.00  1876.9±47.41µs        ? ?/sec     1.06  1992.9±71.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/random_null_0.5_size_100000/vectorized_equal_to_all_true              1.00      2.2±0.04ms        ? ?/sec     1.02      2.2±0.05ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_1000/append_val                                1.00     17.4±0.16µs        ? ?/sec     1.06     18.4±0.07µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_1000/vectorized_append                         1.00     13.1±0.02µs        ? ?/sec     1.05     13.7±0.08µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_1000/vectorized_equal_to_0.25 true             1.01      2.4±0.02µs        ? ?/sec     1.00      2.4±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_1000/vectorized_equal_to_0.5 true              1.00      4.3±0.04µs        ? ?/sec     1.00      4.2±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_1000/vectorized_equal_to_0.75 true             1.00      6.2±0.05µs        ? ?/sec     1.01      6.2±0.04µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_1000/vectorized_equal_to_all_true              1.04      8.8±0.14µs        ? ?/sec     1.00      8.5±0.05µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_10000/append_val                               1.00    178.3±0.79µs        ? ?/sec     1.16    207.0±0.68µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_10000/vectorized_append                        1.00    149.3±0.50µs        ? ?/sec     1.02    152.8±0.66µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_10000/vectorized_equal_to_0.25 true            1.00     35.4±0.24µs        ? ?/sec     1.11     39.4±0.22µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_10000/vectorized_equal_to_0.5 true             1.00     69.8±0.40µs        ? ?/sec     1.02     71.3±0.33µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_10000/vectorized_equal_to_0.75 true            1.01     81.2±0.33µs        ? ?/sec     1.00     80.3±0.34µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_10000/vectorized_equal_to_all_true             1.02     95.3±0.25µs        ? ?/sec     1.00     93.0±0.45µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_100000/append_val                              1.01      4.0±0.08ms        ? ?/sec     1.00      3.9±0.09ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_100000/vectorized_append                       1.03      3.7±0.06ms        ? ?/sec     1.00      3.6±0.06ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_100000/vectorized_equal_to_0.25 true           1.02    513.5±4.99µs        ? ?/sec     1.00    505.6±5.00µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_100000/vectorized_equal_to_0.5 true            1.03    836.2±2.25µs        ? ?/sec     1.00    815.0±3.85µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_100000/vectorized_equal_to_0.75 true           1.03    908.8±3.01µs        ? ?/sec     1.00    883.3±8.20µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.0_size_100000/vectorized_equal_to_all_true            1.03   1029.9±4.30µs        ? ?/sec     1.00   1001.8±3.71µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_1000/append_val                                1.00     18.8±0.05µs        ? ?/sec     1.00     18.8±0.05µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_1000/vectorized_append                         1.00     20.5±0.04µs        ? ?/sec     1.02     20.9±0.07µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_1000/vectorized_equal_to_0.25 true             1.01      2.8±0.01µs        ? ?/sec     1.00      2.7±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_1000/vectorized_equal_to_0.5 true              1.01      5.0±0.02µs        ? ?/sec     1.00      5.0±0.02µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_1000/vectorized_equal_to_0.75 true             1.00      7.2±0.02µs        ? ?/sec     1.01      7.3±0.03µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_1000/vectorized_equal_to_all_true              1.00      9.3±0.03µs        ? ?/sec     1.02      9.5±0.03µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_10000/append_val                               1.00    211.5±0.81µs        ? ?/sec     1.01    212.7±1.91µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_10000/vectorized_append                        1.00    225.2±0.68µs        ? ?/sec     1.02    228.6±0.74µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_10000/vectorized_equal_to_0.25 true            1.00     39.3±0.44µs        ? ?/sec     1.10     43.3±0.45µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_10000/vectorized_equal_to_0.5 true             1.00     78.2±0.71µs        ? ?/sec     1.07     83.3±0.42µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_10000/vectorized_equal_to_0.75 true            1.00     94.7±0.50µs        ? ?/sec     1.03     97.2±1.14µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_10000/vectorized_equal_to_all_true             1.00    107.5±0.56µs        ? ?/sec     1.02    109.8±0.23µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_100000/append_val                              1.02      2.3±0.01ms        ? ?/sec     1.00      2.2±0.01ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_100000/vectorized_append                       1.01      2.4±0.02ms        ? ?/sec     1.00      2.4±0.01ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_100000/vectorized_equal_to_0.25 true           1.00    561.4±4.86µs        ? ?/sec     1.03    578.6±2.59µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_100000/vectorized_equal_to_0.5 true            1.00    959.3±3.12µs        ? ?/sec     1.00    959.2±4.47µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_100000/vectorized_equal_to_0.75 true           1.00   1063.5±3.92µs        ? ?/sec     1.00   1065.8±3.44µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.1_size_100000/vectorized_equal_to_all_true            1.00   1181.4±3.74µs        ? ?/sec     1.00   1186.5±4.40µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_1000/append_val                                1.00     13.5±0.11µs        ? ?/sec     1.01     13.6±0.09µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_1000/vectorized_append                         1.04     17.0±0.15µs        ? ?/sec     1.00     16.3±0.08µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_1000/vectorized_equal_to_0.25 true             1.02   1959.9±5.89ns        ? ?/sec     1.00   1921.0±6.49ns        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_1000/vectorized_equal_to_0.5 true              1.01      3.5±0.01µs        ? ?/sec     1.00      3.5±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_1000/vectorized_equal_to_0.75 true             1.01      5.2±0.01µs        ? ?/sec     1.00      5.1±0.01µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_1000/vectorized_equal_to_all_true              1.00      6.6±0.03µs        ? ?/sec     1.07      7.1±0.03µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_10000/append_val                               1.00    186.6±4.68µs        ? ?/sec     1.00    187.0±0.75µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_10000/vectorized_append                        1.00    202.7±0.46µs        ? ?/sec     1.01    205.2±0.68µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_10000/vectorized_equal_to_0.25 true            1.00     43.1±0.38µs        ? ?/sec     1.08     46.5±0.45µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_10000/vectorized_equal_to_0.5 true             1.00     85.7±0.75µs        ? ?/sec     1.05     89.8±0.73µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_10000/vectorized_equal_to_0.75 true            1.00     97.2±0.54µs        ? ?/sec     1.06    103.1±1.38µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_10000/vectorized_equal_to_all_true             1.00    103.6±0.40µs        ? ?/sec     1.06    110.0±0.45µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_100000/append_val                              1.01   1937.6±7.25µs        ? ?/sec     1.00   1926.4±7.92µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_100000/vectorized_append                       1.00      2.1±0.01ms        ? ?/sec     1.01      2.1±0.01ms        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_100000/vectorized_equal_to_0.25 true           1.00    567.5±3.48µs        ? ?/sec     1.03    586.0±2.51µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_100000/vectorized_equal_to_0.5 true            1.00   1002.3±3.82µs        ? ?/sec     1.00   1002.2±4.93µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_100000/vectorized_equal_to_0.75 true           1.00  1103.7±18.07µs        ? ?/sec     1.01   1119.3±3.31µs        ? ?/sec
ByteViewGroupValueBuilder_vectorized_append/scenario_null_0.5_size_100000/vectorized_equal_to_all_true            1.00   1163.2±5.85µs        ? ?/sec     1.03   1194.0±3.84µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_1000/append_val                         1.00      3.6±0.01µs        ? ?/sec     1.26      4.5±0.01µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_1000/vectorized_append                  1.00      2.1±0.00µs        ? ?/sec     1.00      2.1±0.01µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_1000/vectorized_equal_to_0.25 true      1.23   868.4±42.82ns        ? ?/sec     1.00    705.4±2.41ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_1000/vectorized_equal_to_0.5 true       1.42   999.4±34.20ns        ? ?/sec     1.00    704.4±0.88ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_1000/vectorized_equal_to_0.75 true      2.06  1455.8±31.09ns        ? ?/sec     1.00    705.7±2.72ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_1000/vectorized_equal_to_all_true       1.97   1387.0±3.29ns        ? ?/sec     1.00    704.6±1.77ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_10000/append_val                        1.00     32.1±0.13µs        ? ?/sec     1.31     42.1±0.14µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_10000/vectorized_append                 1.01     17.0±0.12µs        ? ?/sec     1.00     16.9±0.13µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_10000/vectorized_equal_to_0.25 true     3.37     23.8±0.06µs        ? ?/sec     1.00      7.1±0.08µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_10000/vectorized_equal_to_0.5 true      5.00     35.4±0.09µs        ? ?/sec     1.00      7.1±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_10000/vectorized_equal_to_0.75 true     3.70     26.2±0.05µs        ? ?/sec     1.00      7.1±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_10000/vectorized_equal_to_all_true      1.92     13.6±0.03µs        ? ?/sec     1.00      7.1±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_100000/append_val                       1.00    310.4±0.71µs        ? ?/sec     1.37    424.6±1.29µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_100000/vectorized_append                1.00    160.2±0.61µs        ? ?/sec     1.09    173.9±1.17µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_100000/vectorized_equal_to_0.25 true    3.61    267.4±0.79µs        ? ?/sec     1.00     74.2±1.37µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_100000/vectorized_equal_to_0.5 true     5.65    411.0±0.84µs        ? ?/sec     1.00     72.7±0.50µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_100000/vectorized_equal_to_0.75 true    4.02    292.4±0.93µs        ? ?/sec     1.00     72.7±0.46µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_false_size_100000/vectorized_equal_to_all_true     1.90    138.6±0.38µs        ? ?/sec     1.00     73.0±0.58µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_1000/append_val                          1.04      7.2±0.07µs        ? ?/sec     1.00      6.9±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_1000/vectorized_append                   1.00      2.2±0.00µs        ? ?/sec     1.02      2.2±0.07µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_1000/vectorized_equal_to_0.25 true       1.11    785.0±7.88ns        ? ?/sec     1.00    706.9±1.20ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_1000/vectorized_equal_to_0.5 true        1.46  1034.7±10.20ns        ? ?/sec     1.00    706.8±1.02ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_1000/vectorized_equal_to_0.75 true       1.93  1366.4±15.36ns        ? ?/sec     1.00    706.6±0.75ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_1000/vectorized_equal_to_all_true        2.34  1650.0±10.52ns        ? ?/sec     1.00    705.8±1.11ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_10000/append_val                         1.05     67.5±0.18µs        ? ?/sec     1.00     64.5±0.17µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_10000/vectorized_append                  1.00     18.6±0.11µs        ? ?/sec     1.00     18.6±0.28µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_10000/vectorized_equal_to_0.25 true      3.70     26.2±0.16µs        ? ?/sec     1.00      7.1±0.04µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_10000/vectorized_equal_to_0.5 true       5.67     40.1±0.18µs        ? ?/sec     1.00      7.1±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_10000/vectorized_equal_to_0.75 true      4.08     28.9±0.14µs        ? ?/sec     1.00      7.1±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_10000/vectorized_equal_to_all_true       2.31     16.3±0.02µs        ? ?/sec     1.00      7.1±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_100000/append_val                        1.04    673.4±5.10µs        ? ?/sec     1.00    647.5±4.55µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_100000/vectorized_append                 1.00    176.2±0.76µs        ? ?/sec     1.08    190.6±2.52µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_100000/vectorized_equal_to_0.25 true     3.98    289.2±2.11µs        ? ?/sec     1.00     72.7±0.31µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_100000/vectorized_equal_to_0.5 true      6.10    444.8±0.97µs        ? ?/sec     1.00     72.9±1.19µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_100000/vectorized_equal_to_0.75 true     4.37    327.1±1.65µs        ? ?/sec     1.00     74.8±0.82µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.0_nullable_true_size_100000/vectorized_equal_to_all_true      2.28    165.6±0.54µs        ? ?/sec     1.00     72.8±0.53µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_1000/append_val                          1.02      8.5±0.01µs        ? ?/sec     1.00      8.3±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_1000/vectorized_append                   1.00      7.7±0.03µs        ? ?/sec     1.00      7.7±0.01µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_1000/vectorized_equal_to_0.25 true       1.00   1102.2±6.86ns        ? ?/sec     1.16  1278.7±46.97ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_1000/vectorized_equal_to_0.5 true        1.00  1890.5±20.06ns        ? ?/sec     1.07      2.0±0.01µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_1000/vectorized_equal_to_0.75 true       1.00      2.7±0.01µs        ? ?/sec     1.03      2.8±0.01µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_1000/vectorized_equal_to_all_true        1.00      3.6±0.01µs        ? ?/sec     1.02      3.7±0.00µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_10000/append_val                         1.02     83.0±0.18µs        ? ?/sec     1.00     81.2±0.14µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_10000/vectorized_append                  1.00     75.8±0.21µs        ? ?/sec     1.08     82.1±0.18µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_10000/vectorized_equal_to_0.25 true      1.04     28.2±0.08µs        ? ?/sec     1.00     27.0±0.13µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_10000/vectorized_equal_to_0.5 true       1.00     52.0±0.23µs        ? ?/sec     1.02     53.0±0.26µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_10000/vectorized_equal_to_0.75 true      1.00     46.3±0.24µs        ? ?/sec     1.01     46.8±0.27µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_10000/vectorized_equal_to_all_true       1.00     40.2±0.21µs        ? ?/sec     1.02     41.2±0.10µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_100000/append_val                        1.00    824.8±1.78µs        ? ?/sec     1.00    822.6±1.81µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_100000/vectorized_append                 1.00    759.0±1.71µs        ? ?/sec     1.02    777.6±6.08µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_100000/vectorized_equal_to_0.25 true     1.00    346.0±1.50µs        ? ?/sec     1.01    348.3±0.80µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_100000/vectorized_equal_to_0.5 true      1.00    579.8±1.14µs        ? ?/sec     1.02    589.5±1.03µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_100000/vectorized_equal_to_0.75 true     1.00    521.2±0.97µs        ? ?/sec     1.01    527.5±2.07µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.1_nullable_true_size_100000/vectorized_equal_to_all_true      1.00    423.5±1.02µs        ? ?/sec     1.01    426.4±1.21µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_1000/append_val                          1.00      7.8±0.06µs        ? ?/sec     1.05      8.2±0.06µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_1000/vectorized_append                   1.02      8.4±0.06µs        ? ?/sec     1.00      8.2±0.05µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_1000/vectorized_equal_to_0.25 true       1.00  1029.3±10.48ns        ? ?/sec     1.17   1203.0±8.82ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_1000/vectorized_equal_to_0.5 true        1.00  1715.6±12.43ns        ? ?/sec     1.09  1873.0±20.02ns        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_1000/vectorized_equal_to_0.75 true       1.00      2.4±0.04µs        ? ?/sec     1.05      2.5±0.02µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_1000/vectorized_equal_to_all_true        1.00      3.1±0.01µs        ? ?/sec     1.06      3.2±0.04µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_10000/append_val                         1.00    114.5±0.27µs        ? ?/sec     1.02    116.6±0.35µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_10000/vectorized_append                  1.00    121.6±0.27µs        ? ?/sec     1.10    134.0±0.39µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_10000/vectorized_equal_to_0.25 true      1.01     32.9±0.13µs        ? ?/sec     1.00     32.5±0.13µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_10000/vectorized_equal_to_0.5 true       1.00     61.9±0.36µs        ? ?/sec     1.00     61.9±0.49µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_10000/vectorized_equal_to_0.75 true      1.00     62.8±0.36µs        ? ?/sec     1.01     63.5±0.38µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_10000/vectorized_equal_to_all_true       1.00     59.2±0.50µs        ? ?/sec     1.01     59.7±0.25µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_100000/append_val                        1.00   1166.9±8.76µs        ? ?/sec     1.02   1186.9±2.08µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_100000/vectorized_append                 1.00  1249.2±30.31µs        ? ?/sec     1.01   1261.0±6.95µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_100000/vectorized_equal_to_0.25 true     1.00    397.4±1.57µs        ? ?/sec     1.01    402.1±0.96µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_100000/vectorized_equal_to_0.5 true      1.00    677.9±1.53µs        ? ?/sec     1.01    682.5±1.56µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_100000/vectorized_equal_to_0.75 true     1.00    681.7±1.61µs        ? ?/sec     1.01    689.3±1.01µs        ? ?/sec
PrimitiveGroupValueBuilder_vectorized_append/null_0.5_nullable_true_size_100000/vectorized_equal_to_all_true      1.00    667.7±2.25µs        ? ?/sec     1.01    671.2±1.33µs        ? ?/sec

@rluvaton
Member Author

rluvaton commented Nov 13, 2025

As suspected, up to about 6 times faster.

Here are @alamb's results, parsed as a table.

All rows below are for `PrimitiveGroupValueBuilder` `vectorized_equal_to` (note that some other functions occasionally show small regressions, for reasons that are unclear).

The good thing to notice below is that the optimized timings are consistent regardless of how many entries in `equal_to_results` are true, which indicates the code really is not using branches.

Also, the null case, which I did not optimize, is roughly the same.
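To illustrate why the timing stops depending on the true count, here is a minimal sketch of the two loop shapes. It is not the code from this PR: the function names, the plain `i64` slices, and the up-front index checks are assumptions made to keep the example self-contained and sound.

```rust
/// Branching version: skips rows that an earlier column already marked
/// as not equal, so the loop's control flow depends on the data.
pub fn equal_to_branching(
    lhs: &[i64],
    rhs: &[i64],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
    results: &mut [bool],
) {
    let rows = lhs_rows.iter().zip(rhs_rows).zip(results.iter_mut());
    for ((&l, &r), result) in rows {
        if !*result {
            continue;
        }
        *result = lhs[l] == rhs[r];
    }
}

/// Branchless version: always evaluates the comparison and ANDs it into
/// the previous result, using unchecked indexing inside the loop.
pub fn equal_to_branchless(
    lhs: &[i64],
    rhs: &[i64],
    lhs_rows: &[usize],
    rhs_rows: &[usize],
    results: &mut [bool],
) {
    // Assumption for this sketch: validate all indices once, outside the
    // hot loop, so the unchecked reads below are sound.
    assert!(lhs_rows.iter().all(|&i| i < lhs.len()));
    assert!(rhs_rows.iter().all(|&i| i < rhs.len()));

    let rows = lhs_rows.iter().zip(rhs_rows).zip(results.iter_mut());
    for ((&l, &r), result) in rows {
        // SAFETY: every index was checked against the slice length above.
        let eq = unsafe { *lhs.get_unchecked(l) == *rhs.get_unchecked(r) };
        *result &= eq;
    }
}
```

Both functions produce the same `results`. The second does the same amount of work for every row, which is consistent with the flat timings in the optimized column.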

Table

| null | nullable | size | equal_to_results | main (ratio) | main (time) | optimize (ratio) | optimize (time) |
|------|----------|------|------------------|--------------|-------------|------------------|-----------------|
| 0.0 | false | 1000 | 0.25 true | 1.23 | 868.4±42.82ns | 1.00 | 705.4±2.41ns |
| 0.0 | false | 1000 | 0.5 true | 1.42 | 999.4±34.20ns | 1.00 | 704.4±0.88ns |
| 0.0 | false | 1000 | 0.75 true | 2.06 | 1455.8±31.09ns | 1.00 | 705.7±2.72ns |
| 0.0 | false | 1000 | all_true | 1.97 | 1387.0±3.29ns | 1.00 | 704.6±1.77ns |
| 0.0 | false | 10000 | 0.25 true | 3.37 | 23.8±0.06µs | 1.00 | 7.1±0.08µs |
| 0.0 | false | 10000 | 0.5 true | 5.00 | 35.4±0.09µs | 1.00 | 7.1±0.02µs |
| 0.0 | false | 10000 | 0.75 true | 3.70 | 26.2±0.05µs | 1.00 | 7.1±0.02µs |
| 0.0 | false | 10000 | all_true | 1.92 | 13.6±0.03µs | 1.00 | 7.1±0.02µs |
| 0.0 | false | 100000 | 0.25 true | 3.61 | 267.4±0.79µs | 1.00 | 74.2±1.37µs |
| 0.0 | false | 100000 | 0.5 true | 5.65 | 411.0±0.84µs | 1.00 | 72.7±0.50µs |
| 0.0 | false | 100000 | 0.75 true | 4.02 | 292.4±0.93µs | 1.00 | 72.7±0.46µs |
| 0.0 | false | 100000 | all_true | 1.90 | 138.6±0.38µs | 1.00 | 73.0±0.58µs |
| 0.0 | true | 1000 | 0.25 true | 1.11 | 785.0±7.88ns | 1.00 | 706.9±1.20ns |
| 0.0 | true | 1000 | 0.5 true | 1.46 | 1034.7±10.20ns | 1.00 | 706.8±1.02ns |
| 0.0 | true | 1000 | 0.75 true | 1.93 | 1366.4±15.36ns | 1.00 | 706.6±0.75ns |
| 0.0 | true | 1000 | all_true | 2.34 | 1650.0±10.52ns | 1.00 | 705.8±1.11ns |
| 0.0 | true | 10000 | 0.25 true | 3.70 | 26.2±0.16µs | 1.00 | 7.1±0.04µs |
| 0.0 | true | 10000 | 0.5 true | 5.67 | 40.1±0.18µs | 1.00 | 7.1±0.02µs |
| 0.0 | true | 10000 | 0.75 true | 4.08 | 28.9±0.14µs | 1.00 | 7.1±0.02µs |
| 0.0 | true | 10000 | all_true | 2.31 | 16.3±0.02µs | 1.00 | 7.1±0.02µs |
| 0.0 | true | 100000 | 0.25 true | 3.98 | 289.2±2.11µs | 1.00 | 72.7±0.31µs |
| 0.0 | true | 100000 | 0.5 true | 6.10 | 444.8±0.97µs | 1.00 | 72.9±1.19µs |
| 0.0 | true | 100000 | 0.75 true | 4.37 | 327.1±1.65µs | 1.00 | 74.8±0.82µs |
| 0.0 | true | 100000 | all_true | 2.28 | 165.6±0.54µs | 1.00 | 72.8±0.53µs |
| 0.1 | true | 1000 | 0.25 true | 1.00 | 1102.2±6.86ns | 1.16 | 1278.7±46.97ns |
| 0.1 | true | 1000 | 0.5 true | 1.00 | 1890.5±20.06ns | 1.07 | 2.0±0.01µs |
| 0.1 | true | 1000 | 0.75 true | 1.00 | 2.7±0.01µs | 1.03 | 2.8±0.01µs |
| 0.1 | true | 1000 | all_true | 1.00 | 3.6±0.01µs | 1.02 | 3.7±0.00µs |
| 0.1 | true | 10000 | 0.25 true | 1.04 | 28.2±0.08µs | 1.00 | 27.0±0.13µs |
| 0.1 | true | 10000 | 0.5 true | 1.00 | 52.0±0.23µs | 1.02 | 53.0±0.26µs |
| 0.1 | true | 10000 | 0.75 true | 1.00 | 46.3±0.24µs | 1.01 | 46.8±0.27µs |
| 0.1 | true | 10000 | all_true | 1.00 | 40.2±0.21µs | 1.02 | 41.2±0.10µs |
| 0.1 | true | 100000 | 0.25 true | 1.00 | 346.0±1.50µs | 1.01 | 348.3±0.80µs |
| 0.1 | true | 100000 | 0.5 true | 1.00 | 579.8±1.14µs | 1.02 | 589.5±1.03µs |
| 0.1 | true | 100000 | 0.75 true | 1.00 | 521.2±0.97µs | 1.01 | 527.5±2.07µs |
| 0.1 | true | 100000 | all_true | 1.00 | 423.5±1.02µs | 1.01 | 426.4±1.21µs |
| 0.5 | true | 1000 | 0.25 true | 1.00 | 1029.3±10.48ns | 1.17 | 1203.0±8.82ns |
| 0.5 | true | 1000 | 0.5 true | 1.00 | 1715.6±12.43ns | 1.09 | 1873.0±20.02ns |
| 0.5 | true | 1000 | 0.75 true | 1.00 | 2.4±0.04µs | 1.05 | 2.5±0.02µs |
| 0.5 | true | 1000 | all_true | 1.00 | 3.1±0.01µs | 1.06 | 3.2±0.04µs |
| 0.5 | true | 10000 | 0.25 true | 1.01 | 32.9±0.13µs | 1.00 | 32.5±0.13µs |
| 0.5 | true | 10000 | 0.5 true | 1.00 | 61.9±0.36µs | 1.00 | 61.9±0.49µs |
| 0.5 | true | 10000 | 0.75 true | 1.00 | 62.8±0.36µs | 1.01 | 63.5±0.38µs |
| 0.5 | true | 10000 | all_true | 1.00 | 59.2±0.50µs | 1.01 | 59.7±0.25µs |
| 0.5 | true | 100000 | 0.25 true | 1.00 | 397.4±1.57µs | 1.01 | 402.1±0.96µs |
| 0.5 | true | 100000 | 0.5 true | 1.00 | 677.9±1.53µs | 1.01 | 682.5±1.56µs |
| 0.5 | true | 100000 | 0.75 true | 1.00 | 681.7±1.61µs | 1.01 | 689.3±1.01µs |
| 0.5 | true | 100000 | all_true | 1.00 | 667.7±2.25µs | 1.01 | 671.2±1.33µs |

@rluvaton
Member Author

Once we have a mutable bit-packed buffer, we will also evaluate the nulls separately and apply a bitmask operation to the comparison result, to make that case faster as well.
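A rough sketch of what that could look like for one 64-row chunk, not an implementation from this PR. The function names and the `u64` word layout are hypothetical, and the sketch assumes group-by semantics in which two nulls compare as equal and a null never equals a non-null value.

```rust
/// Pack 64 value comparisons into one word: bit `i` is set when
/// `lhs[i] == rhs[i]`. Slots that are null are compared too; their
/// bits are masked out afterwards by `combine_chunk`.
pub fn pack_eq_chunk(lhs: &[i64; 64], rhs: &[i64; 64]) -> u64 {
    let mut bits = 0u64;
    for i in 0..64 {
        bits |= ((lhs[i] == rhs[i]) as u64) << i;
    }
    bits
}

/// Combine one chunk of 64 rows using only bitwise operations.
///
/// * `results`:   bit set if the row is still equal after earlier columns
/// * `values_eq`: bit set if the two values compare equal
/// * `lhs_valid`, `rhs_valid`: bit set if that side is non-null
pub fn combine_chunk(results: u64, values_eq: u64, lhs_valid: u64, rhs_valid: u64) -> u64 {
    // Assumed semantics: null == null, null != value.
    let both_null = !lhs_valid & !rhs_valid;
    let both_valid = lhs_valid & rhs_valid;
    results & (both_null | (both_valid & values_eq))
}
```

The per-row null checks collapse into a handful of word-wide operations per 64 rows, which is why this depends on `equal_to_results` being bit-packed rather than `&mut [bool]`.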

@alamb alamb added this pull request to the merge queue Nov 17, 2025
@alamb
Contributor

alamb commented Nov 17, 2025

Thanks again @rluvaton

Merged via the queue into apache:main with commit af22336 Nov 17, 2025
32 checks passed
@rluvaton rluvaton deleted the optimize-primitive-multi-group-by-to-use-simd branch November 17, 2025 18:00
logan-keede pushed a commit to logan-keede/datafusion that referenced this pull request Nov 23, 2025
…pValueBuilder` in multi group by aggregation (apache#17977)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>