@pepijnve
Contributor

pepijnve commented Oct 28, 2025

Which issue does this PR close?

Rationale for this change

When CaseExpr needs to evaluate a PhysicalExpr for a subset of the rows of the input RecordBatch, it first filters the record batch using a selection vector. This filter step processes every column of the RecordBatch, including ones that the PhysicalExpr never accesses. For wide record batches (many columns) and narrow expressions (few column references), it can be beneficial to project the record batch first to reduce the amount of wasted filtering work.

What changes are included in this PR?

This PR attempts to reduce the time spent filtering columns unnecessarily by projecting the record batch down to the referenced columns prior to filtering. Since this renumbers the columns, new versions of the when, then, and else expressions with corrected column references also have to be derived.

To make this more manageable, the set of child expressions of a case expression is collected in a new struct named CaseBody. The projection logic derives a projection vector and a projected CaseBody.

This logic is only used when the number of used columns (the length of the projection vector) is less than the number of columns of the incoming record batch.
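
The projection step described above can be sketched in isolation. What follows is a minimal illustration that uses a toy `Expr` enum instead of DataFusion's `PhysicalExpr`; the names `Expr`, `collect_indices`, `remap`, and `project_exprs` are hypothetical and are not taken from the PR. It assumes the projection vector is the ascending list of used column indices and that each column reference is rewritten to its position in that list.

```rust
use std::collections::{BTreeSet, HashMap};

// Toy stand-in for a physical expression tree (an assumption for this sketch,
// not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    LtEq(Box<Expr>, Box<Expr>),
}

// Record every column index referenced anywhere in `expr`.
fn collect_indices(expr: &Expr, used: &mut BTreeSet<usize>) {
    match expr {
        Expr::Column(i) => {
            used.insert(*i);
        }
        Expr::Literal(_) => {}
        Expr::LtEq(l, r) => {
            collect_indices(l, used);
            collect_indices(r, used);
        }
    }
}

// Rewrite column references so they point into the projected batch.
fn remap(expr: &Expr, mapping: &HashMap<usize, usize>) -> Expr {
    match expr {
        Expr::Column(i) => Expr::Column(mapping[i]),
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::LtEq(l, r) => {
            Expr::LtEq(Box::new(remap(l, mapping)), Box::new(remap(r, mapping)))
        }
    }
}

// Returns the projection vector and the remapped expressions, or `None` when
// every input column is used and projecting would not save any filtering work.
fn project_exprs(exprs: &[Expr], num_columns: usize) -> Option<(Vec<usize>, Vec<Expr>)> {
    let mut used = BTreeSet::new();
    for e in exprs {
        collect_indices(e, &mut used);
    }
    if used.len() >= num_columns {
        return None;
    }
    // A BTreeSet iterates in ascending order, so the projection is sorted.
    let projection: Vec<usize> = used.into_iter().collect();
    let mapping: HashMap<usize, usize> = projection
        .iter()
        .enumerate()
        .map(|(new, old)| (*old, new))
        .collect();
    let remapped = exprs.iter().map(|e| remap(e, &mapping)).collect();
    Some((projection, remapped))
}

fn main() {
    // CASE WHEN c7 <= 500 THEN c2 ELSE c40 END over a 50-column batch.
    let exprs = vec![
        Expr::LtEq(Box::new(Expr::Column(7)), Box::new(Expr::Literal(500))),
        Expr::Column(2),
        Expr::Column(40),
    ];
    let (projection, remapped) =
        project_exprs(&exprs, 50).expect("only 3 of 50 columns are used");
    println!("projection = {projection:?}");
    println!("remapped   = {remapped:?}");
}
```

In this sketch only three columns would be carried through the filter instead of fifty, and the same call with `num_columns = 3` returns `None`, mirroring the "fewer used columns than input columns" condition.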

Certain evaluation methods in case do not perform any filtering. These remain unchanged and will never perform the projection logic since this is only beneficial when filtering of record batches is required.

Are these changes tested?

  • Covered by existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 28, 2025
Comment on lines +92 to +112
let mut used_column_indices = HashSet::<usize>::new();
let mut collect_column_indices = |expr: &Arc<dyn PhysicalExpr>| {
    expr.apply(|expr| {
        if let Some(column) = expr.as_any().downcast_ref::<Column>() {
            used_column_indices.insert(column.index());
        }
        Ok(TreeNodeRecursion::Continue)
    })
    .expect("Closure cannot fail");
};

if let Some(e) = &self.expr {
    collect_column_indices(e);
}
self.when_then_expr.iter().for_each(|(w, t)| {
    collect_column_indices(w);
    collect_column_indices(t);
});
if let Some(e) = &self.else_expr {
    collect_column_indices(e);
}
Member

Looks like in projection.rs we have something similar (not pub though)

/// Collect all column indices from the given projection expressions.
fn collect_column_indices(exprs: &[ProjectionExpr]) -> Vec<usize> {
    // Collect indices and remove duplicates.
    let mut indices = exprs
        .iter()
        .flat_map(|proj_expr| collect_columns(&proj_expr.expr))
        .map(|x| x.index())
        .collect::<std::collections::HashSet<_>>()
        .into_iter()
        .collect::<Vec<_>>();
    indices.sort();
    indices
}

Contributor

adriangb commented Oct 28, 2025

I'd recommend you use ProjectionExprs from https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/projection.rs, if there's manipulations that you think might be duplicated elsewhere or useful to have abstracted feel free to propose adding them

Contributor Author

@rluvaton I don't think I'll be able to access code in physical_plan since that would create a dependency loop. Should I turn this into a public util function in physical_expr and use that from physical_plan?

@adriangb It's not clear to me how I could make use of ProjectionExprs. update_expr is I think the closest to what this code is trying to do, but what I need is a very limited version of it where I'm only renumbering columns.

Contributor

adriangb commented Oct 28, 2025

There's ProjectionExprs::column_indices which is pub and similar to the non-pub collect_column_indices referenced above. I haven't reviewed this PR in detail but there may be other helper bits that you can use and generally it would be nice if we coalesce projection manipulation into ProjectionExprs because I feel like there's a lot of duplicate code in random places right now (obviously needs to be balanced with keeping the API surface area on ProjectionExprs reasonable).

/// Extract the column indices used in this projection.
/// For example, for a projection `SELECT a AS x, b + 1 AS y`, where `a` is at index 0 and `b` is at index 1,
/// this function would return `[0, 1]`.
/// Repeated indices are returned only once, and the order is ascending.
pub fn column_indices(&self) -> Vec<usize> {
    self.exprs
        .iter()
        .flat_map(|e| collect_columns(&e.expr).into_iter().map(|col| col.index()))
        .sorted_unstable()
        .dedup()
        .collect_vec()
}
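
The contract documented above (each index once, ascending order) can be illustrated with the standard library alone. This is a hedged sketch, not the DataFusion implementation: the free function `column_indices` below is hypothetical and takes, for each expression, the list of column indices it references. Note that sorting before de-duplicating matters in the quoted code because itertools' `dedup` only removes consecutive duplicates; a `BTreeSet` gives the same result in one step.

```rust
use std::collections::BTreeSet;

// Hypothetical std-only helper: given the column indices referenced by each
// expression, return every index once, in ascending order.
fn column_indices(per_expr_columns: &[Vec<usize>]) -> Vec<usize> {
    per_expr_columns
        .iter()
        .flatten()
        .copied()
        .collect::<BTreeSet<usize>>() // de-duplicates and orders the indices
        .into_iter()
        .collect()
}

fn main() {
    // `SELECT a AS x, b + 1 AS y, a + b AS z` with `a` at index 0 and `b` at index 1.
    let per_expr = vec![vec![0], vec![1], vec![0, 1]];
    println!("{:?}", column_indices(&per_expr));
}
```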

Contributor

> I haven't reviewed this PR in detail but there may be other helper bits that you can use and generally it would be nice if we coalesce projection manipulation into ProjectionExprs because I feel like there's a lot of duplicate code in random places right now (obviously needs to be balanced with keeping the API surface area on ProjectionExprs reasonable).

I also agree it feels like there is lots of random remapping code floating around

That being said, it is not a problem this PR introduces (though it may make it slightly worse).

@rluvaton
Member

Can you please add tests for each eval method, both when all columns are used and when only some are?

Also check when there are duplicate columns, columns with different name but same index, and so on.

And can you please provide benchmark results?

pepijnve and others added 3 commits October 28, 2025 17:53
Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Oct 28, 2025
@pepijnve
Contributor Author

@rluvaton I've added some SLTs which cover what you asked for

  • duplicate columns: I don't think you can actually have columns with duplicate names. Those need to be aliased. The names don't really play a role at this point anymore either.
  • columns with different name but same index: I've covered that in the SLT
  • and so on: not sure what else you have in mind. I would hope the sqlite test set covers all kinds of other crazy stuff

@pepijnve
Contributor Author

Benchmark results so far. I'll do another run with all the lookup table ones, but those take much longer to complete.

case_when 8192x3: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END
                        time:   [44.119 µs 44.159 µs 44.201 µs]
                        change: [-1.3187% -1.0059% -0.6888%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END
                        time:   [14.092 µs 14.139 µs 14.183 µs]
                        change: [-0.4325% +0.0691% +0.5941%] (p = 0.80 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END
                        time:   [1.6137 µs 1.6248 µs 1.6454 µs]
                        change: [-0.3208% +0.2510% +0.9935%] (p = 0.62 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

case_when 8192x3: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END
                        time:   [15.638 µs 15.706 µs 15.777 µs]
                        change: [-1.6629% -0.4705% +0.4682%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 EN...
                        time:   [18.497 ms 18.584 ms 18.672 ms]
                        change: [-63.275% -63.002% -62.739%] (p = 0.00 < 0.05)
                        Performance has improved.

case_when 8192x3: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n...
                        time:   [77.671 µs 77.743 µs 77.814 µs]
                        change: [-32.757% -31.951% -31.340%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

case_when 8192x3: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END
                        time:   [25.772 ms 25.896 ms 26.027 ms]
                        change: [-59.464% -59.166% -58.882%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

case_when 8192x3: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END
                        time:   [80.644 µs 80.829 µs 81.013 µs]
                        change: [-29.245% -29.042% -28.841%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

case_when 8192x50: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END
                        time:   [44.208 µs 44.255 µs 44.306 µs]
                        change: [-6.2334% -4.3157% -2.6620%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END
                        time:   [12.950 µs 13.125 µs 13.308 µs]
                        change: [-77.208% -76.935% -76.643%] (p = 0.00 < 0.05)
                        Performance has improved.

case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END
                        time:   [1.6336 µs 1.6372 µs 1.6413 µs]
                        change: [+0.5128% +0.8175% +1.1256%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

case_when 8192x50: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END
                        time:   [15.829 µs 15.917 µs 16.039 µs]
                        change: [-74.636% -74.248% -73.752%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 E...
                        time:   [18.634 ms 18.835 ms 19.124 ms]
                        change: [-93.215% -93.103% -92.966%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

case_when 8192x50: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE ...
                        time:   [78.047 µs 78.193 µs 78.340 µs]
                        change: [-84.852% -84.791% -84.733%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END
                        time:   [26.130 ms 26.260 ms 26.395 ms]
                        change: [-91.275% -91.142% -91.017%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

case_when 8192x50: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END
                        time:   [79.469 µs 79.961 µs 80.443 µs]
                        change: [-84.462% -84.371% -84.290%] (p = 0.00 < 0.05)
                        Performance has improved.

case_when 8192x100: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END
                        time:   [44.229 µs 44.282 µs 44.339 µs]
                        change: [-0.2623% -0.0773% +0.1124%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END
                        time:   [12.831 µs 13.058 µs 13.281 µs]
                        change: [-88.649% -88.422% -88.188%] (p = 0.00 < 0.05)
                        Performance has improved.

case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END
                        time:   [1.6280 µs 1.6328 µs 1.6379 µs]
                        change: [+0.3549% +0.6314% +0.9178%] (p = 0.00 < 0.05)
                        Change within noise threshold.

case_when 8192x100: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END
                        time:   [15.816 µs 15.874 µs 15.926 µs]
                        change: [-86.013% -85.925% -85.845%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) low severe
  5 (5.00%) low mild
  2 (2.00%) high mild

case_when 8192x100: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 ...
                        time:   [18.786 ms 18.899 ms 19.039 ms]
                        change: [-96.725% -96.693% -96.662%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

case_when 8192x100: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE...
                        time:   [77.981 µs 78.081 µs 78.187 µs]
                        change: [-91.952% -91.871% -91.800%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high severe

case_when 8192x100: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END
                        time:   [25.685 ms 25.783 ms 25.887 ms]
                        change: [-95.974% -95.893% -95.817%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild

case_when 8192x100: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END
                        time:   [79.641 µs 79.901 µs 80.183 µs]
                        change: [-91.560% -91.513% -91.470%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
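
As a reading aid for the `change` lines above: assuming the reported percentage is the relative change in time, (new - old) / old, a change of c percent corresponds to a speedup factor of 1 / (1 + c / 100). The helper below is a hypothetical sketch of that arithmetic, not part of the benchmark harness.

```rust
// Convert a relative change in run time (in percent, negative = faster)
// into a speedup factor old_time / new_time.
// Assumes change = (new - old) / old, expressed as a percentage.
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

fn main() {
    // Median change estimates copied from three of the runs above.
    for change in [-63.002, -76.935, -96.693] {
        println!("{change:>8.3}% -> {:.1}x", speedup_from_change(change));
    }
}
```

Under that assumption, the -96.693% median for the 100-column equality chain is roughly a 30x speedup, while the -63.002% median for the same expression on 3 columns is roughly 2.7x.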

@alamb
Contributor

alamb commented Oct 28, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing projected_case (7c71790) to e9431fc diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=projected_case
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 29, 2025

🤖: Benchmark completed

Details

group                                                                                                                             main                                   projected_case
-----                                                                                                                             ----                                   --------------
case_when 8192x100: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                    19.61     2.7±0.02ms        ? ?/sec    1.00    137.6±1.66µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                         1.00     55.7±0.10µs        ? ?/sec    1.00     55.4±0.19µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                       15.83   383.2±9.95µs        ? ?/sec    1.00     24.2±0.54µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                   1.01      6.8±0.02µs        ? ?/sec    1.00      6.8±0.02µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                           34.74 1673.0±10.59ms        ? ?/sec    1.00     48.2±0.20ms        ? ?/sec
case_when 8192x100: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                          32.32 1698.5±10.39ms        ? ?/sec    1.00     52.6±0.30ms        ? ?/sec
case_when 8192x100: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                     13.28   395.9±9.86µs        ? ?/sec    1.00     29.8±0.50µs        ? ?/sec
case_when 8192x100: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                22.73     2.7±0.02ms        ? ?/sec    1.00    120.6±1.13µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                      1.51    207.9±2.19µs        ? ?/sec    1.00    137.4±1.64µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                           1.01     55.5±0.17µs        ? ?/sec    1.00     55.0±0.12µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                         1.00     23.2±0.49µs        ? ?/sec    1.01     23.6±0.45µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                     1.00      6.8±0.02µs        ? ?/sec    1.01      6.8±0.02µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                             1.41     68.3±0.29ms        ? ?/sec    1.00     48.3±0.29ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                            1.38     73.3±0.47ms        ? ?/sec    1.00     53.0±0.37ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                       1.01     29.9±0.71µs        ? ?/sec    1.00     29.8±0.57µs        ? ?/sec
case_when 8192x3: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                  1.58    190.7±1.74µs        ? ?/sec    1.00    120.5±1.56µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                     9.95   1374.8±9.67µs        ? ?/sec    1.00    138.2±1.63µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                          1.01     55.6±0.19µs        ? ?/sec    1.00     55.1±0.12µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                        7.37    177.8±4.35µs        ? ?/sec    1.00     24.1±0.48µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                    1.00      6.8±0.02µs        ? ?/sec    1.00      6.8±0.01µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                            15.53   750.3±5.06ms        ? ?/sec    1.00     48.3±0.23ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                           14.49   764.2±5.57ms        ? ?/sec    1.00     52.8±0.30ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                      6.23    183.3±4.96µs        ? ?/sec    1.00     29.4±0.64µs        ? ?/sec
case_when 8192x50: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                 11.37  1373.7±9.49µs        ? ?/sec    1.00    120.8±1.22µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.03    475.3±2.02µs        ? ?/sec    1.00    459.4±2.24µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.03    530.2±3.21µs        ? ?/sec    1.00    516.5±1.50µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    405.4±2.21µs        ? ?/sec    1.00    402.5±1.24µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.02    470.8±1.22µs        ? ?/sec    1.00    463.6±3.65µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.01    525.9±1.84µs        ? ?/sec    1.00    520.0±2.40µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.00    405.2±2.12µs        ? ?/sec    1.00    404.4±2.55µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.02    472.3±2.60µs        ? ?/sec    1.00    464.0±1.65µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.02    526.0±1.90µs        ? ?/sec    1.00    517.2±4.05µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.02    472.7±1.75µs        ? ?/sec    1.00    464.5±3.33µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    196.1±1.49µs        ? ?/sec    1.00    194.2±0.45µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.01    275.5±1.39µs        ? ?/sec    1.00    273.5±0.92µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.01    261.8±0.85µs        ? ?/sec    1.00    259.8±1.34µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.00    195.5±0.37µs        ? ?/sec    1.00    194.8±0.77µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    274.9±0.91µs        ? ?/sec    1.00    274.0±0.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.00    261.3±0.60µs        ? ?/sec    1.00    260.4±1.02µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.01    195.9±0.91µs        ? ?/sec    1.00    194.7±0.46µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.01    275.8±1.33µs        ? ?/sec    1.00    274.2±0.83µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.00    195.7±0.56µs        ? ?/sec    1.00    195.0±0.43µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.05    644.1±2.41µs        ? ?/sec    1.00    616.1±1.92µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.04    688.8±3.69µs        ? ?/sec    1.00    660.8±2.61µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.03    511.2±7.70µs        ? ?/sec    1.00    498.5±1.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.04    642.0±3.67µs        ? ?/sec    1.00    619.1±1.94µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.03    685.5±2.54µs        ? ?/sec    1.00    663.3±4.43µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.02    511.0±2.55µs        ? ?/sec    1.00    501.3±2.51µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.03    639.8±3.19µs        ? ?/sec    1.00    618.6±5.47µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.04    685.1±3.31µs        ? ?/sec    1.00    661.2±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.04    642.1±3.73µs        ? ?/sec    1.00    616.2±3.76µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.00    195.4±0.47µs        ? ?/sec    1.00    194.9±0.61µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.01    275.9±1.24µs        ? ?/sec    1.00    273.0±0.94µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.00    261.3±1.58µs        ? ?/sec    1.00    260.2±1.03µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.00    195.6±0.59µs        ? ?/sec    1.00    195.2±2.38µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    275.3±1.19µs        ? ?/sec    1.00    274.7±2.44µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.00    261.6±2.74µs        ? ?/sec    1.00    260.6±1.21µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    195.6±0.83µs        ? ?/sec    1.00    195.0±0.48µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    275.0±0.90µs        ? ?/sec    1.00    274.5±1.73µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.00    195.9±0.64µs        ? ?/sec    1.00    195.8±0.43µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0            1.02    338.7±1.59µs        ? ?/sec    1.00    332.4±0.91µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1          1.01    399.6±1.52µs        ? ?/sec    1.00    396.4±1.26µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5          1.01    326.9±1.62µs        ? ?/sec    1.00    325.0±4.86µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0            1.02    338.5±1.41µs        ? ?/sec    1.00    332.5±1.27µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1          1.01    399.5±1.54µs        ? ?/sec    1.00    396.9±1.30µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5          1.00    325.4±1.04µs        ? ?/sec    1.00    325.2±1.55µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0            1.02    338.4±2.00µs        ? ?/sec    1.00    333.0±1.17µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1          1.01    399.4±1.52µs        ? ?/sec    1.00    396.0±0.88µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0              1.03   344.0±21.42µs        ? ?/sec    1.00    333.4±1.47µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0       1.01    195.8±0.43µs        ? ?/sec    1.00    194.2±0.66µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1     1.01    276.0±1.14µs        ? ?/sec    1.00    273.9±3.04µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5     1.01    261.8±1.15µs        ? ?/sec    1.00    259.5±1.00µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0       1.01    195.8±1.71µs        ? ?/sec    1.00    194.5±0.42µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1     1.01    275.6±0.82µs        ? ?/sec    1.00    273.6±1.79µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5     1.00    261.5±0.80µs        ? ?/sec    1.00    261.0±1.16µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0       1.01    195.8±0.62µs        ? ?/sec    1.00    194.9±0.65µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1     1.00    275.5±1.09µs        ? ?/sec    1.00    275.1±1.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0         1.01    196.3±0.63µs        ? ?/sec    1.00    195.3±0.65µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.00    850.5±1.95µs        ? ?/sec    1.01    858.1±1.92µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    902.8±3.60µs        ? ?/sec    1.01    910.6±1.45µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.00    636.8±7.77µs        ? ?/sec    1.02    648.5±1.66µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.00    850.6±1.99µs        ? ?/sec    1.01    858.7±8.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.00    901.9±2.27µs        ? ?/sec    1.01    912.6±2.35µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.00    634.9±1.85µs        ? ?/sec    1.02    648.5±1.86µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.00    850.4±2.55µs        ? ?/sec    1.01    859.1±3.68µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.00    902.6±2.49µs        ? ?/sec    1.01    910.3±4.27µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.00    849.5±1.90µs        ? ?/sec    1.01    858.8±5.66µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.00    230.2±1.37µs        ? ?/sec    1.00    230.1±0.51µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.01   343.3±27.93µs        ? ?/sec    1.00   341.2±15.74µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.00    316.8±2.00µs        ? ?/sec    1.02    321.7±5.32µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.00    230.3±1.62µs        ? ?/sec    1.00    230.2±0.63µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    336.8±1.03µs        ? ?/sec    1.01    340.0±2.57µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.00    317.6±1.28µs        ? ?/sec    1.01    322.0±1.04µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    230.4±0.59µs        ? ?/sec    1.00    229.8±0.76µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    336.8±1.25µs        ? ?/sec    1.01    340.2±1.41µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.00    230.2±0.57µs        ? ?/sec    1.00    230.3±0.64µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.00   1210.6±5.03µs        ? ?/sec    1.02   1229.5±6.49µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00   1212.7±6.94µs        ? ?/sec    1.00   1218.4±2.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.00    833.7±2.05µs        ? ?/sec    1.01    846.0±1.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.00  1212.0±12.13µs        ? ?/sec    1.01   1228.0±7.01µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.00   1201.1±3.71µs        ? ?/sec    1.02   1219.5±6.77µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.00    834.6±3.98µs        ? ?/sec    1.02    848.1±3.33µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.00   1209.7±3.32µs        ? ?/sec    1.02   1228.0±3.80µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.00   1200.2±3.96µs        ? ?/sec    1.01   1218.1±3.17µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.00   1213.2±6.32µs        ? ?/sec    1.01   1226.5±2.48µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.00    230.2±0.79µs        ? ?/sec    1.00    230.0±0.43µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    336.4±1.44µs        ? ?/sec    1.00    337.8±0.91µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.00    317.8±2.80µs        ? ?/sec    1.01    321.5±2.43µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.00    230.4±0.49µs        ? ?/sec    1.00    230.9±0.99µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    336.8±1.02µs        ? ?/sec    1.01    339.5±2.20µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.00    318.7±3.78µs        ? ?/sec    1.01    322.4±2.85µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    230.2±0.52µs        ? ?/sec    1.00    230.6±2.64µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    336.3±0.83µs        ? ?/sec    1.01    339.2±1.87µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.00    230.4±0.52µs        ? ?/sec    1.00    229.9±0.39µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0            1.00    548.6±1.34µs        ? ?/sec    1.01    555.2±1.30µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1          1.00    629.5±3.44µs        ? ?/sec    1.01    634.7±2.70µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5          1.00    486.3±1.52µs        ? ?/sec    1.03    499.4±1.81µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0            1.00    549.8±1.35µs        ? ?/sec    1.01    555.0±1.59µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1          1.00    627.6±2.21µs        ? ?/sec    1.01    635.8±3.00µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5          1.00    485.2±1.64µs        ? ?/sec    1.03    500.6±2.18µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0            1.00    547.9±1.99µs        ? ?/sec    1.01    554.2±1.55µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1          1.00    629.5±1.77µs        ? ?/sec    1.01    634.3±1.65µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0              1.00    549.3±2.01µs        ? ?/sec    1.01    556.5±1.82µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0       1.00    230.3±0.76µs        ? ?/sec    1.00    230.1±0.61µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1     1.00    336.6±2.33µs        ? ?/sec    1.01    338.4±1.16µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5     1.00    317.9±1.27µs        ? ?/sec    1.01    321.5±0.58µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0       1.00    230.4±0.98µs        ? ?/sec    1.00    230.0±0.54µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1     1.00    336.5±0.77µs        ? ?/sec    1.01    339.3±1.23µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5     1.00    317.7±1.17µs        ? ?/sec    1.01    321.4±0.79µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0       1.00    230.8±0.87µs        ? ?/sec    1.00    230.2±0.76µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1     1.00    336.6±1.42µs        ? ?/sec    1.01    339.9±2.57µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0         1.00    230.3±0.59µs        ? ?/sec    1.00    230.5±2.74µs        ? ?/sec

@pepijnve
Contributor Author

pepijnve commented Oct 29, 2025

Benchmark results confirm, I think, the waste in filtering unused columns and that the gains can be significant.

I am still a bit worried about the heavy-handedness of the approach I implemented here. Does the gain in execution speed cost too much during planning? Or is there maybe a simpler way to achieve the same end result?

The second version I made of this was more like an optimizer pass. Instead of narrowing the record batch inside the CaseExpr, a “project record batch” expr node was wrapped around it. That has the benefit of not needing the duplicate expr tree. The downside was that this becomes visible in the physical expr tree. Much less an internal implementation detail that way.
The approach with a wrapper PhysicalExpr can be seen in #18055

I was experimenting with an actual PhysicalOptimizerRule today, but I don't think that's going to be feasible without API changes. I might have missed it, but I don't think there's already an API to swap out the expressions of a dyn ExecutionPlan. Something like expressions and with_new_expressions would be necessary.

@pepijnve
Contributor Author

pepijnve commented Oct 29, 2025

I was experimenting with an actual PhysicalOptimizerRule today, but I don't think that's going to be feasible without API changes.

At the logical level this would be possible though, but there's no Expr that expresses "narrow the incoming record batch". Doesn't really make sense in the logical domain either.

Perhaps at the logical-to-physical translation point? Although you need insight into the internals of CaseExpr to know when it's going to be useful. If the EvalMethod will not perform any filtering, then this narrowing/project logic is a waste of CPU cycles. Going around in circles in my reasoning. This line of thinking is what led me to pull the logic entirely into CaseExpr and make it an internal implementation detail.

Contributor

@alamb alamb left a comment

Thank you @pepijnve -- in my opinion this PR could be merged as is and is a very nice refinement.

However, I think we could potentially simplify it / reduce some complexity by avoiding multiple copies of CaseExpr if possible -- I left some suggestions

Comment on lines +92 to +112
let mut used_column_indices = HashSet::<usize>::new();
let mut collect_column_indices = |expr: &Arc<dyn PhysicalExpr>| {
    expr.apply(|expr| {
        if let Some(column) = expr.as_any().downcast_ref::<Column>() {
            used_column_indices.insert(column.index());
        }
        Ok(TreeNodeRecursion::Continue)
    })
    .expect("Closure cannot fail");
};

if let Some(e) = &self.expr {
    collect_column_indices(e);
}
self.when_then_expr.iter().for_each(|(w, t)| {
    collect_column_indices(w);
    collect_column_indices(t);
});
if let Some(e) = &self.else_expr {
    collect_column_indices(e);
}
Contributor

. I haven't reviewed this PR in detail but there may be other helper bits that you can use and generally it would be nice if we coalesce projection manipulation into ProjectionExprs because I feel like there's a lot of duplicate code in random places right now (obviously needs to be balanced with keeping the API surface area on ProjectionExprs reasonable).

I also agree it feels like there is lots of random remapping code floating around

However, that being said it is not a problem this PR introduces (though it may make it slightly worse)
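
For readers following the remapping discussion, here is a minimal sketch of the two steps involved: collecting the referenced column indices and rewriting the expressions against the narrowed batch. It is not the PR's code. The `Expr` enum and the `collect_columns`, `remap`, and `project_exprs` functions are hypothetical stand-ins for `Arc<dyn PhysicalExpr>` and DataFusion's tree traversal, and a `BTreeSet` is used instead of the `HashSet` in the diff so that the projection vector comes out sorted.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified expression tree (stand-in for `Arc<dyn PhysicalExpr>`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to an input column by index.
    Column(usize),
    /// Integer literal.
    Literal(i64),
    /// Binary comparison `left > right`.
    Gt(Box<Expr>, Box<Expr>),
}

/// Walk the tree and record every column index that is referenced.
fn collect_columns(expr: &Expr, used: &mut BTreeSet<usize>) {
    match expr {
        Expr::Column(i) => {
            used.insert(*i);
        }
        Expr::Literal(_) => {}
        Expr::Gt(l, r) => {
            collect_columns(l, used);
            collect_columns(r, used);
        }
    }
}

/// Rewrite column references so they point into the projected batch.
fn remap(expr: &Expr, mapping: &HashMap<usize, usize>) -> Expr {
    match expr {
        Expr::Column(i) => Expr::Column(mapping[i]),
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Gt(l, r) => Expr::Gt(
            Box::new(remap(l, mapping)),
            Box::new(remap(r, mapping)),
        ),
    }
}

/// Returns the projection vector (sorted original column indices) together
/// with copies of the expressions whose column references are renumbered.
fn project_exprs(exprs: &[Expr]) -> (Vec<usize>, Vec<Expr>) {
    let mut used = BTreeSet::new();
    for e in exprs {
        collect_columns(e, &mut used);
    }
    let projection: Vec<usize> = used.into_iter().collect();
    // Map each original index to its position in the projected batch.
    let mapping: HashMap<usize, usize> = projection
        .iter()
        .enumerate()
        .map(|(new, old)| (*old, new))
        .collect();
    let rewritten = exprs.iter().map(|e| remap(e, &mapping)).collect();
    (projection, rewritten)
}
```

With expressions that reference only columns 7 and 2, the projection vector is `[2, 7]` and the rewritten expressions reference columns 1 and 0 respectively. The original expressions are left untouched, which mirrors the PR's choice to keep the externally visible expressions unchanged.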

let mut result_builder = ResultBuilder::new(return_type, batch.num_rows());

// `remainder_rows` contains the indices of the rows that need to be evaluated
let mut remainder_rows: ArrayRef =
Contributor

It took me some reading to understand that the point of the PR is to remove unused columns carried along in remainder_rows.

Contributor Author

I've been living in this little corner of DataFusion for too long already it seems 😄 The reason for this PR is indeed that:

  • we need to filter RecordBatch values
  • filtering a RecordBatch filters all columns of the batch
  • the when, then, and else expressions may only reference a few of the columns of the batch

Any time spent filtering the unreferenced columns is unnecessary work.

The comment you suggested will hopefully clarify that for future readers.
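
The three points above can be illustrated with a simplified model. This is a sketch under stated assumptions, not Arrow's API: `Batch` is a hypothetical stand-in for `RecordBatch`, with each column held behind an `Rc` so that projecting only clones a pointer, while filtering has to copy the surviving values of every column it is given.

```rust
use std::rc::Rc;

/// Hypothetical column type: shared, immutable values.
type Column = Rc<Vec<i64>>;
/// Hypothetical batch type: one entry per column (not Arrow's `RecordBatch`).
type Batch = Vec<Column>;

/// Keep only the rows where `mask` is true. Every column in `batch` is
/// copied, so the cost grows with the number of columns passed in.
fn filter_batch(batch: &Batch, mask: &[bool]) -> Batch {
    batch
        .iter()
        .map(|col| {
            let kept: Vec<i64> = col
                .iter()
                .zip(mask)
                .filter(|(_, keep)| **keep)
                .map(|(v, _)| *v)
                .collect();
            Rc::new(kept)
        })
        .collect()
}

/// Keep only the columns listed in `projection`, in that order.
/// No values are copied; only the reference counts are bumped.
fn project_batch(batch: &Batch, projection: &[usize]) -> Batch {
    projection.iter().map(|&i| Rc::clone(&batch[i])).collect()
}
```

Calling `project_batch` before `filter_batch` means the filter only touches the columns the expressions reference. In this model a three-column batch projected to `[2]` is filtered with a third of the copying.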

Contributor

I think the other key thing to realize is that Case is being evaluated incrementally -- and any rows that have not yet matched one of the case expressions are carried (copied) through (rather than tracked with a bitmask for example)
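
A rough sketch of that incremental evaluation, assuming a single `i64` input column and plain function pointers as the WHEN predicates. The `eval_case` function and its signature are hypothetical and only model the idea. The names `remainder_rows` and `remainder` echo the diff but do not reproduce its types.

```rust
/// Evaluate `CASE WHEN p0(x) THEN r0 WHEN p1(x) THEN r1 ... ELSE else_value END`
/// over one column, one branch at a time.
fn eval_case(
    input: &[i64],
    branches: &[(fn(i64) -> bool, i64)],
    else_value: i64,
) -> Vec<i64> {
    // Rows that never match any branch keep the ELSE value.
    let mut result = vec![else_value; input.len()];
    // Original row numbers of the rows that have not matched yet.
    let mut remainder_rows: Vec<usize> = (0..input.len()).collect();
    // The values of those rows, physically copied into a shrinking batch.
    let mut remainder: Vec<i64> = input.to_vec();

    for (when, then) in branches {
        if remainder.is_empty() {
            break;
        }
        // Evaluate the WHEN predicate only on the rows still in play.
        let matched: Vec<bool> = remainder.iter().map(|v| when(*v)).collect();
        let mut next_rows = Vec::new();
        let mut next_values = Vec::new();
        for (i, hit) in matched.iter().enumerate() {
            if *hit {
                result[remainder_rows[i]] = *then;
            } else {
                // Unmatched rows are carried (copied) into the next round.
                next_rows.push(remainder_rows[i]);
                next_values.push(remainder[i]);
            }
        }
        remainder_rows = next_rows;
        remainder = next_values;
    }
    result
}
```

In this model the carried data is a single column. With a real record batch, every column is copied on each round unless the batch has been narrowed first, which is the cost the PR targets.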

Contributor Author

@pepijnve pepijnve Oct 30, 2025

Indeed. What's easy to miss in the original code is that, even though it was tracking remaining rows with a bit mask, by calling PhysicalExpr#evaluate_selection you still pay the cost of carrying/copying because there are no true selection vector based conditional evaluation implementations. evaluate_selection just hides the filtering/scattering cost from sight.

This led me to wonder if evaluate_selection is actually useful. In DataFusion itself case is the only user of it and at the moment the only remaining usage is in expr_or_expr. Even there its usage is dubious. I did a little experiment in https://github.com/pepijnve/datafusion/tree/expr_or_expr where I avoid the 'scatter' cost using an unaligned zip implementation and get better performance that way.

Contributor

@alamb alamb Oct 30, 2025

Sounds like another follow on PR 🤪

Contributor Author

And here it is #18444.

projected: &ProjectedCaseBody,
) -> Result<ColumnarValue> {
let return_type = self.data_type(&batch.schema())?;
if projected.projection.len() < batch.num_columns() {
Contributor

I was confused about this check -- when would projection.len() be greater? When all the input columns are used?

If that is correct, can you please add some comments about that invariant in ProjectedCaseBody? Or maybe even better, you could represent the idea of no projection more explicitly.

Perhaps something like

pub struct CaseExpr {
    /// The case expression body
    body: CaseBody,
    /// Optional projection to apply
    projection: Option<Projection>,
    /// Evaluation method to use
    eval_method: EvalMethod,
}

Contributor Author

@pepijnve pepijnve Oct 29, 2025

when would projection.len() be greater?

It will never be greater (or shouldn't be at least), but may be equal when all the input columns are used indeed.

The reason this is necessary in the first place is because at construction time of the CaseExpr you're flying blind wrt the schema. If the set of used columns is for instance 0, 1, 2 there's no way to know if that's all of them or a prefix of the full schema. Unfortunately that necessitates this per evaluate check.
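
A small sketch of why the check has to happen per batch. The `Projection` struct and `should_project` method here are hypothetical names for illustration, not the types in the PR. The point is that the set of used columns is fixed at construction time, whereas the batch width is only known at evaluation time.

```rust
/// Hypothetical stand-in for the projection derived when the CASE
/// expression is constructed, before any schema is known.
struct Projection {
    /// Sorted indices of the columns the CASE expression references.
    columns: Vec<usize>,
}

impl Projection {
    /// Decide, per batch, whether narrowing is worthwhile. If every column
    /// of the batch is used, projecting would save nothing.
    fn should_project(&self, batch_num_columns: usize) -> bool {
        self.columns.len() < batch_num_columns
    }
}
```

For used columns 0, 1, 2 this returns `false` for a three-column batch and `true` for a ten-column batch, which is exactly the ambiguity described above.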

Contributor

@alamb maybe this is another instance where some sort of "optimize/specialize/compile given the schema / children" would be helpful

/// [ELSE result]
/// END
- NoExpression,
+ NoExpression(ProjectedCaseBody),
Contributor

I am somewhat concerned about the duplication here -- the EvalMethod has a CaseBody (embedded in ProjectedCaseBody) but the CaseExpr also (still) has a CaseBody -- could they get out of sync?

Contributor Author

In theory yes, in practice no. with_new_children calls CaseExpr::try_new which will derive a new projected version. So at least through the public API there's no way to get the two out of sync.

I wanted to avoid having case silently rewrite the externally visible expressions. The projected variant may no longer match the schema of the execution plan, so I felt it would be a bad idea to let this leak out.

I need to keep the rewritten version somewhere in order to evaluate it so I ended up putting that in the EvalMethods that would actually make use of it.

That being said, I'm not entirely happy with this solution myself either. It's the best I could come up with that completely hides what's going on from the outside world.

@alamb
Contributor

alamb commented Oct 30, 2025

I'll plan to merge this PR tomorrow unless anyone would like more time to comment

@alamb alamb changed the title Project record batches to avoid filtering unused columns Project record batches to avoid filtering unused columns in CASE evaluation Oct 31, 2025
@alamb alamb added this pull request to the merge queue Oct 31, 2025
@alamb
Contributor

alamb commented Oct 31, 2025

Thanks again @pepijnve -- this one is very exciting

Merged via the queue into apache:main with commit c3e49fb Oct 31, 2025
33 checks passed
tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
…aluation (apache#18329)

## Which issue does this PR close?

- Closes apache#18056
- Part of apache#18075

## Rationale for this change

When `CaseExpr` needs to evaluate a `PhysicalExpr` for a subset of the
rows of the input `RecordBatch` it will first filter the record batch
using a selection vector. This filter step filters all columns of the
`RecordBatch`, including ones that may not be accessed by the
`PhysicalExpr`. For wide (many columns) record batches and narrow
expressions (few column references) it can be beneficial to project the
record batch first to reduce the amount of wasted filtering work.

## What changes are included in this PR?

This PR attempts to reduce the amount of time spent filtering columns
unnecessarily by reducing the columns of the record batch prior to
filtering. Since this renumbers the columns, it is also required to
derive new versions of the `when`, `then`, and `else` expressions that
have corrected column references.

To make this more manageable the set of child expressions of a `case`
expression are collected in a new struct named `CaseBody`. The
projection logic derives a projection vector and a projected `CaseBody`.

This logic is only used when the number of used columns (the length of
the projection vector) is less than the number of columns of the
incoming record batch.

Certain evaluation methods in `case` do not perform any filtering. These
remain unchanged and will never perform the projection logic since this
is only beneficial when filtering of record batches is required.

## Are these changes tested?

- Covered by existing tests

## Are there any user-facing changes?

No

---------

Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@pepijnve pepijnve deleted the projected_case branch November 3, 2025 16:21

Labels

physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Improve performance of queries of the form SELECT *, CASE ... END

4 participants