Skip to content

Conversation

findepi
Copy link
Member

@findepi findepi commented Mar 24, 2025

Which issue does this PR close?

Rationale for this change

Commit 0f4b8b1 (#13953) introduced a new
optimized evaluation mode for CASE expression with exactly one WHEN-THEN
clause and ELSE clause being present. Apparently, the new mode did not
take into account expensive or fallible expressions, as it
unconditionally evaluated both branches. This is definitely incorrect
for fallible expressions. For these, fallback to the original execution
mode.

What changes are included in this PR?

fix so that CASE is lazy again

Are these changes tested?

slt

Are there any user-facing changes?

yes

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Mar 24, 2025
@findepi findepi requested review from alamb and jonahgao March 24, 2025 12:09
@findepi
Copy link
Member Author

findepi commented Mar 24, 2025

Alternatively, we could change Literal::evaluate_selection to return null scalar when selection is empty.
However, for this to be effective, it would require adding evaluate_selection implementations to any physical expressions that may contain literals. Then, using BooleanArray for selection would no longer be efficient, as expressions would need to scan it for true values, and deal with nullability. A new type for selection would likely be more appropriate. Anyway, this sounds not like a bug fix we need for the regression.

findepi added 2 commits March 24, 2025 14:08
Commit 0f4b8b1 introduced a new
optimized evaluation mode for CASE expression with exactly one WHEN-THEN
clause and ELSE clause being present. Apparently, the new mode did not
take into account expensive or fallible expressions, as it
unconditionally evaluated both branches. This is definitely incorrect
for fallible expressions. For these, fallback to the original execution
mode.
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @findepi

Given the code was generating incorrect answers but is now correct this looks good to me. I am running some benchmarks to see if there are any performance impacts

cc @aweltsch @2010YOUY01 who authored/helped with #13953

1 10
2 5

query II
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that this errors on main:

> SELECT v, CASE WHEN v != 0 THEN 10/v ELSE 42 END FROM (VALUES (0), (1), (2)) t(v);
+---+-------------------------------------------------------------------+
| v | CASE WHEN t.v != Int64(0) THEN Int64(10) / t.v ELSE Int64(42) END |
+---+-------------------------------------------------------------------+
| 0 | 42                                                                |
| 1 | 10                                                                |
| 2 | 5                                                                 |
+---+-------------------------------------------------------------------+
3 row(s) fetched.
Elapsed 0.002 seconds.

> SELECT v, CASE WHEN v < 0 THEN 10/0 ELSE 1 END FROM (VALUES (1), (2)) t(v);
Arrow error: Divide by zero error

@alamb
Copy link
Contributor

alamb commented Mar 24, 2025

Here are my benchmark results. My conclusion is no discernable difference

group                          findepi_lazy-case                      main
-----                          -----------------                      ----
case_when: CASE expr           1.02     25.1±0.24µs        ? ?/sec    1.00     24.6±0.27µs        ? ?/sec
case_when: column or null      1.00   1513.6±2.56ns        ? ?/sec    1.00  1518.0±34.78ns        ? ?/sec
case_when: expr or expr        1.01     33.8±0.28µs        ? ?/sec    1.00     33.4±0.16µs        ? ?/sec
case_when: scalar or scalar    1.00      8.3±0.02µs        ? ?/sec    1.00      8.3±0.09µs        ? ?/sec

@findepi findepi merged commit 424cf5a into apache:main Mar 25, 2025
27 checks passed
@findepi findepi deleted the findepi/lazy-case branch March 25, 2025 09:24
qstommyshu pushed a commit to qstommyshu/datafusion that referenced this pull request Mar 27, 2025
* Restore lazy evaluation of fallible CASE

Commit 0f4b8b1 introduced a new
optimized evaluation mode for CASE expression with exactly one WHEN-THEN
clause and ELSE clause being present. Apparently, the new mode did not
take into account expensive or fallible expressions, as it
unconditionally evaluated both branches. This is definitely incorrect
for fallible expressions. For these, fallback to the original execution
mode.

* Workaround for running sqllogictests in RustRover
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* Restore lazy evaluation of fallible CASE

Commit 0f4b8b1 introduced a new
optimized evaluation mode for CASE expression with exactly one WHEN-THEN
clause and ELSE clause being present. Apparently, the new mode did not
take into account expensive or fallible expressions, as it
unconditionally evaluated both branches. This is definitely incorrect
for fallible expressions. For these, fallback to the original execution
mode.

* Workaround for running sqllogictests in RustRover
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression: eager evaluation of expressions inside CASE conditional expression

2 participants