-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement short circuit function evaluation #23367
Conversation
809858c
to
4c63a51
Compare
aaa9877
to
1549d3e
Compare
case ActionsDAG::ActionType::ALIAS: | ||
rewriteShortCircuitArguments(child->children, need_outside, force_rewrite); | ||
break; | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose arrayJoin
also should be supported.
E.g.: select if(number != 0, arrayJoin([1, intDiv(42, number), 2]), 0)
Here, arrayJoin for short-circuit column will work as if array is empty for mask = 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose arrayJoin also should be supported
I spent a lot of time thinking about how to support it for arrayJoin, but I considered that it's almost impossible. I wrote a comment about it:
https://github.com/ClickHouse/ClickHouse/blob/591cecc66499e99948001a82b0f03e1ced4c7763/src/Interpreters/ExpressionActions.cpp#L81-L87
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think it is ok to get offsets after filtering.
select if (number % 2 = 0, [], arrayJoin(range(number))) from numbers(4);
Ordinary evaluation
[0, 1, 2, 3] -> range -> [[], [0], [0, 1], [0, 1, 2]] -> arrayJoin -> [0, 0, 1, 0, 1, 2] -> if -> [0, 0, 1, 2]
[1, 0, 1, 0] -> arrayJoin -> [0, 1, 1, 0, 0, 0]
short-circuit evaluation
if -> [?, 1, ?, 3] -> range -> [?, [0], ?, [0, 1, 2]] -> arrayJoin -> [0, 0, 1, 2]
if -> [1, 0, 1, 0]
But we can support it later (if needed)
@Mergifyio update |
Command
|
:) select if(number > 0, intDiv(42, number), 0) from numbers(4)
SELECT if(number > 0, intDiv(42, number), 0)
FROM numbers(4)
Query id: 9d6e83c0-b3e9-4a71-b03a-d03307aafe8d
┌─if(greater(number, 0), intDiv(42, number), 0)─┐
│ 0 │
│ 42 │
│ 21 │
│ 14 │
└───────────────────────────────────────────────┘
4 rows in set. Elapsed: 0.003 sec.
:) select if(number = 0, 0, intDiv(42, number)) from numbers(4)
SELECT if(number = 0, 0, intDiv(42, number))
FROM numbers(4)
Query id: 27a0f2c2-a395-4346-a332-2a41b666072d
┌─if(equals(number, 0), 0, intDiv(42, number))─┐
│ 0 │
│ 42 │
│ 21 │
│ 14 │
└──────────────────────────────────────────────┘
4 rows in set. Elapsed: 0.002 sec.
:) select if(number > 0, intDiv(42, number), 0), if(number = 0, 0, intDiv(42, number)) from numbers(4)
SELECT
if(number > 0, intDiv(42, number), 0),
if(number = 0, 0, intDiv(42, number))
FROM numbers(4)
Query id: 2219d4ed-5dcc-4906-9b37-0b0c6093802f
0 rows in set. Elapsed: 0.004 sec.
Received exception from server (version 21.7.1):
Code: 153. DB::Exception: Received from localhost:9000. DB::Exception: Division by zero: while executing 'FUNCTION intDiv(42 :: 2, number :: 0) -> intDiv(42, number) UInt8 : 5'.
|
:) select if(if(number > 0, intDiv(42, number), 7), intDiv(42, number), 8) from numbers(4) format TSV
SELECT if(if(number > 0, intDiv(42, number), 7), intDiv(42, number), 8)
FROM numbers(4)
FORMAT TSV
Query id: d55177ae-ccb0-4ba9-9194-66781b257493
0 rows in set. Elapsed: 0.004 sec.
Received exception from server (version 21.7.1):
Code: 153. DB::Exception: Received from localhost:9000. DB::Exception: Division by zero: while executing 'FUNCTION intDiv(42 :: 2, number :: 0) -> intDiv(42, number) UInt8 : 1'.
:) select if(number > 0, intDiv(42, number), 7) from numbers(4) format TSV
SELECT if(number > 0, intDiv(42, number), 7)
FROM numbers(4)
FORMAT TSV
Query id: 0d7e3fb5-7f18-4297-a1e1-01e66dfbdc3a
7
42
21
14
4 rows in set. Elapsed: 0.003 sec.
:) select if(number > 0, intDiv(42, number), 8) from numbers(4) format TSV
SELECT if(number > 0, intDiv(42, number), 8)
FROM numbers(4)
FORMAT TSV
Query id: bae9fdd8-97b5-4d9a-b31b-429b0936f04d
8
42
21
14
4 rows in set. Elapsed: 0.003 sec. |
I checked everything except I see that some binary |
src/Columns/MaskOperations.cpp
Outdated
if constexpr (column_is_short) | ||
{ | ||
if (data_index >= data.size()) | ||
throw Exception("Amount of ones in the mask doesn't equal short column size", ErrorCodes::LOGICAL_ERROR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe exception from the main loop may prevent vectorisation ...
But I am ok with such implementation for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to merge it
But also let's wait for perftest. |
@Mergifyio update |
Command
|
Internal documentation ticket: DOCSUP-13429 |
I found it interesting that even with short circuit evaluation disabled, there is still huge overhead: SELECT count() FROM zeros(1000000000) WHERE NOT ignore(if(rand() % 2, toDateTime(rand()), toDate(rand()))) settings max_threads=1, short_circuit_function_evaluation='disable'
-- 11.8sec
SELECT count() FROM zeros(1000000000) WHERE NOT ignore(if(rand() % 2, toDateTime(rand()), toDate(rand()))) settings max_threads=1
-- 11.88sec
SELECT count() FROM zeros(1000000000) WHERE NOT ignore(if(rand() % 2, toDateTime(rand()), toDate(rand()))) settings max_threads=1
-- 8.2sec on 21.7 w/o https://github.com/ClickHouse/ClickHouse/pull/23367 |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implementation of short circuit function evaluation, closes #12587.
Add settings
short_circuit_function_evaluation
to configure short circuit function evaluation.Detailed description / Documentation draft:
Short circuit function evaluation
Now for functions
if, multiIf, and, or
values of their arguments (except the first one) are executed only if they are needed in the result.1)
if(cond, expr_1, expr_2)
expr_1
will be evaluated only on rows wherecond
is true,expr_2
- wherecondition
is false.Example:
In this example we won't get an exception about division by zero because
intDiv(42, number)
will be evaluated only for numbers that doesn't satisfy conditionnumber = 0
.2)
multiIf(cond_1, expr_1, cond_2, expr_2, ..., cond_{n-1}, expr_{n-1}, )
expr_i
will be evaluated only on rows where(not cond_1) and (NOT cond_2) and ... and (not cond_{i-1}) and cond_i
is true.cond_i
will be evaluated only on rows where(not cond_1) and (not cond_2) and ... and (not cond_{i-1})
is true.Example:
In this example we won't get an exception about division by zero.
3)
expr_1 and expr_2 and ... and expr_n
expr_i
will be evaluated only on rows whereexpr_1 and expr_2 and ... and expr_{i-1}
is true.Example:
In this example we won't get an exception about division by zero.
4)
expr_1 or expr_2 or ... or exp_n
expr_i
will be evaluated only on rows where(not expr_1) and (not expr_2) and ... and (not expr_{i-1})
is true.Example:
In this example we won't get an exception about division by zero.
User setting
There is a new setting for users:
short_circuit_function_evaluation
. Possible values:enable
Enables short-circuit function evaluation for functions that are suitable for it (can throw or computationally heavy).force_enable
Enables short-circuit function evaluation for all functions.disable
Disables short-circuit function evaluation.